Getting Started with the Stata Project Template

This guide helps you set up and use this template at your preferred level of complexity. Start with Tier 1 (minimal) and add features as needed.

Warning

NEVER COMMIT DATA FILES TO GITHUB.

NEVER USE AI ASSISTANTS WITH PERSONALLY IDENTIFIABLE DATA.

YOU ARE REQUIRED TO REMOVE IDENTIFYING INFORMATION BEFORE CONNECTING AI ASSISTANTS OR STORING DATA IN ANY UNENCRYPTED LOCATION.

Which Tier Should You Use?

| Tier | Time Investment | Best For | Key Benefit |
|------|-----------------|----------|-------------|
| 1 | 15 min | Getting started, small projects | Version control + reproducibility |
| 2 | +5 min | Regular use | Simple commands (no path typing) |
| 3 | +10 min | Projects with long runtimes (>5 min) | Only rebuild what changed |
| 4 | +15 min | Full-time development | IDE integration + quality checks |

Project File Structure

```
ipa-stata-template/
├── data/
│   ├── raw/          # Your raw data (never edit!)
│   └── clean/        # Cleaned data (generated)
├── do_files/         # Your Stata scripts
├── outputs/
│   ├── tables/       # Generated tables
│   └── figures/      # Generated figures
├── logs/             # Execution logs
└── ado/              # Local Stata packages
```

Tier 1: Minimal Setup (Git + Stata + Just)

What you get: Reproducible analysis with version control

Installation Checklist (Do these in order)

1. Install Stata 17+: IPA Box link

2. Install Git

Windows:
```
winget install --id Git.Git -e
```
macOS:
```
brew install git
```
Linux or manual install: Download from git-scm.com

3. Install Just

Windows:
```
winget install --id Casey.Just -e
```
macOS/Linux:
```
brew install just
```
Manual install: Download from GitHub releases

Note: After installing Git or Just, you may need to restart your terminal for the commands to be recognized.

4. (Recommended) Install VS Code

Windows:
```
winget install --id Microsoft.VisualStudioCode -e
```
macOS/Linux: Download from code.visualstudio.com

5. (Recommended) Install VS Code extensions
- Jupyter Extension for VS Code
- vscode-stata for running Stata code
- GitHub Copilot Chat or Claude Code for AI assistance

Setup Steps

1. Clone the repository

```
git clone <your-repo-url>
cd ipa-stata-template
```

2. Configure your Stata path

Copy the example environment file:

Windows:

```
copy .env-example .env
```

macOS/Linux:

```
cp .env-example .env
```

Open .env in a text editor and set your Stata executable path and edition:

Windows example:

```
STATA_CMD='C:/Program Files/Stata18/StataSE-64.exe'
STATA_EDITION='se'
```

macOS example:

```
STATA_CMD='/Applications/Stata/StataSE.app/Contents/MacOS/StataSE'
STATA_EDITION='se'
```

Linux example:

```
STATA_CMD='/usr/local/stata18/stata-se'
STATA_EDITION='se'
```

Tip: If your Stata path contains spaces, keep the single quotes around the path.

3. Configure your data path (Optional)

If your data is stored separately from your code (e.g., on a secure network drive):

Windows:

```
copy config.do.template config.do
```

macOS/Linux:

```
cp config.do.template config.do
```

Then edit config.do and set global data_root to your data location. Keep only one of the following lines active; if several are, the last one wins:

```
// Example: Network drive
global data_root "X:/SECURE_AREA_12345_project_name_country/data"

// Example: Dropbox
global data_root "D:/Dropbox/ProjectName/data"

// Example: Local documents
global data_root "C:/Users/YourName/Documents/Research/ProjectName/data"
```

Note: config.do is gitignored and never committed to version control. If you don't create it, the template defaults to using data/ in the project root.
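The fallback described in the note above can be sketched in a few lines of Stata. This is a hypothetical illustration rather than the template's own code. It assumes Stata's working directory is the project root and that config.do, when present, only sets global data_root.

```stata
* Hypothetical sketch: load config.do if it exists, otherwise fall back to data/
capture confirm file "config.do"
if _rc == 0 {
    do "config.do"    // expected to set global data_root (file is gitignored)
}

* No config.do, or data_root left empty: default to data/ in the project root
if `"$data_root"' == "" {
    global data_root "`c(pwd)'/data"
}

display as text "Reading data from: $data_root"
```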

4. Set up the coding environment

Run this command to set up the Python-Stata bridge:

```
just stata-setup
```

What this does:

- Creates a Python virtual environment in .venv/ (takes ~2-3 minutes)
- Installs pystatacons (enables Python to communicate with Stata)
- Installs required Stata packages to ado/

Optional: Verify that the setup succeeded:

```
just stata-check-installation
```

5. Run the demo pipeline

Option A: Batch mode (recommended)

Batch mode runs Stata from the command line and automatically creates log files:

```
just stata-do demo/stata-demo
```

What is batch mode? It means running Stata from the command line instead of the GUI. Each run automatically writes a log file, which is useful for debugging and for sharing with AI assistants.

Option B: Interactive mode

Open the Stata GUI, use cd to change the working directory to the project root (ipa-stata-template/), and run:

```
do do_files/demo/stata-demo.do
```

6. Verify success

Check for these signs of success:

- No error messages in the console
- New files created in outputs/tables/
- New files created in outputs/figures/
- A log file in logs/ ending with "end of do-file"

Check outputs:

- Tables: outputs/tables/
- Figures: outputs/figures/
- Logs: logs/
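To run the same check from inside Stata, the sketch below counts the files in each output folder. It is a hypothetical helper, not part of the template. It assumes Stata's working directory is the project root. A count of zero after running the demo suggests the pipeline stopped early.

```stata
* Hypothetical helper: count the files the demo produced in each folder
foreach folder in outputs/tables outputs/figures logs {
    local files : dir "`folder'" files "*"
    local n : word count `files'
    display as text "`folder': `n' file(s)"
}
```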

Common Setup Issues

Problem: just: command not found after installation

Solution: Restart your terminal or command prompt

Problem: Stata executable not found

Solution: Check that the path in .env exactly matches your Stata installation location. Use forward slashes (/) even on Windows.

Problem: Error about spaces in path

Solution: Make sure paths with spaces are wrapped in single quotes in .env

Problem: Python or virtual environment errors

Solution: Make sure you ran just stata-setup from the project root directory

Understanding the Full Template Pipeline

Understanding `00_run.do`

The master do-file orchestrates your entire pipeline using control switches:

```
// Set to 0 to skip during development
local data_cleaning         = 1
local data_preparation      = 1
local descriptive_analysis  = 1
local main_analysis         = 1
local robustness_checks     = 1
local generate_figures      = 1
```

This allows you to quickly iterate on specific parts without re-running everything.
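A master do-file can act on these switches with a short loop. The sketch below is an illustration, not the template's actual 00_run.do. It makes three assumptions: the six scripts named under Update Analysis Scripts sit in do_files/, their names follow the switch names, and Stata's working directory is the project root. Run it as a do-file, because the /// line continuation does not work at the interactive prompt.

```stata
* Control switches (1 = run, 0 = skip)
local data_cleaning         = 1
local data_preparation      = 1
local descriptive_analysis  = 0   // e.g. skipped while iterating on later steps
local main_analysis         = 1
local robustness_checks     = 0
local generate_figures      = 1

* Steps in pipeline order; the n-th step maps to do_files/0n_<step>.do
local steps data_cleaning data_preparation descriptive_analysis ///
    main_analysis robustness_checks generate_figures

local n = 0
foreach step of local steps {
    local ++n
    local dofile "do_files/0`n'_`step'.do"
    * ``step'' first expands to the switch name, then to its value (0 or 1)
    if ``step'' {
        do "`dofile'"
    }
    else {
        display as text "Skipping `dofile'"
    }
}
```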

Why Use Batch Mode?

Running Stata in batch mode (stata -e or via just commands) is recommended because it:

- Creates log files that AI assistants can read
- Captures all output for debugging
- Is more reproducible than interactive execution, since each run starts a fresh Stata session and executes the do-file from top to bottom

Tier 2: Add Task Runner

Already completed Tier 1? You can start using these shortcut commands right away.

What you get: Simple commands instead of typing full paths

Note: Tier 1 already includes Just installation, so you can skip directly to using the commands.

Available Commands

```
just stata-run      # Run the full pipeline (00_run.do)
just stata-config   # Show Stata configuration
just help           # See all available commands
```

Common Commands

```
# Run individual scripts
just stata-script 01_data_cleaning

# Check your Stata setup
just stata-check-installation

# View system information
just system-info
```

That's it! No additional installation needed.

Tier 3: Add Dependency Tracking

Already using Just commands? Add smart rebuilding for large projects.

What you get: Incremental builds - only rebuild what changed

Incremental builds: Only re-run scripts whose inputs have changed, saving time on large projects.

When to Use This Tier

Use dependency tracking if your full pipeline takes more than 5 minutes and you're frequently making changes to individual do-files. For most projects, Tier 1 or 2 is sufficient.

Setup

Install uv (Python package manager):

Windows:

```
winget install --id astral-sh.uv -e
```

macOS/Linux:

```
brew install uv
```

Manual install: See https://docs.astral.sh/uv/

Then sync the Python environment:

```
uv sync
```

Use It

```
just stata-build    # Build with dependency tracking
just stata-data     # Build only data pipeline
just stata-analysis # Build only analysis
just stata-clean    # Clean all outputs
```

How It Works

SCons reads the SConstruct file, which defines the dependencies:

```
# When 01_data_cleaning.do changes, rebuild cleaned_data.dta
data_clean = env.StataBuild(
    target='data/clean/cleaned_data.dta',
    source='do_files/01_data_cleaning.do'
)
```

If you modify 01_data_cleaning.do, SCons re-runs it along with any downstream scripts whose declared inputs include its output, and leaves unrelated scripts alone.

Tier 4: Full Development Environment

Using dependency tracking? Add IDE integration and automated quality checks.

What you get: Interactive Stata in VS Code, automatic linting, pre-commit hooks

Setup

Run the automated setup command:

```
just get-started
```

This installs everything: uv, git, quarto, markdownlint, nbstata, Stata packages.

Features

VS Code Integration (nbstata)

Run Stata interactively in VS Code, similar to the Ctrl+D workflow in Stata's Do-file Editor:

1. Install the vscode-stata extension
2. Select the nbstata kernel from the project's virtual environment: .venv/Scripts/python.exe (Windows) or .venv/bin/python (macOS/Linux)
3. Test with files in do_files/demo/

Code Quality

```
just lint-stata    # Check Stata code quality
just lint-py       # Check Python code
just fmt-markdown  # Format markdown files
```

Report Generation

```
just render-report  # Generate analysis report
just preview-report # Preview in browser
```

Customizing for Your Project

Add Your Data

Option 1: Data in project directory (default)

Place raw data in data/raw/ and update the do-files to reference your files.

Important: Do not commit data files (especially large or sensitive ones) to GitHub.

Option 2: Data stored separately (recommended for secure/network drives)

1. Copy config.do.template to config.do
2. Set global data_root to your data location
3. Place raw data in <your-data-path>/raw/

The template automatically uses your configured path while keeping your code repository clean and portable. The config.do file is gitignored to protect sensitive path information.

Update Analysis Scripts

- 01_data_cleaning.do: Modify cleaning steps for your data
- 02_data_preparation.do: Define your analysis sample
- 03_descriptive_analysis.do: Customize summary statistics
- 04_main_analysis.do: Add your regression specifications
- 05_robustness_checks.do: Define alternative specifications
- 06_generate_figures.do: Create visualizations

IPA Visualizations (for IPA Staff)

```
net install github, from("https://haghish.github.io/github/")
github install PovertyAction/ipaplots
```

The template automatically uses IPA branding when ipaplots is available.
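The automatic switch to IPA branding can be pictured as a check for the package before a graph scheme is chosen. This is a hypothetical sketch, not the template's code. It assumes ipaplots installs a scheme file named scheme-ipaplots.scheme; confirm the actual name after installing the package.

```stata
* Hypothetical sketch: use the IPA scheme only when ipaplots is installed.
* ASSUMPTION: ipaplots provides a scheme file named scheme-ipaplots.scheme.
capture findfile "scheme-ipaplots.scheme"
if _rc == 0 {
    set scheme ipaplots
    display as text "ipaplots found: using IPA branding"
}
else {
    set scheme s2color
    display as text "ipaplots not found: using Stata's s2color scheme"
}
```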

Best Practices

Data Management

- Never modify files in data/raw/ (treat as read-only)
- Use global macros for file paths
- Use version control for code, not data files
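The global-macro practice can look like the following sketch, which defines every folder once so later do-files never hard-code a path. The global names (root, data_raw, tables, and so on) are hypothetical and may differ from the template's. The sketch assumes Stata's working directory is the project root and falls back to data/ when config.do has not set data_root.

```stata
* Hypothetical sketch: define each path once, then refer to it only via globals.
* ASSUMPTION: Stata's working directory is the project root.
global root "`c(pwd)'"

* Default data location when config.do has not set data_root
if `"$data_root"' == "" global data_root "$root/data"

global data_raw   "$data_root/raw"
global data_clean "$data_root/clean"
global tables     "$root/outputs/tables"
global figures    "$root/outputs/figures"
global logs       "$root/logs"

display as text "Raw data: $data_raw"
display as text "Tables:   $tables"

* Later do-files then use only the globals, for example (hypothetical file name):
*   use "$data_raw/household_survey.dta", clear
```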

Code Organization

- Keep do-files focused on single tasks
- Use descriptive variable names
- Comment extensively
- Include quality checks and validation
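For quality checks, Stata's isid and assert commands stop a do-file with an error as soon as an expectation fails, so problems surface in the log instead of passing silently. The sketch uses the auto dataset that ships with Stata purely for illustration. The rules themselves are hypothetical stand-ins for checks that fit your data.

```stata
* Illustration with Stata's bundled auto dataset
sysuse auto, clear

isid make                                       // each car appears exactly once
assert !missing(price)                          // price is never missing
assert inrange(rep78, 1, 5) if !missing(rep78)  // ratings stay on the 1-5 scale

* Report, rather than fail on, expected gaps
count if missing(rep78)
display as text "Cars with missing repair record: `r(N)'"
```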

Performance Tips

Before increasing maxvar, consider:

- Load only needed columns: use var1 var2 using "data.dta"
- Reshape to long format: looping over many wide columns is slow, while a single by-group command on long data is fast
- Modularize: Clean one survey module at a time
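The first two tips can be illustrated with small self-contained examples. The sketch saves Stata's bundled auto dataset to a temporary file so that use varlist using has something to read. It then reshapes a hypothetical household income table (hhid, inc2019 to inc2021) from wide to long.

```stata
* 1. Load only the variables you need.
*    (auto.dta is saved to a temporary file here just to have a file to read.)
sysuse auto, clear
tempfile auto
save "`auto'"
use make price mpg using "`auto'", clear
describe, short

* 2. Work in long format. Hypothetical example: household income in wide form.
clear
input hhid inc2019 inc2020 inc2021
1 100 110 120
2 200 190 210
end

reshape long inc, i(hhid) j(year)    // one row per household-year

* A single by-group command replaces a loop over inc2019, inc2020, inc2021
bysort hhid (year): generate inc_change = inc - inc[_n-1]
list, sepby(hhid)
```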

Troubleshooting

Stata cannot find do-files

- Ensure you're running from the project root directory
- Check file paths in .env match your Stata installation
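A quick way to confirm the first point is to test for a file that only exists relative to the project root. This hypothetical check uses do_files/demo/stata-demo.do from the demo step; any file under the project root would work.

```stata
* Check that Stata is running from the project root before calling do-files
capture confirm file "do_files/demo/stata-demo.do"
local in_root = (_rc == 0)

if `in_root' {
    display as text "OK: working directory is the project root"
}
else {
    display as error "Not in the project root. Current directory: `c(pwd)'"
    display as error `"Move there first, e.g.: cd "path/to/ipa-stata-template""'
}
```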

"Command just not found" or "Command scons not found"

- Restart your terminal after installation
- Ensure you ran uv sync to create the Python environment (for scons)
- Activate the environment manually if needed:
  - Windows: .venv/Scripts/activate
  - macOS/Linux: source .venv/bin/activate

Path issues on Windows

- Use forward slashes in file paths (e.g., C:/Program Files/Stata18/...)
- Quote paths with spaces in the .env file

Python virtual environment errors

- Delete the .venv/ folder and run just stata-setup again
- Make sure you're running commands from the project root directory

Getting Help

- Check log files in logs/ for Stata errors
- Review the statacons documentation
- See the README for additional resources

Glossary

Batch mode: Running Stata from the command line instead of the GUI, which creates automatic log files.

Dependency tracking: A system that tracks which files depend on other files, so only necessary scripts are re-run when changes are made.

Incremental builds: Only rebuilding outputs that have changed or depend on changed inputs, rather than rebuilding everything from scratch.

Virtual environment: An isolated Python environment that keeps project dependencies separate from system-wide Python packages.

Task runner: A tool (like just) that provides shortcuts for commonly used command sequences.
