This guide helps you set up and use this template at your preferred level of complexity. Start with Tier 1 (minimal) and add features as needed.
Warning
NEVER COMMIT DATA FILES TO GITHUB.
NEVER USE AI ASSISTANTS WITH PERSONALLY IDENTIFIABLE DATA.
YOU ARE REQUIRED TO REMOVE IDENTIFYING INFORMATION BEFORE CONNECTING AI ASSISTANTS OR STORING IN ANY UNENCRYPTED LOCATION.
| Tier | Time Investment | Best For | Key Benefit |
|---|---|---|---|
| 1 | 15 min | Getting started, small projects | Version control + reproducibility |
| 2 | +5 min | Regular use | Simple commands (no path typing) |
| 3 | +10 min | Projects with long runtimes (>5min) | Only rebuild what changed |
| 4 | +15 min | Full-time development | IDE integration + quality checks |
ipa-stata-template/
├── data/
│ ├── raw/ # Your raw data (never edit!)
│ └── clean/ # Cleaned data (generated)
├── do_files/ # Your Stata scripts
├── outputs/
│ ├── tables/ # Generated tables
│ └── figures/ # Generated figures
├── logs/ # Execution logs
└── ado/ # Local Stata packages
What you get: Reproducible analysis with version control
- Install Stata 17+: IPA Box link
- Install Git
  Windows:
  winget install --id Git.Git -e
  macOS:
  brew install git
  Linux or manual install: Download from git-scm.com
- Install Just
  Windows:
  winget install --id Casey.Just -e
  macOS/Linux:
  brew install just
  Manual install: Download from GitHub releases
  Note: After installing Git or Just, you may need to restart your terminal for the commands to be recognized.
- (Recommended) Install VS Code
  Windows:
  winget install --id Microsoft.VisualStudioCode -e
  macOS/Linux: Download from code.visualstudio.com
- (Recommended) Install VS Code extensions
  - Jupyter Extension for VS Code
  - vscode-stata for running Stata code
  - GitHub Copilot Chat or Claude Code for AI assistance
git clone <your-repo-url>
cd ipa-stata-template
Copy the example environment file:
Windows:
copy .env-example .env
macOS/Linux:
cp .env-example .env
Open .env in a text editor and set your Stata executable path:
Windows example:
STATA_CMD='C:/Program Files/Stata18/StataSE-64.exe'
STATA_EDITION='se'
macOS example:
STATA_CMD='/Applications/Stata/StataSE.app/Contents/MacOS/StataSE'
STATA_EDITION='se'
Linux example:
STATA_CMD='/usr/local/stata18/stata-se'
STATA_EDITION='se'
Tip: If your Stata path contains spaces, keep the single quotes around the path.
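If you want to sanity-check the .env values before running anything, a small script along these lines may help. It is only a sketch: the function names read_env and check_stata_settings are hypothetical, and the parsing rules (KEY='value' lines, # comments, one pair of surrounding quotes stripped) are an assumption about the file's layout, not a description of how just itself loads the file.

```python
from pathlib import Path


def read_env(path):
    """Parse simple KEY='value' lines; blank lines and # comments are skipped."""
    settings = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        settings[key.strip()] = value
    return settings


def check_stata_settings(settings):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    stata_cmd = settings.get("STATA_CMD", "")
    if not stata_cmd:
        problems.append("STATA_CMD is not set")
    elif not Path(stata_cmd).is_file():
        problems.append(f"STATA_CMD does not point to a file: {stata_cmd}")
    if not settings.get("STATA_EDITION"):
        problems.append("STATA_EDITION is not set")
    return problems


if __name__ == "__main__":
    if Path(".env").is_file():
        for problem in check_stata_settings(read_env(".env")):
            print("Problem:", problem)
    else:
        print("No .env file found in the current directory")
```

Run it from the project root. No output means both settings are present and STATA_CMD points to an existing file.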
If your data is stored separately from your code (e.g., on a secure network drive):
Windows:
copy config.do.template config.do
macOS/Linux:
cp config.do.template config.do
Then edit config.do to set your data location:
// Example: Network drive
global data_root "X:/SECURE_AREA_12345_project_name_country/data"
// Example: Dropbox
global data_root "D:/Dropbox/ProjectName/data"
// Example: Local documents
global data_root "C:/Users/YourName/Documents/Research/ProjectName/data"
Note: config.do is gitignored and never committed to version control. If you don't create it, the template defaults to using data/ in the project root.
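The fallback rule described above (use data_root from config.do if it exists, otherwise data/ in the project root) can be mirrored on the Python side, for example in a helper script that needs to find the same data folder. This is a minimal sketch, not part of the template: resolve_data_root is a hypothetical name, and the regular expression assumes the global data_root "..." form shown in the examples.

```python
import re
from pathlib import Path

# Matches uncommented lines such as: global data_root "X:/path/to/data"
_DATA_ROOT = re.compile(r'^\s*global\s+data_root\s+"([^"]+)"')


def resolve_data_root(project_root):
    """Return data_root from config.do, or <project_root>/data if it is not set."""
    root = Path(project_root)
    data_root = root / "data"  # default used when config.do is absent
    config = root / "config.do"
    if config.is_file():
        for line in config.read_text(encoding="utf-8").splitlines():
            match = _DATA_ROOT.match(line)
            if match:
                # Later definitions overwrite earlier ones, as they would in Stata.
                data_root = Path(match.group(1))
    return data_root
```

Lines that start with // do not match the pattern, so commented-out examples in config.do are ignored.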
Run this command to set up the Python-Stata bridge:
just stata-setup
What this does:
- Creates a Python virtual environment in .venv/ (takes ~2-3 minutes)
- Installs pystatacons (enables Python to communicate with Stata)
- Installs required Stata packages to ado/
Optional: Verify the setup was successful:
just stata-check-installation
Batch mode runs Stata from the command line and automatically creates log files:
just stata-do demo/stata-demo
What is batch mode? Running Stata from the command line instead of the GUI. This creates automatic log files that are useful for debugging and working with AI assistants.
Open Stata GUI and run from the project root (ipa-stata-template/):
do do_files/demo/stata-demo.do
Check for these signs of success:
- No error messages in the console
- New files created in outputs/tables/
- New files created in outputs/figures/
- A log file in logs/ ending with "end of do-file"
Check outputs:
- Tables: outputs/tables/
- Figures: outputs/figures/
- Logs: logs/
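The checks listed above can also be scripted. The sketch below is one possible way to do it, under stated assumptions: check_run is a hypothetical helper, logs are assumed to be plain-text files with a .log extension, and hidden placeholder files such as .gitkeep are not counted as outputs.

```python
from pathlib import Path


def _has_files(folder):
    """True if the folder exists and holds at least one non-hidden file."""
    return folder.is_dir() and any(
        p.is_file() and not p.name.startswith(".") for p in folder.iterdir()
    )


def check_run(project_root):
    """Report the success signs for the most recent run as a dict of booleans."""
    root = Path(project_root)
    # Assumption: logs are plain-text files with a .log extension.
    logs = sorted((root / "logs").glob("*.log"), key=lambda p: p.stat().st_mtime)
    log_complete = False
    if logs:
        text = logs[-1].read_text(encoding="utf-8", errors="replace")
        non_blank = [line.strip() for line in text.splitlines() if line.strip()]
        # The newest log should finish with Stata's "end of do-file" marker.
        log_complete = bool(non_blank) and non_blank[-1].endswith("end of do-file")
    return {
        "tables": _has_files(root / "outputs" / "tables"),
        "figures": _has_files(root / "outputs" / "figures"),
        "log_complete": log_complete,
    }
```

Calling check_run(".") from the project root returns three flags; any False value points to the item worth inspecting first.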
Problem: just: command not found after installation
- Solution: Restart your terminal or command prompt
Problem: Stata executable not found
- Solution: Check that the path in .env exactly matches your Stata installation location. Use forward slashes (/) even on Windows.
Problem: Error about spaces in path
- Solution: Make sure paths with spaces are wrapped in single quotes in .env
Problem: Python or virtual environment errors
- Solution: Make sure you ran just stata-setup from the project root directory
The master do-file orchestrates your entire pipeline using control switches:
// Set to 0 to skip during development
local data_cleaning = 1
local data_preparation = 1
local descriptive_analysis = 1
local main_analysis = 1
local robustness_checks = 1
local generate_figures = 1
This allows you to quickly iterate on specific parts without re-running everything.
Running Stata in batch mode (stata -e or via just commands) is recommended for three reasons:
- It creates log files that AI assistants can read
- It captures all output for debugging
- It is more reproducible than interactive execution
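Because batch mode leaves a log on disk, errors can be located without rereading the whole file. Stata prints a return code such as r(601); on its own line after an error, and the sketch below searches for that pattern. The function name find_stata_errors and the choice of three context lines are illustrative assumptions.

```python
import re

# Stata prints a return code such as "r(601);" on its own line after an error.
_RETURN_CODE = re.compile(r"^r\((\d+)\);\s*$")


def find_stata_errors(log_text, context=3):
    """Return (line_number, code, preceding_lines) for each error in the log."""
    lines = log_text.splitlines()
    errors = []
    for index, line in enumerate(lines):
        match = _RETURN_CODE.match(line.strip())
        if match:
            start = max(0, index - context)
            # Keep a few lines before the code; they usually hold the message.
            errors.append((index + 1, int(match.group(1)), lines[start:index]))
    return errors
```

Passing the text of a file from logs/ gives a short list that is easier to share than the full log, whether with a colleague or an AI assistant (after removing any identifying information).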
Already completed Tier 1? Add convenient shortcut commands with these steps.
What you get: Simple commands instead of typing full paths
Note: Tier 1 already includes Just installation, so you can skip directly to using the commands.
just stata-run # Run the full pipeline (00_run.do)
just stata-config # Show Stata configuration
just help # See all available commands
# Run individual scripts
just stata-script 01_data_cleaning
# Check your Stata setup
just stata-check-installation
# View system information
just system-info
That's it! No additional installation needed.
Already using Just commands? Add smart rebuilding for large projects.
What you get: Incremental builds - only rebuild what changed
Incremental builds: Only re-run scripts whose inputs have changed, saving time on large projects.
Use dependency tracking if your full pipeline takes more than 5 minutes and you're frequently making changes to individual do-files. For most projects, Tier 1 or 2 is sufficient.
Install uv (Python package manager):
Windows:
winget install --id astral-sh.uv -e
macOS/Linux:
brew install uv
Manual install: See https://docs.astral.sh/uv/
Then sync the Python environment:
uv sync

just stata-build # Build with dependency tracking
just stata-data # Build only data pipeline
just stata-analysis # Build only analysis
just stata-clean # Clean all outputs
scons reads the SConstruct file, which defines dependencies:
# When 01_data_cleaning.do changes, rebuild cleaned_data.dta
data_clean = env.StataBuild(
target='data/clean/cleaned_data.dta',
source='do_files/01_data_cleaning.do'
)
If you modify 01_data_cleaning.do, scons knows to re-run downstream scripts but not unrelated ones.
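To make the idea concrete, here is a simplified illustration of how a build tool can decide whether a target is out of date by comparing content hashes of its sources. It is not how scons or statacons are implemented internally; the function names and the JSON state file are hypothetical and exist only for this sketch.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path):
    """Content hash of a file, so that only real edits count as changes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def _load_state(state_file):
    state_path = Path(state_file)
    return json.loads(state_path.read_text()) if state_path.is_file() else {}


def needs_rebuild(target, sources, state_file):
    """True if the target is missing or any source changed since the last build."""
    current = {str(s): file_digest(s) for s in sources}
    recorded = _load_state(state_file).get(str(target))
    return not Path(target).exists() or recorded != current


def record_build(target, sources, state_file):
    """Store the source hashes that produced the target."""
    state = _load_state(state_file)
    state[str(target)] = {str(s): file_digest(s) for s in sources}
    Path(state_file).write_text(json.dumps(state, indent=2))
```

With this bookkeeping, editing do_files/01_data_cleaning.do changes its hash, so data/clean/cleaned_data.dta is flagged for rebuilding, while targets whose sources are unchanged are skipped.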
Using dependency tracking? Add IDE integration and automated quality checks.
What you get: Interactive Stata in VS Code, automatic linting, pre-commit hooks
Run the automated setup command:
just get-started
This installs everything: uv, git, quarto, markdownlint, nbstata, and the Stata packages.
Run Stata interactively in VS Code, similar to Ctrl+D workflow:
- Install the vscode-stata extension
- Test with files in do_files/demo/
- Select the nbstata kernel at .venv/Scripts/python.exe (Windows) or .venv/bin/python (macOS/Linux)
just lint-stata # Check Stata code quality
just lint-py # Check Python code
just fmt-markdown # Format markdown files

just render-report # Generate analysis report
just preview-report # Preview in browser

Place raw data in data/raw/ and update the do-files to reference your files.
Important: Do not commit data files (especially large or sensitive ones) to GitHub.
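One way to guard against accidental commits is to scan the list of staged paths for data-file extensions before committing. The snippet below is a sketch of that idea, not a feature of the template: find_data_files is a hypothetical helper, and the extension list is an assumption that you should adapt to your project.

```python
from pathlib import PurePosixPath

# Assumed list of data-file extensions; adjust for your project.
DATA_EXTENSIONS = {".dta", ".csv", ".xlsx", ".xls", ".sav", ".parquet"}


def find_data_files(paths):
    """Return the paths whose extension marks them as likely data files."""
    flagged = []
    for path in paths:
        # Normalize Windows separators so the suffix is detected consistently.
        suffix = PurePosixPath(path.replace("\\", "/")).suffix.lower()
        if suffix in DATA_EXTENSIONS:
            flagged.append(path)
    return flagged
```

The input could be the output of git diff --cached --name-only, split into lines. A non-empty result is a signal to unstage those files before committing.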
- Copy config.do.template to config.do
- Set global data_root to your data location
- Place raw data in <your-data-path>/raw/
The template automatically uses your configured path while keeping your code repository
clean and portable. The config.do file is gitignored to protect sensitive path information.
- 01_data_cleaning.do: Modify cleaning steps for your data
- 02_data_preparation.do: Define your analysis sample
- 03_descriptive_analysis.do: Customize summary statistics
- 04_main_analysis.do: Add your regression specifications
- 05_robustness_checks.do: Define alternative specifications
- 06_generate_figures.do: Create visualizations
net install github, from("https://haghish.github.io/github/")
github install PovertyAction/ipaplots
The template automatically uses IPA branding when ipaplots is available.
- Never modify files in data/raw/ (treat as read-only)
- Use global macros for file paths
- Use version control for code, not data files
- Keep do-files focused on single tasks
- Use descriptive variable names
- Comment extensively
- Include quality checks and validation
Before increasing maxvar, consider:
- Load only needed columns: use var1 var2 using "data.dta"
- Reshape to long format: Wide loops are slow; long operations are fast
- Modularize: Clean one survey module at a time
- Ensure you're running from the project root directory
- Check that the file paths in .env match your Stata installation
- Restart your terminal after installation
- Ensure you ran uv sync to create the Python environment (for scons)
- Activate the environment manually if needed:
  - Windows: .venv/Scripts/activate
  - macOS/Linux: source .venv/bin/activate
- Use forward slashes in file paths (e.g., C:/Program Files/Stata18/...)
- Quote paths with spaces in the .env file
- Delete the .venv/ folder and run just stata-setup again
- Make sure you're running commands from the project root directory
- Check log files in logs/ for Stata errors
- Review the statacons documentation
- See the README for additional resources
Batch mode: Running Stata from the command line instead of the GUI, which creates automatic log files.
Dependency tracking: A system that tracks which files depend on other files, so only necessary scripts are re-run when changes are made.
Incremental builds: Only rebuilding outputs that have changed or depend on changed inputs, rather than rebuilding everything from scratch.
Virtual environment: An isolated Python environment that keeps project dependencies separate from system-wide Python packages.
Task runner: A tool (like just) that provides shortcuts for commonly-used command sequences.