Add Figshare download script and standardize fixtures to 50 rows #60

Merged
doncamilom merged 3 commits into main from feature/fixtures-and-download
Apr 6, 2026

@doncamilom
Contributor

Summary

  • scripts/setup_data.py: end-to-end Figshare download, extraction, and MIST_DATA_DIR setup
  • Standardized all reaction task fixtures to 50 rows (rxn_inversion 12->50, rxn_naming 20->50, rxn_truefalse 12->50)
  • demo/expand_fixtures.py: utility for expanding fixtures by cycling rows
  • Updated fixture_manifest.csv with correct counts
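
As a rough illustration of what "expanding fixtures by cycling rows" could look like, here is a minimal sketch. It is not the code in demo/expand_fixtures.py; the function name `expand_rows` and the in-memory list-of-dicts representation are assumptions made for the example.

```python
from itertools import cycle, islice


def expand_rows(rows, target=50):
    """Return exactly `target` rows by repeating `rows` in their original order.

    Hypothetical helper for illustration; the real script may differ.
    """
    if not rows:
        raise ValueError("cannot expand an empty fixture")
    # cycle() restarts from the first row once the input is exhausted;
    # islice() stops after `target` rows.
    return list(islice(cycle(rows), target))


# Example: a 12-row fixture (like rxn_inversion before this PR) grown to 50 rows.
rows = [{"id": i} for i in range(12)]
expanded = expand_rows(rows)
print(len(expanded), expanded[12]["id"])  # prints: 50 0
```

Cycling keeps every original row and only adds repeats, so the expanded fixture exercises the same cases as before while meeting the 50-row target.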

scripts/setup_data.py

Downloads datasets.zip (2.2 GB) and, optionally, models.zip (5 GB) from Figshare, verifies their MD5 checksums, extracts them, and maps the contents to the ${MIST_DATA_DIR} layout expected by the recipes. It also writes .env.local for convenience.

# Datasets only (skip 5GB model download)
python scripts/setup_data.py --data-dir ./data --skip-models

# Full setup (datasets + models)
python scripts/setup_data.py --data-dir ./data

# Then either:
source .env.local && export MIST_DATA_DIR
# Or:
export MIST_DATA_DIR=./data
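
The verify, extract, and .env.local steps described above might be sketched roughly as follows. This is an illustration under stated assumptions, not the contents of scripts/setup_data.py: the helper names (`md5sum`, `verify_and_extract`, `write_env_file`) are hypothetical, and the download itself and the mapping into the recipe layout are omitted.

```python
import hashlib
import zipfile
from pathlib import Path


def md5sum(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_and_extract(archive, expected_md5, data_dir):
    """Check the archive's checksum, then unzip it into data_dir (hypothetical helper)."""
    actual = md5sum(archive)
    if actual != expected_md5.lower():
        raise ValueError(f"MD5 mismatch for {archive}: got {actual}")
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(data_dir)
    return data_dir


def write_env_file(data_dir, env_path=".env.local"):
    """Write a one-line env file that can be sourced to set MIST_DATA_DIR."""
    Path(env_path).write_text(f"MIST_DATA_DIR={Path(data_dir).resolve()}\n")
```

Reading the archive in chunks keeps memory use flat for multi-gigabyte files. The env file here holds a bare `MIST_DATA_DIR=...` assignment, which is why the usage above follows `source .env.local` with `export MIST_DATA_DIR`; a path containing spaces would additionally need quoting.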

Test plan

  • Verify python demo/run_fixture_smoke.py passes with all fixtures
  • Verify python scripts/setup_data.py --skip-models --skip-verify runs (dry run on CI, without downloading the 2.2 GB datasets archive)

doncamilom added the labels priority: high (Important for paper claims) and infrastructure (Build, CI, release, or tooling) on Apr 6, 2026
doncamilom merged commit bdcf641 into main on Apr 6, 2026
4 checks passed