- `uv`: running scripts and managing virtual environments
- `ffmpeg`: optional, for video generation during evaluation
For development, install with uv:

```bash
uv sync --extra dev
```

If contributing to the codebase, you can run:

```bash
pre-commit install
```

This sets up the pre-commit checks so that any pushed commits will pass the CI.
For detailed documentation on the available scripts and configuration system, see docs/SCRIPTS_AND_CONFIGS.md.
Train an encoder-decoder stack and evaluate the resulting checkpoint:

```bash
# Train
uv run train_autoencoder
```

Train an encoder-processor-decoder stack and evaluate the resulting checkpoint:

```bash
# Train
uv run train_encoder_processor_decoder \
    hydra.run.dir=outputs/encoder_processor_decoder_run

# Evaluate
uv run evaluate_encoder_processor_decoder \
    hydra.run.dir=outputs/processor_eval \
    eval.checkpoint=outputs/encoder_processor_decoder_run/encoder_processor_decoder.ckpt \
    eval.batch_indices=[0,1] \
    eval.video_dir=outputs/encoder_processor_decoder_run/videos
```

Evaluation writes a CSV of aggregate metrics to `eval.csv_path` (defaults to `<work-dir>/evaluation_metrics.csv`) and, when `eval.batch_indices` is provided, stores rollout animations for the specified test batches.
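For example, to redirect the metrics CSV to a custom location (the output path below is illustrative):

```bash
uv run evaluate_encoder_processor_decoder \
    hydra.run.dir=outputs/processor_eval \
    eval.checkpoint=outputs/encoder_processor_decoder_run/encoder_processor_decoder.ckpt \
    eval.csv_path=outputs/processor_eval/metrics.csv
```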
This assumes you have the `reaction_diffusion` dataset stored at the path specified by
the `AUTOCAST_DATASETS` environment variable.
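If the variable is not already set, export it first (the path below is a placeholder):

```bash
export AUTOCAST_DATASETS=/path/to/your/datasets
```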
Train the autoencoder:

```bash
uv run train_autoencoder \
    hydra.run.dir=outputs/rd/00 \
    datamodule.data_path=$AUTOCAST_DATASETS/reaction_diffusion \
    datamodule.use_simulator=false \
    optimizer.learning_rate=0.00005 \
    trainer.max_epochs=10 \
    logging.wandb.enabled=true
```

Or alternatively with the included bash script:
```bash
./scripts/ae.sh rd 00 reaction_diffusion
```

Then train the encoder-processor-decoder on top of the trained autoencoder checkpoint:

```bash
uv run train_encoder_processor_decoder \
    hydra.run.dir=outputs/rd/00 \
    datamodule.data_path=$AUTOCAST_DATASETS/reaction_diffusion \
    datamodule.use_simulator=false \
    optimizer.learning_rate=0.0001 \
    trainer.max_epochs=10 \
    logging.wandb.enabled=true \
    'autoencoder_checkpoint=outputs/rd/00/autoencoder.ckpt'
```

Or alternatively with the included bash script:
```bash
./scripts/epd.sh rd 00 reaction_diffusion
```

Finally, evaluate the trained model:

```bash
uv run evaluate_encoder_processor_decoder \
    hydra.run.dir=outputs/rd/00/eval \
    eval.checkpoint=outputs/rd/00/encoder_processor_decoder.ckpt \
    eval.batch_indices=[0,1,2,3] \
    eval.video_dir=outputs/rd/00/eval/videos \
    datamodule.data_path=$AUTOCAST_DATASETS/reaction_diffusion \
    datamodule.use_simulator=false
```

Or alternatively with the included bash script:
```bash
./scripts/eval.sh rd 00 reaction_diffusion
```

AutoCast now ships with an optional Weights & Biases integration that is
fully driven by the Hydra config under `configs/logging/wandb.yaml`.
- Enable logging for CLI workflows by passing Hydra config overrides as positional arguments:

  ```bash
  uv run train_encoder_processor_decoder \
      logging.wandb.enabled=true \
      logging.wandb.project=autocast-experiments \
      logging.wandb.name=processor-baseline
  ```

- The autoencoder/processor training CLIs pass the configured `WandbLogger` directly into Lightning so that metrics, checkpoints, and artifacts are synchronized automatically.
- The evaluation CLI reports aggregate test metrics to the same run when logging is enabled, making it easy to compare training and evaluation outputs in one dashboard (see the example after this list).
- All notebooks contain a dedicated cell that instantiates a `wandb_logger` via `autocast.logging.create_wandb_logger`. Toggle the `enabled` flag in that cell to control tracking when experimenting interactively.
When `enabled` remains `false` (the default), the logger is skipped entirely, so the stack can
be used without a W&B account.
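For example, to log an evaluation run alongside its training run (paths mirror the quickstart above; the project name is illustrative):

```bash
uv run evaluate_encoder_processor_decoder \
    hydra.run.dir=outputs/rd/00/eval \
    eval.checkpoint=outputs/rd/00/encoder_processor_decoder.ckpt \
    logging.wandb.enabled=true \
    logging.wandb.project=autocast-experiments
```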
In the slurm_templates folder, template Slurm scripts can be found for the following use cases:
- `train_and_eval_autoencoder.sh`: training and evaluation of the autoencoder
- `train_and_eval_encoder-processor-decoder.sh`: training and evaluation of the encoder-processor-decoder approach
- `encoder-processor-decoder-parameter_sweep.sh`: same as above, but runs a parameter sweep
We advise copying these scripts into a folder called slurm_scripts (which is gitignored) and editing them as you see fit.
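For example (assuming you are at the top level of the repository):

```bash
mkdir -p slurm_scripts
cp slurm_templates/*.sh slurm_scripts/
```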
To run, simply navigate to the top level of this repository and run:

```bash
sbatch scripts/train_and_eval_encoder-processor-decoder.sh
```

or

```bash
sbatch scripts/train_and_eval_autoencoder.sh
```

depending on which model you would like to run.
This will train and evaluate the model using the settings in the corresponding config (found in the configs folder). Outputs from both training and evaluation will be written to an outputs folder with the following naming convention:
`outputs/{job_name}/$(date +%Y%m%d_%H%M%S)`.
`scripts/encoder-processor-decoder-parameter_sweep.sh` is an example parameter sweep.
It uses Slurm arrays and Hydra override functionality to sweep through combinations of parameters. The resulting output structure looks like this:
- outputs
  - {job_name}
    - job-{job_id} # unique for each sweep run
      - parameter_lookup.csv # CSV file mapping task ID to parameter values
      - task-0 # 0 is the Slurm array task ID; unique for each set of parameters
      - task-1
      - etc.
A checklist of things to change in the example script:
- `--array=0-8`: the number of parallel jobs to run. This should equal the number of parameter combinations you want to run.
- `JOB_NAME="encoder_processor_decoder_sweep"`: the name of the job; this is the top-level output directory.
- The whole "Define Parameter Grid" section (a sketch of this section follows the list).
- The columns to be written to the parameter CSV file.
- The flags in the Python invocation that override the Hydra config, e.g. `trainer.max_epochs=${MAX_EPOCH}`.
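A minimal sketch of such a parameter grid; the variable names and values below are illustrative, not the template's actual contents:

```bash
# Illustrative grid: 3 learning rates x 3 epoch counts = 9 combinations,
# matching --array=0-8. Names and values are examples only.
LEARNING_RATES=(0.0001 0.0005 0.001)
MAX_EPOCHS_GRID=(10 20 50)

# Map the Slurm array task ID to one (learning rate, epochs) pair.
LEARNING_RATE=${LEARNING_RATES[$((SLURM_ARRAY_TASK_ID % 3))]}
MAX_EPOCH=${MAX_EPOCHS_GRID[$((SLURM_ARRAY_TASK_ID / 3))]}

# Record the mapping so each task's outputs can be traced to its parameters.
echo "${SLURM_ARRAY_TASK_ID},${LEARNING_RATE},${MAX_EPOCH}" >> parameter_lookup.csv

# Pass the values to training as Hydra overrides.
uv run train_encoder_processor_decoder \
    optimizer.learning_rate=${LEARNING_RATE} \
    trainer.max_epochs=${MAX_EPOCH}
```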
Thanks goes to these wonderful people (emoji key):
- Jason McEwen 🤔 📆
- Radka Jersakova 🤔 📆 💻 👀
- Paolo Conti 🤔 💻 👀
- Marjan Famili 🤔 💻 👀
- Christopher Iliffe Sprague 🤔 💻 👀
- Edwin 🤔 💻 👀
- Sam Greenbury 🤔 📆 💻 👀
This project follows the all-contributors specification. Contributions of any kind welcome!