This repository is a research toolkit for evaluating hidden-state memory retrieval interventions in MNIST models, including fixed architectures and progressively growing MLPs.
- Prior staged ablation study report: `docs/final_scientific_report.md`
- New growing-architecture study report: `docs/growing_scientific_report.md`
- Latest growing study outcome: growing+memory is currently unsupported under control- and baseline-separation criteria.
- Baseline methods:
  - Plain NN (`nn`)
  - Larger NN (`nn_large`)
  - NN + extra parametric layer (`nn_extra_layer`)
  - Embedding-space KNN baseline (`embedding_knn`)
- Hybrid memory-intervention methods:
  - Active hybrid (`hybrid`)
  - Inactive memory control (`hybrid_inactive`)
  - Random-memory control (`random_memory`)
- Intervention controls:
  - Layer placement (`early`/`middle`/`late`/`penultimate`/explicit layer)
  - Query source (`full`, `untouched`, `projection`)
  - Value target type (`delta`, `absolute`)
  - Mode (`gated`, `residual`, `overwrite`)
  - Train-time / inference-time memory usage toggles
- Memory system:
  - L2/cosine retrieval
  - Top-k weighted retrieval
  - Forgetting/eviction policies (`none`, `ttl`, `fifo`, `reservoir`, `usage`, `helpfulness`, `helpfulness_age`)
  - Optional refresh updates
- Growing MLP architecture:
  - Width/depth/combined growth modes
  - Epoch/plateau/performance/fixed-step growth schedules
  - Growth event logging and parameter-count timeline
- Diagnostics:
  - Accuracy/loss/ECE
  - Helped/harmed + strong-help/strong-harm fractions
  - Retrieval purity/distance/top-1 agreement
  - Gate statistics and intervention magnitude
  - Throughput, training time, inference time, memory footprint
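
The "L2/cosine retrieval" and "top-k weighted retrieval" items above can be illustrated with a minimal, dependency-free sketch. It is not the repository's implementation: the function names, the `temperature` parameter, and the softmax-over-negative-distance weighting are all assumptions made for illustration, and the actual code in `mnist_hybrid/memory/` may weight neighbours differently.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def retrieve_top_k(query, keys, values, k=2, metric="l2", temperature=1.0):
    """Weighted average of the values stored under the k nearest keys.

    Assumption: weights are softmax(-distance / temperature) over the
    k nearest neighbours. Other weighting schemes are equally plausible.
    """
    if metric == "l2":
        dist_fn = l2_distance
    elif metric == "cosine":
        dist_fn = cosine_distance
    else:
        raise ValueError(f"unknown metric: {metric!r}")

    # Sort (distance, index) pairs and keep the k closest entries.
    nearest = sorted((dist_fn(query, key), i) for i, key in enumerate(keys))[:k]

    # Numerically stable softmax over negative distances.
    logits = [-d / temperature for d, _ in nearest]
    max_logit = max(logits)
    exps = [math.exp(l - max_logit) for l in logits]
    total = sum(exps)
    weights = [e / total for e in exps]

    dim = len(values[0])
    return [
        sum(w * values[i][j] for w, (_, i) in zip(weights, nearest))
        for j in range(dim)
    ]


if __name__ == "__main__":
    # Made-up toy memory: three keys with two-dimensional values.
    keys = [[0.0, 0.0], [1.0, 0.0], [10.0, 10.0]]
    values = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
    print(retrieve_top_k([0.0, 0.0], keys, values, k=2, metric="l2"))
```

With `k=1` the sketch reduces to plain nearest-neighbour lookup; larger `temperature` values flatten the weights toward a uniform average of the k neighbours.
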
- `mnist_hybrid/config.py`: experiment schema + YAML loading + CLI overrides
- `mnist_hybrid/data.py`: MNIST split/data loader utilities
- `mnist_hybrid/models/`: MLP/CNN intervenable models
- `mnist_hybrid/memory/`: KNN memory, retrieval, forgetting/refreshing, target construction
- `mnist_hybrid/training/trainer.py`: unified training/evaluation runner
- `mnist_hybrid/analysis/analysis.py`: plotting/diagnostic utilities
- `scripts/run_experiment.py`: single config multi-seed runner
- `scripts/run_matrix.py`: ablation matrix runner
- `scripts/analyze_results.py`: generate plots from run outputs
- `scripts/summarize_matrix.py`: tabular summary export
- `scripts/prepare_growth_stage_b.py`: Stage B promotion + config generation for growth study
- `scripts/build_growth_analysis.py`: growth-study consolidation, pairwise stats, and report writer
- `configs/`: baseline/hybrid configs, ablation matrix, and growth-study matrices
- `docs/research_plan.md`: hypotheses and experiment discipline
- `docs/growth_extension_plan.md`: growth-study plan, matrix, and default hyperparameters
- Original staged analysis:
  - `results/final_analysis/consolidated_metrics.csv`
  - `results/final_analysis/comparison_tests.csv`
  - `docs/final_scientific_report.md`
- Growing-architecture analysis:
  - `results/growth_analysis/consolidated_metrics.csv`
  - `results/growth_analysis/comparison_tests.csv`
  - `results/growth_analysis/stage_policy.json`
  - `docs/growing_scientific_report.md`
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# Plain NN
python scripts/run_experiment.py --config configs/baseline_nn.yaml
# Hybrid with memory
python scripts/run_experiment.py --config configs/base_hybrid.yaml

python scripts/run_matrix.py --base-config configs/base_hybrid.yaml --matrix configs/ablation_matrix.yaml
python scripts/summarize_matrix.py --summary results/matrix/matrix_summary.json

# Stage A screening
python scripts/run_matrix.py --base-config configs/growth_base_stage_a.yaml --matrix configs/growth_matrix_stage_a.yaml
# Build Stage B promoted configs from Stage A validation only
python scripts/prepare_growth_stage_b.py
# Stage B multi-seed confirmation
python scripts/run_matrix.py --base-config results/growth_analysis/configs/growth_base_stage_b_seed11.yaml --matrix results/growth_analysis/configs/growth_matrix_stage_b_seed11.yaml
python scripts/run_matrix.py --base-config results/growth_analysis/configs/growth_base_stage_b_seed22.yaml --matrix results/growth_analysis/configs/growth_matrix_stage_b_seed22.yaml
python scripts/run_matrix.py --base-config results/growth_analysis/configs/growth_base_stage_b_seed33.yaml --matrix results/growth_analysis/configs/growth_matrix_stage_b_seed33.yaml
# Consolidate and write report
python scripts/build_growth_analysis.py

python scripts/analyze_results.py --experiment-dir results/hybrid_base_seed11

- Promotion decisions in staged studies use validation metrics only.
- Every run writes:
  - `metrics.json` (config + environment + final/toggle metrics)
  - `epoch_logs.csv` (epoch-level training/validation trace)
  - `details.pt` (sample-level evaluation diagnostics)
  - `analysis/memory_state.pt` when memory snapshots are enabled
- Growth-capable runs additionally log:
  - growth event timeline
  - parameter-count timeline
  - pre/post-growth validation deltas
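
As a quick way to check what a finished run directory contains, the per-run artifacts listed above can be inspected with the standard library alone. The sketch below is an illustration, not part of the repository: `summarize_run` is a hypothetical helper, and it deliberately assumes nothing about the key names inside `metrics.json` or the column names in `epoch_logs.csv`. It only reports what it finds.

```python
import csv
import json
from pathlib import Path


def summarize_run(run_dir):
    """Report which top-level keys and CSV columns a run directory contains.

    Hypothetical helper: it makes no assumption about the schema of
    metrics.json or epoch_logs.csv, and it skips details.pt, which
    would need PyTorch to load.
    """
    run_dir = Path(run_dir)
    summary = {}

    metrics_path = run_dir / "metrics.json"
    if metrics_path.exists():
        with metrics_path.open() as f:
            metrics = json.load(f)
        # Only list the top-level keys; nested structure is left untouched.
        summary["metrics_keys"] = sorted(metrics) if isinstance(metrics, dict) else []

    logs_path = run_dir / "epoch_logs.csv"
    if logs_path.exists():
        with logs_path.open(newline="") as f:
            reader = csv.DictReader(f)
            rows = list(reader)
            summary["epoch_columns"] = list(reader.fieldnames or [])
            summary["num_rows"] = len(rows)

    return summary


if __name__ == "__main__":
    # The directory name is taken from the analysis command in this README.
    print(summarize_run("results/hybrid_base_seed11"))
```

If neither file exists, the function returns an empty dictionary rather than raising, which makes it safe to run over a directory of partially completed runs.
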
- Separate evidence-backed conclusions from interpretation and open questions.
- Report negative/null findings explicitly, especially for active-memory vs random/inactive controls.
- Do not claim success unless confidence intervals exclude zero for the required causal comparisons.
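
The last rule above can be made concrete with a paired percentile bootstrap over per-seed metric differences. This is only one possible way to obtain such an interval, sketched under stated assumptions: the source does not say which interval method the reports use, and `paired_bootstrap_ci` and `excludes_zero` are hypothetical names. With only a handful of seeds, a percentile bootstrap is coarse and should be read with caution.

```python
import random
import statistics


def paired_bootstrap_ci(treatment, control, n_resamples=10000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean paired difference (treatment - control).

    Inputs are per-seed metric values in matching order. Returns
    (mean_difference, lower_bound, upper_bound).
    """
    if len(treatment) != len(control) or not treatment:
        raise ValueError("need two non-empty sequences of equal length")

    diffs = [t - c for t, c in zip(treatment, control)]
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(diffs)

    # Resample the paired differences with replacement and record each mean.
    means = sorted(
        statistics.fmean(rng.choices(diffs, k=n)) for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return statistics.fmean(diffs), lower, upper


def excludes_zero(lower, upper):
    """True only if the whole interval lies strictly on one side of zero."""
    return lower > 0.0 or upper < 0.0


if __name__ == "__main__":
    # Made-up accuracies for illustration only; these are not study results.
    active = [0.981, 0.979, 0.983]
    random_control = [0.980, 0.981, 0.982]
    mean_diff, lower, upper = paired_bootstrap_ci(active, random_control)
    print(mean_diff, lower, upper, excludes_zero(lower, upper))
```

Under this rule, a comparison such as active memory against the random-memory control would count as supported only when `excludes_zero` returns `True`; otherwise it is reported as a null result.
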
Legacy visualization/training scripts remain available:
- `train.py`
- `visualize_forward_pass.py`