@@ -0,0 +1,36 @@
# No-looping ablation on the SP8192 SOTA stack (5 shards, 1×H100 screening) — Non-record submission

This folder captures the **“no looping”** ablation used during grant screening: run the current SP8192 SOTA training stack with **layer looping disabled**.

- **Track**: non-record (screening / grant experiments)
- **Base trainer**: `records/track_10min_16mb/2026-04-09_SP8192_3LayerRecur_ParResid_QK525_LegalTTT/train_gpt.py`
- **Ablation**: set `NUM_LOOPS=0` (disables depth recurrence / looping)
- **Budget**: `MAX_WALLCLOCK_SECONDS=600`
- **Train shards**: 5

## Results (3-seed)

Metric notes:
- **Pre-quantization post-EMA** isolates model quality before export.
- **`quantized_sliding_window`** is the post-quant sliding-window BPB reported by this trainer.

| Seed | Steps @ cap | Pre-quant post-EMA `val_bpb` | `quantized_sliding_window val_bpb` | Total submission size (quantized+brotli, bytes) |
|------|-------------|------------------------------:|-----------------------------------:|-----------------------------------------:|
| 0 | 658 | 1.327667 | 1.317445 | 16,033,831 |
| 42 | 724 | 1.291048 | 1.280317 | 16,034,416 |
| 1337 | 724 | 1.289564 | 1.278652 | 16,034,548 |
| **Mean** | | **1.302760** | **1.292138** | **16,034,265** |
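The `val_loss` (nats per token) and `val_bpb` columns in the per-seed logs are consistent with a fixed average bytes-per-token ratio on the validation set. A minimal sanity check, assuming the common `bpb = val_loss / (ln 2 × bytes_per_token)` conversion (the trainer's exact formula is not stated in this README):

```python
import math

def implied_bytes_per_token(val_loss_nats: float, val_bpb: float) -> float:
    # Assumes bpb = val_loss / (ln(2) * bytes_per_token); back out the ratio.
    return val_loss_nats / (val_bpb * math.log(2))

# Seed-0 quantized_sliding_window figures from the log in this submission.
ratio = implied_bytes_per_token(3.40309696, 1.31744488)
```

All three seeds imply roughly 3.73 bytes per token, which makes this a quick way to catch a mis-scaled BPB number.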

## How to run

```bash
cd records/track_non_record_16mb/2026-04-21_NoLooping_SOTAStack_5Shards_1xH100
SEED=1337 RUN_ID=no_looping_1337 MAX_WALLCLOCK_SECONDS=600 NUM_LOOPS=0 \
torchrun --standalone --nproc_per_node=1 train_gpt.py
```

## Notes

- This submission uses a **thin launcher** that sets `NUM_LOOPS=0` and then executes the base record trainer.
- Training/eval dependencies should match the base record (FlashAttention 3, etc.).

@@ -0,0 +1,8 @@
# Match the base record's runtime deps.
# (torch + flash-attn-3 are assumed to be installed separately)
brotli
huggingface-hub
numpy
sentencepiece
tqdm

@@ -0,0 +1,11 @@
{
"track": "non_record_16mb",
"date": "2026-04-21",
"name": "No-looping ablation on SP8192 SOTA stack (5 shards, 1×H100 screening)",
"author": "Gautam Naik",
"github_id": "gautamnaik",
"val_bpb": 1.2921,
"val_loss": 3.3511,
"bytes_total": 16034265
}
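The summary fields above can be cross-checked against the three per-seed logs. A small sketch (values copied from the logs) confirming that `val_bpb` is the rounded 3-seed mean of `quantized_sliding_window val_bpb` and `bytes_total` the mean submission size:

```python
# Per-seed quantized_sliding_window val_bpb and quantized+brotli sizes
# (seeds 0, 42, 1337), copied from the logs in this folder.
sw_bpb = [1.31744488, 1.28031748, 1.27865192]
sizes = [16_033_831, 16_034_416, 16_034_548]

val_bpb = round(sum(sw_bpb) / len(sw_bpb), 4)
bytes_total = sum(sizes) // len(sizes)
```

(`val_loss` is not checked here, since its aggregation is not stated in this submission.)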

@@ -0,0 +1,24 @@
import os
import runpy
from pathlib import Path


def main() -> None:
    # Disable looping (depth recurrence) for the base SOTA stack.
    os.environ.setdefault("NUM_LOOPS", "0")

    # Resolve the base record trainer relative to the repo root.
    repo_root = Path(__file__).resolve().parents[3]
    base_trainer = (
        repo_root
        / "records"
        / "track_10min_16mb"
        / "2026-04-09_SP8192_3LayerRecur_ParResid_QK525_LegalTTT"
        / "train_gpt.py"
    )

    # Execute the base trainer as if it were invoked directly.
    runpy.run_path(str(base_trainer), run_name="__main__")


if __name__ == "__main__":
    main()
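Because the launcher uses `os.environ.setdefault`, a `NUM_LOOPS` value already exported on the command line takes precedence over the launcher's `"0"` default. A minimal sketch of that precedence:

```python
import os

# No value exported: setdefault installs the launcher's "0".
os.environ.pop("NUM_LOOPS", None)
os.environ.setdefault("NUM_LOOPS", "0")
default_val = os.environ["NUM_LOOPS"]

# Value already exported (e.g. `NUM_LOOPS=0` on the torchrun line): it is kept.
os.environ["NUM_LOOPS"] = "0"
os.environ.setdefault("NUM_LOOPS", "1")
user_val = os.environ["NUM_LOOPS"]
```

This is why the explicit `NUM_LOOPS=0` in the run command is redundant with the launcher but harmless.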

@@ -0,0 +1,14 @@
train_shards: 5
val_tokens: 40540160
model_params:35941464
gptq:reserving 12s, effective=588000ms
0/20000 val_loss: 9.0072 val_bpb: 3.4870
500/20000 train_loss: 3.2386 train_time: 7.7m tok/s: 854791
658/20000 val_loss: 3.1480 val_bpb: 1.2187
stopping_early: wallclock_cap train_time: 588681ms step: 658/20000
ema:applying EMA weights
pre-quantization post-ema val_loss:3.42950098 val_bpb:1.32766670 eval_time:15858ms
Total submission size quantized+brotli: 16033831 bytes
quantized val_loss:3.44292430 val_bpb:1.33286329 eval_time:17376ms
quantized_sliding_window val_loss:3.40309696 val_bpb:1.31744488 eval_time:554037ms

@@ -0,0 +1,14 @@
train_shards: 5
val_tokens: 40540160
model_params:35941464
gptq:reserving 12s, effective=588000ms
0/20000 val_loss: 9.0047 val_bpb: 3.4860
500/20000 train_loss: 3.2483 train_time: 6.8m tok/s: 967384
724/20000 val_loss: 3.1174 val_bpb: 1.2068
stopping_early: wallclock_cap train_time: 588064ms step: 724/20000
ema:applying EMA weights
pre-quantization post-ema val_loss:3.33107887 val_bpb:1.28956444 eval_time:15774ms
Total submission size quantized+brotli: 16034548 bytes
quantized val_loss:3.34474578 val_bpb:1.29485532 eval_time:17382ms
quantized_sliding_window val_loss:3.30289072 val_bpb:1.27865192 eval_time:542194ms

@@ -0,0 +1,14 @@
train_shards: 5
val_tokens: 40540160
model_params:35941464
gptq:reserving 12s, effective=588000ms
0/20000 val_loss: 9.0090 val_bpb: 3.4877
500/20000 train_loss: 3.2470 train_time: 6.8m tok/s: 966857
724/20000 val_loss: 3.1179 val_bpb: 1.2070
stopping_early: wallclock_cap train_time: 588403ms step: 724/20000
ema:applying EMA weights
pre-quantization post-ema val_loss:3.33491082 val_bpb:1.29104791 eval_time:15899ms
Total submission size quantized+brotli: 16034416 bytes
quantized val_loss:3.34793848 val_bpb:1.29609132 eval_time:17726ms
quantized_sliding_window val_loss:3.30719303 val_bpb:1.28031748 eval_time:544877ms