L2: 0.5902 (Δ-0.0009 vs baseline), obj_box_col: 0.084% (Δ+0.004%). Marginal L2 gain; the plan loss weight increase is not a strong lever.
exp002: motion loss 0.2→0.5 badly hurt both L2 and col (discard).
exp004 staged: num_decoder 6→8 for richer instance features.
queue_length=6: L2=0.5757 (best, Δ-0.0154 vs baseline), obj_box_col=0.100% (slightly worse than baseline).
Stage exp005: queue6 + plan_loss_up combination.
… revised exp004 decoder=8: obj_box_col=0.074% (best), L2=0.5955 (slightly worse).
exp005 revised: queue=6 + decoder=8 combo (drop queue6+plan_loss_up).
exp005 queue6+decoder8: L2=0.5934, col=0.126% (negative synergy). Needs >10 epochs to benefit from the combined capacity.
Best results:
- L2: 0.5757 (exp003 queue_length=6, -2.6% vs baseline)
- col: 0.074% (exp004 num_decoder=8, -7.5% vs baseline)
Move research_log.md, results.tsv, docs/research_review.md → autoresearch/
…57), num_decoder=8 best col (0.074%)
- autoresearch/ folder: research_log.md, results.tsv, research_review.md
- scripts/dgx_run.sh: source ~/.bashrc fix for WANDB_API_KEY
- 5 new experiment configs in projects/configs/auto_mar25_*
queue_length=6 as proper named config (from autoresearch mar25 exp003, best L2=0.5757)
…mar25 constraints
- baseline run (exp000) is always submitted first and doesn't count against max-experiments
- sbatch now uses --export=ALL,WANDB_API_KEY=... to avoid WandB auth failures
- hard constraints: no motion loss increase, no rotation augmentation
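The sbatch fix can be sketched roughly as below. The job script name (train_job.sh) is a placeholder; the real submission line lives in scripts/dgx_run.sh:

```shell
#!/usr/bin/env bash
# Hedged sketch: pick up WANDB_API_KEY from the login shell, then forward
# it explicitly to the batch job so WandB auth works on compute nodes.
[ -f ~/.bashrc ] && source ~/.bashrc           # ~/.bashrc exports WANDB_API_KEY

: "${WANDB_API_KEY:=placeholder-key}"          # fallback so the sketch runs anywhere

# --export=ALL preserves the submitting shell's environment; naming the key
# explicitly guards against Slurm setups that strip it.
sbatch_cmd="sbatch --export=ALL,WANDB_API_KEY=${WANDB_API_KEY} train_job.sh"
echo "$sbatch_cmd"
```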
Run base config (nomap_queue6) as reproducible reference for mar26 session.
exp001: extended training 10→15 epochs (B2 bottleneck)
exp002: plan_loss_reg 1→2, plan_loss_cls 0.5→1 on queue=6 base
exp003: num_det 50→100 (more agents to the planner)
exp004: confidence_decay 0.6→0.8 (slower forgetting)
exp005: combo of epochs15 + plan_loss_up
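The grid above can be sketched as plain-dict overrides on the queue=6 base. Key names (plan_loss_reg, num_det, confidence_decay, ...) come from the log; the real SparseDrive configs are mmcv-style Python files, so the exact structure here is an assumption:

```python
# Shared base (values as used in the mar26 session, per the log).
base = dict(
    total_epochs=10,
    queue_length=6,        # carried over from the mar25 best (exp003)
    num_det=50,
    confidence_decay=0.6,
    plan_loss_reg=1.0,
    plan_loss_cls=0.5,
)

# Per-experiment overrides.
experiments = {
    "exp001": dict(total_epochs=15),
    "exp002": dict(plan_loss_reg=2.0, plan_loss_cls=1.0),
    "exp003": dict(num_det=100),
    "exp004": dict(confidence_decay=0.8),
    "exp005": dict(total_epochs=15, plan_loss_reg=2.0, plan_loss_cls=1.0),
}

def make_config(name):
    """Apply one experiment's overrides on top of the shared base."""
    cfg = dict(base)
    cfg.update(experiments[name])
    return cfg
```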
L2=0.5927, obj_box_col=0.104%, NDS=0.5236
epochs 10→15: L2=0.5738 (-0.019), obj_box_col=0.099% (-0.005pp) — new best both metrics
plan_loss_up: obj_box_col=0.094% (new best), L2=0.5904 (behind exp001's 0.5738)
num_det 50→100: L2=0.5676 (new best), obj_box_col=0.091% (new best).
exp005 updated: epochs15 + plan_loss_up + num_det=100 (all confirmed wins).
confidence_decay 0.6→0.8: FAF +18, obj_box_col=0.120% (worse than baseline 0.104%). More false alarms from stale instances confuse the planner.
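A minimal sketch of why a higher confidence_decay keeps stale instances alive longer. It assumes the tracker multiplies each cached instance's score by confidence_decay every frame; the exact update rule in the codebase may differ:

```python
def decay(scores, rate):
    """One frame of confidence decay for cached (unmatched) instances."""
    return [s * rate for s in scores]

slow, fast = [0.9], [0.9]    # an instance that is no longer re-detected
for _ in range(3):           # three frames after the object disappears
    slow = decay(slow, 0.8)  # the exp004 setting
    fast = decay(fast, 0.6)  # the baseline setting

# slow[0] = 0.9 * 0.8**3 ≈ 0.46 vs fast[0] = 0.9 * 0.6**3 ≈ 0.19:
# at 0.8 the stale track keeps a much higher score for longer,
# consistent with the observed rise in false alarms.
```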
exp005 (epochs15+plan_up+det100): L2=0.6109, col=0.107%; negative synergy, discard.
Session best: exp003 num_det=100, with L2=0.5676, col=0.091% (both new bests).
research_review: add mar26 findings section; update confirmed wins, negative evidence, best recipe, next experiments, open questions.
Base: sparsedrive_r50_stage2_4gpu_bs24.py (WITH map, queue=4)
exp001: num_det 50→100
exp002: queue_length 4→6
exp003: nomap + plan_loss_up combo
exp004: epochs 10→15
exp005: epochs15 + num_det=100
L2=0.6274, obj_box_col=0.107%, NDS=0.5233, mAP_normal=0.5508
num_det=100: L2=0.6159 (-1.8%), col=0.091% (-15%), IDS 990→577 (-42%)
queue_length=6 hurts with map head active: col worsens 0.091%→0.147%, IDS stays at 995. Hard constraint: do NOT increase queue_length when map head is active.
Skip cross_gnn attention layers when map_output is None (with_map=False). The bs24 base config includes cross_gnn in operation_order; overriding with_map=False disables map output but the operation_order still references map_instance_feature_selected, causing UnboundLocalError at training start.
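The guard can be sketched with dummy layers as below. The real SparseDrive decoder walks operation_order over registered attention/FFN modules; the structure here is an assumption, with only the cross_gnn/map-feature names taken from the log:

```python
def run_operations(operation_order, instance_feature, map_output, layers):
    """Walk the decoder's operation_order, skipping cross_gnn when no map."""
    for op in operation_order:
        if op == "cross_gnn":
            if map_output is None:   # with_map=False: no map features produced
                continue             # skip instead of hitting UnboundLocalError
            map_instance_feature_selected = map_output
            instance_feature = layers[op](instance_feature,
                                          map_instance_feature_selected)
        else:
            instance_feature = layers[op](instance_feature)
    return instance_feature

# Dummy layers standing in for the real modules.
layers = {
    "self_attn": lambda x: x + 1,
    "cross_gnn": lambda x, m: x + m,
    "ffn": lambda x: x * 2,
}
order = ["self_attn", "cross_gnn", "ffn"]
```

With map_output=None the cross_gnn step is a no-op and the loop completes; with a map feature present the original behavior is unchanged.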
… config
Skipping cross_gnn when map_output is None leaves the map head's parameters without gradients under DDP, causing a reduction error. find_unused_parameters=True fixes this.
nomap via the with_map=False override on the bs24 base is DDP-incompatible. The map head module stays registered with parameters; neither find_unused_parameters nor the cross_gnn guard resolves PyTorch 1.13's mark-ready-twice error. Config updated to plan_loss_up alone (no nomap) for resubmission.
plan_loss_up alone: L2=0.6223 (beats baseline), col=0.110% (slightly worse than baseline). num_det=100 remains the strongest single lever. Moving to epochs=15.
epochs=15 hurts on bs24 with-map: L2=0.635, col=0.143% (worse than baseline). Map head gradient competition amplifies with more training. Hard constraint added. exp005 changed to num_det=100 + plan_loss_up combo.
exp005 (num_det=100 + plan_loss_up): L2=0.650, col=0.125%; negative synergy, worst in session.
Best result remains exp001 (num_det=100): L2=0.616, col=0.091%.
Key findings:
- bs24 with-map is brittle: only num_det=100 reliably improves it
- queue=6, epochs=15, plan_loss_up, and their combinations all degrade performance
- with_map=False override is DDP-incompatible (PyTorch 1.13)
- num_det=100 is a confirmed universal win across 2 configs
Updated research_review: new 3.13 section, updated confirmed wins/negative evidence tables, updated best recipe and next experiments.