Surface RIMAPI action errors to agents + fix load-settle race #19
Open
jkbennitt wants to merge 1 commit into
Two fixes from live-game testing that caused agents to fail and re-fail
the same invalid action every tick:
1) Load-settle race in run_scenario.py: the polling loop broke on first
`population > 0`, which was ~4s after load_game. RIMAPI returns 200
before Unity's main thread finishes applying the load, so the
immediately-following unforbid_all_items() POST got 500'd. Now require
5 consecutive stable-population checks (10s floor) before writes.
2) Surface per-action errors back to agents via CentralPost:
- ExecutionResult.outcomes now carries an ActionOutcome per dispatched
action with success + a cleaned-up error message (unwrapped from
RIMAPI's {"errors":[...]} JSON envelope).
- game_loop broadcasts STATUS_UPDATE with failed outcomes so agents
see "DO NOT REPEAT" context next tick. Removes the "researcher
keeps re-proposing already-finished Electricity" loop observed in
the live run.
- Also fixed pre-existing position-based telemetry bug
(`i < executed` assumed actions failed in order).
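The outcome capture in item 2 could be sketched roughly as follows. This is a minimal illustration, not the code in this PR: only `ExecutionResult.outcomes`, `ActionOutcome`, its `success` flag, and the `{"errors":[...]}` envelope come from the description above. The `action` and `error` field names and the helpers `clean_error`, `failure_notice`, and `count_executed` are hypothetical.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ActionOutcome:
    # Only `success` and an error message are confirmed by the PR text;
    # the field names `action` and `error` are assumptions.
    action: str
    success: bool
    error: Optional[str] = None


@dataclass
class ExecutionResult:
    outcomes: List[ActionOutcome] = field(default_factory=list)


def clean_error(body: str) -> str:
    """Unwrap a {"errors": [...]} envelope; fall back to the raw body."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return body.strip()
    if isinstance(payload, dict):
        errors = payload.get("errors")
        if isinstance(errors, list) and errors:
            return "; ".join(str(e) for e in errors)
    return body.strip()


def failure_notice(result: ExecutionResult) -> Optional[str]:
    """Build the text an agent would see next tick (hypothetical format)."""
    failed = [o for o in result.outcomes if not o.success]
    if not failed:
        return None
    lines = [f"- {o.action}: {o.error}" for o in failed]
    return "DO NOT REPEAT these failed actions:\n" + "\n".join(lines)


def count_executed(result: ExecutionResult) -> int:
    """Count successes from per-action flags instead of list position."""
    return sum(1 for o in result.outcomes if o.success)
```

Counting from the flags matters when failures are not contiguous. If the first of three actions fails and the other two succeed, a position-based check such as `i < executed` with `executed == 2` would mark the failed first action as successful and the successful third one as failed.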
4 new unit tests cover the outcome capture paths.
Tests: 382 pass. ruff/mypy clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Two fixes from live-game testing (2026-05-16 smoke run on Nemotron 120B-A12B, Crashlanded) where agents were re-proposing the same invalid action every tick because RIMAPI's 500-on-semantic-error was opaque to the agent loop.
- Load-settle race in `run_scenario.py`: polling broke on the first `population > 0` (~4s after `load_game`), but RIMAPI returns 200 before Unity finishes applying the load. The immediately following `unforbid_all_items()` POST then got a 500. Now requires 5 consecutive stable-population checks (10s floor) before writes.
- `ExecutionResult.outcomes` now carries an `ActionOutcome` per dispatched action with `success` + cleaned-up error message (unwrapped from RIMAPI's `{"errors":[...]}` JSON envelope).
- `game_loop` broadcasts `STATUS_UPDATE` with failed outcomes via CentralPost, so agents see "DO NOT REPEAT" context next tick. Removes the "researcher keeps re-proposing already-finished Electricity" loop observed live.
- The position-based telemetry check `i < executed` had assumed actions failed in order; it now uses per-action outcome flags.

Diff: 4 files, +167 / -13. 382 tests pass; ruff + mypy strict clean.
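For illustration, the load-settle check described above might look something like the sketch below. It is not the code in `run_scenario.py`: the function name, the `get_population` callable, the 2-second interval, and the timeout are all assumptions, and only the "5 consecutive stable-population checks" rule comes from this description.

```python
import time
from typing import Callable, Optional


def wait_for_settled_load(
    get_population: Callable[[], int],   # hypothetical: reads population via RIMAPI
    required_stable: int = 5,            # from the PR: 5 consecutive stable checks
    interval_s: float = 2.0,             # assumed polling interval
    timeout_s: float = 120.0,            # assumed overall timeout
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once the population is positive and unchanged for
    `required_stable` consecutive polls; return False on timeout."""
    stable = 0
    last: Optional[int] = None
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        population = get_population()
        if population > 0 and population == last:
            stable += 1          # same positive reading as the previous poll
        elif population > 0:
            stable = 1           # first positive reading, or the value changed
        else:
            stable = 0           # load not applied yet
        last = population
        if stable >= required_stable:
            return True
        sleep(interval_s)
    return False
```

Injecting `sleep` keeps the sketch testable without real delays. A caller would issue writes such as `unforbid_all_items()` only after this returns `True`.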
Why this matters for the benchmark
Agents currently can't learn from their own mistakes within a run: they see opaque server errors, retry the same invalid combo, and burn deliberation budget on no-ops. The `ActionOutcome` + CentralPost broadcast is observability and an agent learning loop in one piece. It's the first piece of the observability floor in the broader Phase A restructuring.
Test plan
- `pytest`: 382 pass (4 new outcome-capture tests)
- `ruff check src/ tests/ scripts/`: clean
- `mypy src/`: strict clean

🤖 Generated with Claude Code