Surface RIMAPI action errors to agents + fix load-settle race by jkbennitt · Pull Request #19 · AppSprout-dev/RLE

jkbennitt · 2026-05-16T06:36:26Z

Summary

Two fixes from live-game testing (2026-05-16 smoke run on Nemotron 120B-A12B, Crashlanded) where agents were re-proposing the same invalid action every tick because RIMAPI's 500-on-semantic-error was opaque to the agent loop.

Load-settle race in run_scenario.py: polling broke on first population > 0 (~4s after load_game), but RIMAPI returns 200 before Unity finishes applying the load. The immediately-following unforbid_all_items() POST then got 500'd. Now requires 5 consecutive stable-population checks (10s floor) before writes.
Action error feedback loop: ExecutionResult.outcomes now carries an ActionOutcome per dispatched action with success + cleaned-up error message (unwrapped from RIMAPI's {"errors":[...]} JSON envelope). game_loop broadcasts STATUS_UPDATE with failed outcomes via CentralPost, so agents see "DO NOT REPEAT" context next tick. Removes the "researcher keeps re-proposing already-finished Electricity" loop observed live.
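Pre-existing positional telemetry bug fixed: the old `i < executed` check treated the first `executed` actions as the successful ones, which only holds when every failure comes last in dispatch order. Telemetry now uses per-action outcome flags.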
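
The load-settle fix above can be sketched as a small polling helper. This is a minimal illustration, not the code in run_scenario.py: the function name, the `get_population` callback, the 2-second interval (inferred from 5 checks over a 10s floor), and the `max_checks` cap are all assumptions made for the example.

```python
import time
from typing import Callable, Optional


def wait_for_stable_population(
    get_population: Callable[[], int],
    required_stable: int = 5,
    interval_s: float = 2.0,
    max_checks: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Block until the population reading is positive and unchanged
    for `required_stable` consecutive checks, then return it.

    All names and defaults here are hypothetical; only the
    "5 consecutive stable checks" rule comes from the PR description.
    """
    stable = 0
    last: Optional[int] = None
    for _ in range(max_checks):
        pop = get_population()
        if pop > 0 and pop == last:
            stable += 1  # same positive reading as the previous poll
        else:
            stable = 0   # first reading, zero, or a change: start over
        last = pop
        if stable >= required_stable:
            return pop
        sleep(interval_s)
    raise TimeoutError("population never settled after load_game")
```

With the defaults, the earliest possible return is five sleeps after the first positive reading, which gives the 10s floor. Only after this returns would the caller issue writes such as unforbid_all_items().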

Diff: 4 files, +167 / -13. 382 tests pass; ruff + mypy strict clean.

Why this matters for the benchmark

Agents currently can't learn from their own mistakes within a run — they see opaque server errors, retry the same invalid combo, and burn deliberation budget on no-ops. The ActionOutcome + CentralPost broadcast is observability and an agent learning loop in one piece. It's the first piece of the observability floor in the broader Phase A restructuring.
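
As a rough sketch of the feedback loop described above, the following shows one way an outcome record, the envelope unwrapping, and the next-tick message could fit together. The field names on `ActionOutcome`, the helper names, the action names in the usage note, and the message wording beyond "DO NOT REPEAT" are assumptions for illustration; the actual classes in this PR may differ, and the CentralPost broadcast itself is not modeled here.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ActionOutcome:
    # Hypothetical shape: the PR only says each outcome carries
    # a success flag and a cleaned-up error message.
    action: str
    success: bool
    error: Optional[str] = None


def clean_error(body: str) -> str:
    """Unwrap a {"errors": [...]} JSON envelope into a plain message.

    Falls back to the raw body when it is not JSON or has no errors list.
    """
    try:
        payload = json.loads(body)
    except (json.JSONDecodeError, TypeError):
        return str(body).strip()
    if isinstance(payload, dict):
        errors = payload.get("errors")
        if isinstance(errors, list) and errors:
            return "; ".join(str(e) for e in errors)
    return body.strip()


def count_succeeded(outcomes: List[ActionOutcome]) -> int:
    """Count successes from per-action flags, not from list position."""
    return sum(1 for o in outcomes if o.success)


def failure_feedback(outcomes: List[ActionOutcome]) -> Optional[str]:
    """Build the text agents would see next tick, or None if nothing failed."""
    failed = [o for o in outcomes if not o.success]
    if not failed:
        return None
    lines = ["DO NOT REPEAT these failed actions:"]
    lines += [f"- {o.action}: {o.error}" for o in failed]
    return "\n".join(lines)
```

For example, if a hypothetical `set_research` action fails first and two later actions succeed, `count_succeeded` reports 2 and the feedback names only `set_research`. A positional check such as `i < executed` would instead have marked the failed action at index 0 as successful.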

Test plan

pytest — 382 pass (4 new outcome-capture tests)
ruff check src/ tests/ scripts/ — clean
mypy src/ — strict clean
Re-run the same live smoke config (Nemotron 120B-A12B, Crashlanded, 10 ticks) on this branch to verify mood (0.408) and research (0.226) move under the new feedback loop — tracked as A10 in the project plan.

🤖 Generated with Claude Code

Two fixes from live-game testing that caused agents to fail and re-fail the same invalid action every tick:

1) Load-settle race in run_scenario.py: the polling loop broke on first `population > 0`, which was ~4s after load_game. RIMAPI returns 200 before Unity's main thread finishes applying the load, so the immediately-following unforbid_all_items() POST got 500'd. Now require 5 consecutive stable-population checks (10s floor) before writes.

2) Surface per-action errors back to agents via CentralPost:
- ExecutionResult.outcomes now carries an ActionOutcome per dispatched action with success + a cleaned-up error message (unwrapped from RIMAPI's {"errors":[...]} JSON envelope).
- game_loop broadcasts STATUS_UPDATE with failed outcomes so agents see "DO NOT REPEAT" context next tick. Removes the "researcher keeps re-proposing already-finished Electricity" loop observed in the live run.
- Also fixed pre-existing position-based telemetry bug (`i < executed` assumed actions failed in order).

4 new unit tests cover the outcome capture paths.

Tests: 382 pass. ruff/mypy clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

jkbennitt wants to merge 1 commit into master from fix/post-live-test-findings
