## Summary

Replace the three surface-specific Responses agent loops (OpenAI non-streaming + streaming, gRPC regular, gRPC harmony) with one shared agent loop used by every surface. Every Responses request enters the same loop, which decides the next action from state:
- `CallLlm`
- `ExecuteTools` (gateway-owned tools: MCP, builtins)
- `InterruptForApproval`
- `Finish`

A request with no MCP tools still enters the loop; it simply never produces an `ExecuteTools`.
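As a sketch of the shape (`AgentLoopState`, `LoopOutcome`, and the helper functions are illustrative stand-ins, not the final smg API), the driver reduces to a single decision loop:

```rust
// Illustrative sketch only: the state/outcome types and helpers here
// are hypothetical stand-ins, not the final smg API.
async fn run_agent_loop(mut state: AgentLoopState) -> LoopOutcome {
    loop {
        match decide_next_action(&state) {
            // Build the next upstream request from the transcript and
            // run one model turn.
            NextAction::CallLlm => {
                let turn = call_llm(&state).await;
                state.record_model_turn(turn);
            }
            // Gateway-owned tools only (MCP, builtins); user-defined
            // function tools are left for the client.
            NextAction::ExecuteTools(planned) => {
                let outputs = execute_tools(planned).await;
                state.record_tool_outputs(outputs);
            }
            // Approval interrupts are a first-class loop action.
            NextAction::InterruptForApproval(pending) => {
                return LoopOutcome::ApprovalInterrupt(pending);
            }
            NextAction::Finish => {
                return LoopOutcome::Final(render_final_response(state));
            }
        }
    }
}
```

Because loop entry is unconditional, the old one-shot path is just the degenerate run that goes `CallLlm` → `Finish`.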
## Problem
Agentic control flow in the Responses routers is currently spread across layers:
- Loop entry is decided outside the loop. Each surface asks "does this request carry MCP tools?" and chooses between a one-shot path and a tool-loop path. Two paths exist per surface, so history loading, request rebuilding, final response assembly, persistence, and streaming completion are duplicated 3×.
- Surface logic and loop logic are intermixed. "What happens next?" is spread across route handlers, history loaders, streaming interceptors, and MCP execution helpers. Adding a behavior (approval continuation, `max_tool_calls` accounting, visibility rules) creates another loop-external patch point instead of extending one explicit state machine.
- Three representations interleave without explicit boundaries. The client `ResponsesRequest`/`ResponsesResponse`, the upstream model payload, and the stored response chain used by `previous_response_id` all flow through the same router code without being clearly separated. This already causes observable parity drift vs OpenAI: e.g. before #1315 (refactor(responses): split load/prepare for canonical agent-loop input) landed, `previous_response_id` replay silently dropped `mcp_list_tools`/`mcp_call` items and re-executed the tool.
## Goals
- Every Responses surface enters the same agent loop.
- The loop decides the next action from state; router code contains no loop-entry branching.
- Streaming and non-streaming share the driver; surface adapters own only parser-local logic and wire-level event translation.
- The three representations have explicit boundaries: `ResponsesRequest`/`ResponsesResponse` (API + storage contract), `Vec<ResponseInputOutputItem>` (shared execution model across surfaces), and per-surface upstream payloads (Responses JSON, `ChatCompletionRequest`, Harmony pipeline input).
- `max_tool_calls`, approval interrupts, `mcp_list_tools` dedupe, and hidden-MCP visibility are properties of loop state, not scattered across router files.
- Landing happens in small, independently reviewable PRs.
## Non-goals
- Rewriting the MCP crate or approval subsystem from scratch.
- Changing storage schema.
- Unifying every provider-specific stream parser.
- Expanding approval workflow behavior (denial policy, etc.) as part of the architecture PRs.
- Changing public API contracts for Responses requests or responses.
## Prior validation
The full plan was implemented end-to-end once as a working prototype, with contract-level OpenAI parity checks for non-streaming + streaming MCP flows, including approval interrupts. A local `smg` ran against real OpenAI using the same test-plan style as #1174 and matched OpenAI's behavior at the contract level. This umbrella re-lands that proven design on `main` in reviewable pieces.
## NextAction semantics

- `CallLlm` — build the next upstream request from the canonical transcript and run one model turn.
- `ExecuteTools(Vec<PlannedToolExecution>)` — execute gateway-owned tools only (MCP, builtins). User-defined function tools are not gateway-executed; they remain in the final response.
- `InterruptForApproval(PendingToolExecution)` — render an approval interrupt response and return. This is a first-class loop action, not a special path outside the loop.
- `Finish` — render the final response: restore the client-facing tool view, inject visible MCP metadata, apply hidden-MCP filtering, patch `previous_response_id`.
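Written down as a Rust enum (the payload type names follow the semantics above; the exact shape in `state.rs` may differ):

```rust
// Payload types named in the design; their fields are elided here.
pub struct PlannedToolExecution; // a gateway-owned tool call ready to run
pub struct PendingToolExecution; // a tool call awaiting client approval

pub enum NextAction {
    /// Build the next upstream request from the canonical transcript
    /// and run one model turn.
    CallLlm,
    /// Execute gateway-owned tools only (MCP, builtins); user-defined
    /// function tools are never executed by the gateway.
    ExecuteTools(Vec<PlannedToolExecution>),
    /// Render an approval interrupt response and return to the client.
    InterruptForApproval(PendingToolExecution),
    /// Render the final response: restore the client-facing tool view,
    /// inject visible MCP metadata, apply hidden-MCP filtering, and
    /// patch previous_response_id.
    Finish,
}
```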
## Proposed shared module layout (lands with PR6)
```
model_gateway/src/routers/common/agent_loop/
  mod.rs
  prepared.rs   # PreparedLoopInput and history-facing types
  state.rs      # AgentLoopState, LoopModelTurn, LoopToolCall, NextAction
  driver.rs     # run_agent_loop() and decide_next_action()
  events.rs     # semantic streaming events for adapters
  tooling.rs    # MCP execution planning and approval continuation helpers
```
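One plausible shape for `events.rs` is a small set of surface-agnostic semantic events, with adapters doing wire translation only (a hypothetical sketch; the real event set will track the Responses streaming event families):

```rust
// Hypothetical sketch of the semantic event layer. The driver emits
// these; each surface adapter translates them into its wire format
// (SSE events for OpenAI, gRPC messages for the other surfaces).
pub enum LoopEvent {
    TurnStarted,
    OutputDelta(String),                                // incremental model text
    ToolCallPlanned { call_id: String, name: String },  // before execution
    ToolCallCompleted { call_id: String, output: String },
    ApprovalRequested { call_id: String },
    Completed,                                          // terminal marker
}

// Adapters translate events; they never make loop-control decisions.
pub trait StreamSink {
    fn emit(&mut self, event: LoopEvent);
}
```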
Each surface keeps a thin adapter next to its router.

## Key implementation invariants
- `mcp_call` always normalizes into `function_call` + `function_call_output` before going upstream. When a replayed `mcp_call` carries an `error`, the error string must be surfaced via `function_call_output.output` rather than dropped (see the sketch after this list).
- `mcp_list_tools` dedupe is keyed on `server_label`. If an item was already emitted during streaming, the final `response.completed` payload must reuse the same `id`.
- Upstream replay payloads never reintroduce client-visible-only control items (`mcp_list_tools`, `mcp_approval_request`).
- `effective_limit = min(user_max_tool_calls, DEFAULT_MAX_ITERATIONS)` is a public behavior contract. Approved continuations obey the same budget.
- Incomplete termination (tool-call limit) returns `status=completed` + `incomplete_details.reason="max_tool_calls"` for both streaming and non-streaming; streaming terminates with `response.completed` + `[DONE]`, not a generic `error` event.
- The stream sink does event translation only. It does not own loop-control decisions (when to call the model, when to execute tools, when a continuation is valid, when to interrupt).
- Until the shared extraction lands, surface routers must feed the normalized `PreparedLoopInput.upstream_input` into their `RequestContext` and restore `store`, `previous_response_id`, and `conversation` on top, so persistence and response-metadata patching keep client-intent values.
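For the first invariant, the replay normalization could look roughly like this (item shapes are simplified stand-ins for the real protocol types):

```rust
// Hypothetical sketch of replaying a stored mcp_call as
// function_call + function_call_output. Field names are simplified;
// the real item types live in the protocol crate.
struct McpCallItem {
    id: String,
    name: String,
    arguments: String,
    output: Option<String>,
    error: Option<String>,
}

enum UpstreamItem {
    FunctionCall { call_id: String, name: String, arguments: String },
    FunctionCallOutput { call_id: String, output: String },
}

fn replay_mcp_call(item: McpCallItem) -> Vec<UpstreamItem> {
    let output = match (item.output, item.error) {
        // Surface a replayed error via function_call_output.output
        // instead of dropping it, so the model sees the failure.
        (_, Some(err)) => err,
        (Some(out), None) => out,
        (None, None) => String::new(),
    };
    vec![
        UpstreamItem::FunctionCall {
            call_id: item.id.clone(),
            name: item.name,
            arguments: item.arguments,
        },
        UpstreamItem::FunctionCallOutput { call_id: item.id, output },
    ]
}
```

Because the replayed call becomes an ordinary completed function call, the model treats it as history instead of re-executing the tool.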
## Shipping slices

- Slice A — OpenAI reference implementation
- Slice B — Shared abstractions, extracted into `routers/common/agent_loop`
- Slice C — Other surfaces
- Hardening

## Validation gate (per PR)

- `cargo test -p openai-protocol --tests`
- `cargo test --lib --package smg routers::openai::responses::`
- `cargo test --test api_tests -- responses`
- `pre-commit run --all-files`
- Manual parity against real OpenAI, using the same style as #1174's test plan (feat(responses): interrupt approval-required MCP tool calls): start a local `smg` with `--enable-igw --port 9999`, register OpenAI as an external worker, and compare contract-level output (item types, streaming event families, interrupt boundaries, error shapes) between SMG and direct OpenAI.
## Implementation issues
Child issues will be linked into the shipping slices above as they are filed. Each PR body should reference `Parent: #<this>` and `Closes #<UAL-PR-NN>`.