fix(sdk): drain broker stderr alongside stdout after startup#842
Conversation
PR #838 added `drainBrokerStdoutAfterStartup` to keep the broker's stdout pipe drained after the SDK finishes parsing the startup URL, preventing the broker from blocking on `write()` when the parent stops reading. **Stderr has the same failure mode** but was never covered. The Rust broker routes `tracing` output to stderr (`.claude/rules/rust.md`). Under heavy fanout (e.g. 9 PTY workers in proactive-runtime M1), tracing volume is large enough to fill stderr's ~64KB kernel pipe and block the broker process exactly the way stdout did before #838. Today the existing `createInterface({ input: child.stderr })` (around client.ts:248) keeps stderr in flowing mode during normal operation by attaching a `'line'` listener that pushes into `stderrLines`. That works — except when the readline parser falls behind, or under buffered-flush boundary conditions where the stream momentarily falls back to paused mode. A no-op `'data'` listener + explicit `.resume()` gives a belt-and-suspenders guarantee that stderr is always drained. Changes - `packages/sdk/src/client.ts` - Rename `drainBrokerStdoutAfterStartup` → `drainBrokerStdioAfterStartup` (it's now accurate — covers both streams). - Iterate over `[child.stdout, child.stderr]` and attach the no-op drain handler + `.resume()` to each. - Update the single call site at `client.ts:264` to the new name. - `packages/sdk/src/__tests__/client-stdout-drain.test.ts` - Extract the fake-broker generator + the drain-or-block harness into helpers parameterized by stream name. - Existing stdout test now floods the named stream; add a parallel test that floods stderr and asserts the broker exits within the same 5s window. - Bump per-test timeout to 30s so spawn + drain + child-exit has headroom past vitest's 5s default (the inner BROKER_STDIO_DRAIN_TIMEOUT_MS is still 5s — what's measured is the broker's exit timing, not the test wall-clock). Verification - `npx vitest run src/__tests__/client-stdout-drain.test.ts` — **2/2 pass** Related - relay#838 — original stdout-only drain - ricky#94 / #98 — Node-side child_process.spawn monkeypatch (backstop for consumers still on older SDKs) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Extend the post-startup pipe drain to cover stderr as well as stdout. The broker routes tracing output to stderr; under heavy fanout, stderr fills the ~64KB kernel pipe and wedges the broker the same way stdout did before PR #838. Rename drainBrokerStdoutAfterStartup to drainBrokerStdioAfterStartup and add a stderr regression test that reuses the existing flood harness.
📝 WalkthroughWalkthroughAgentRelayClient.spawn() now drains both broker stdout and stderr after startup; tests refactor adds a fake broker generator, a shared runner/assertion helper, two spawn tests (stdout and stderr), and a unit test for the new drain helper. ChangesBroker Stdio Draining
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
Drop the broker-flood chunk count from 20000 (~20MB) to 5000 (~5MB) so the line-by-line readline parsing of stderr in the SDK startup path never races the drain assertion timeout. Still forces ~80 backpressure cycles against the ~64KB macOS pipe buffer — more than enough to wedge an undrained broker.
There was a problem hiding this comment.
🧹 Nitpick comments (1)
packages/sdk/src/__tests__/client-stdout-drain.test.ts (1)
80-81: 💤 Low valueConsider exposing
childthrough__clientTestInternalsfor type-safe test access.Accessing the private
childproperty directly works at runtime (TypeScript'sprivateis compile-time only), but it bypasses type checking. For consistency with howdrainBrokerStdioAfterStartupis already exposed, consider adding the child process accessor to the test internals.♻️ Suggested approach
In
client.ts, extend the test internals:export const __clientTestInternals = { drainBrokerStdioAfterStartup, getChild: (client: AgentRelayClient) => (client as any).child as ChildProcess | null, };Then in test:
- const child = client.child; + const child = __clientTestInternals.getChild(client);🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/sdk/src/__tests__/client-stdout-drain.test.ts` around lines 80 - 81, Add a type-safe test accessor for the broker child process by extending the existing __clientTestInternals export: add a getChild function that accepts an AgentRelayClient and returns the child's type (ChildProcess | null) by safely reading the private client.child internals; update client.ts to export __clientTestInternals = { drainBrokerStdioAfterStartup, getChild } and update tests to call __clientTestInternals.getChild(client) instead of accessing client.child directly to preserve compile-time type checking.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@packages/sdk/src/__tests__/client-stdout-drain.test.ts`:
- Around line 80-81: Add a type-safe test accessor for the broker child process
by extending the existing __clientTestInternals export: add a getChild function
that accepts an AgentRelayClient and returns the child's type (ChildProcess |
null) by safely reading the private client.child internals; update client.ts to
export __clientTestInternals = { drainBrokerStdioAfterStartup, getChild } and
update tests to call __clientTestInternals.getChild(client) instead of accessing
client.child directly to preserve compile-time type checking.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 957b592c-8209-4f05-8b76-2a835f7f547b
📒 Files selected for processing (2)
packages/sdk/src/__tests__/client-stdout-drain.test.tspackages/sdk/src/client.ts
Summary
PR #838 added `drainBrokerStdoutAfterStartup` to keep the broker's stdout pipe drained after the SDK finishes parsing the startup URL. Stderr has the same failure mode but was never covered.
The Rust broker routes `tracing` output to stderr (`.claude/rules/rust.md`). Under heavy fanout (e.g. 9 PTY workers in proactive-runtime M1), tracing volume is large enough to fill stderr's ~64KB kernel pipe and block the broker process exactly the way stdout did before #838.
Why the existing readline isn't enough on its own
`AgentRelayClient.spawn` already attaches `createInterface({ input: child.stderr })` (`client.ts:248`) with a `'line'` listener that pushes into `stderrLines`. That keeps stderr in flowing mode during normal operation — most of the time. But:
A no-op `'data'` listener + explicit `.resume()` gives a belt-and-suspenders guarantee the kernel pipe is always drained, regardless of readline's pace.
What changed
Verification
Related