Skip to content

fix(sdk): drain broker stderr alongside stdout after startup#842

Merged
khaliqgant merged 4 commits into
mainfrom
fix/sdk-drain-broker-stderr
May 11, 2026
Merged

fix(sdk): drain broker stderr alongside stdout after startup#842
khaliqgant merged 4 commits into
mainfrom
fix/sdk-drain-broker-stderr

Conversation

@khaliqgant
Copy link
Copy Markdown
Member

Summary

PR #838 added `drainBrokerStdoutAfterStartup` to keep the broker's stdout pipe drained after the SDK finishes parsing the startup URL. Stderr has the same failure mode but was never covered.

The Rust broker routes `tracing` output to stderr (`.claude/rules/rust.md`). Under heavy fanout (e.g. 9 PTY workers in proactive-runtime M1), tracing volume is large enough to fill stderr's ~64KB kernel pipe and block the broker process exactly the way stdout did before #838.

Why the existing readline isn't enough on its own

`AgentRelayClient.spawn` already attaches `createInterface({ input: child.stderr })` (`client.ts:248`) with a `'line'` listener that pushes into `stderrLines`. That keeps stderr in flowing mode during normal operation — most of the time. But:

  • readline parses lines synchronously; under sustained pressure it can fall behind the producer.
  • Boundary conditions (mid-line buffered flushes, the moment the URL parser releases stdout, etc.) can cause the stream to momentarily fall back to paused mode.

A no-op `'data'` listener + explicit `.resume()` gives a belt-and-suspenders guarantee the kernel pipe is always drained, regardless of readline's pace.

What changed

  • `packages/sdk/src/client.ts`
    • Rename `drainBrokerStdoutAfterStartup` → `drainBrokerStdioAfterStartup` (now accurate — covers both streams).
    • Iterate over `[child.stdout, child.stderr]` and attach the no-op drain handler + `.resume()` to each.
    • Update the single call site at `client.ts:264`.
  • `packages/sdk/src/tests/client-stdout-drain.test.ts`
    • Extract the fake-broker generator + drain-or-block harness into helpers parameterized by stream name.
    • Existing stdout test now floods the named stream; add a parallel test that floods stderr and asserts the broker exits within the same 5s window.
    • Bump per-test timeout to 30s so spawn + drain + child-exit has headroom past vitest's 5s default. The inner `BROKER_STDIO_DRAIN_TIMEOUT_MS` is still 5s — what's measured is the broker's exit timing, not the test wall-clock.

Verification

  • `npx vitest run packages/sdk/src/tests/client-stdout-drain.test.ts` — 2/2 pass (stdout drain + new stderr drain regression).

Related

  • relay#838 — original stdout-only drain
  • relay#841 — Rust broker writer-task fix (the upstream root cause)
  • ricky#94 / Dm messages #98 — Node-side child_process.spawn monkeypatch (backstop for consumers still on older SDKs)

Proactive Runtime Bot and others added 2 commits May 11, 2026 15:56
PR #838 added `drainBrokerStdoutAfterStartup` to keep the broker's
stdout pipe drained after the SDK finishes parsing the startup URL,
preventing the broker from blocking on `write()` when the parent
stops reading. **Stderr has the same failure mode** but was never
covered.

The Rust broker routes `tracing` output to stderr (`.claude/rules/rust.md`).
Under heavy fanout (e.g. 9 PTY workers in proactive-runtime M1), tracing
volume is large enough to fill stderr's ~64KB kernel pipe and block the
broker process exactly the way stdout did before #838.

Today the existing `createInterface({ input: child.stderr })` (around
client.ts:248) keeps stderr in flowing mode during normal operation by
attaching a `'line'` listener that pushes into `stderrLines`. That
works — except when the readline parser falls behind, or under
buffered-flush boundary conditions where the stream momentarily falls
back to paused mode. A no-op `'data'` listener + explicit `.resume()`
gives a belt-and-suspenders guarantee that stderr is always drained.

Changes
- `packages/sdk/src/client.ts`
  - Rename `drainBrokerStdoutAfterStartup` → `drainBrokerStdioAfterStartup`
    (it's now accurate — covers both streams).
  - Iterate over `[child.stdout, child.stderr]` and attach the no-op
    drain handler + `.resume()` to each.
  - Update the single call site at `client.ts:264` to the new name.
- `packages/sdk/src/__tests__/client-stdout-drain.test.ts`
  - Extract the fake-broker generator + the drain-or-block harness into
    helpers parameterized by stream name.
  - Existing stdout test now floods the named stream; add a parallel
    test that floods stderr and asserts the broker exits within the
    same 5s window.
  - Bump per-test timeout to 30s so spawn + drain + child-exit has
    headroom past vitest's 5s default (the inner BROKER_STDIO_DRAIN_TIMEOUT_MS
    is still 5s — what's measured is the broker's exit timing, not the
    test wall-clock).

Verification
- `npx vitest run src/__tests__/client-stdout-drain.test.ts` — **2/2 pass**

Related
- relay#838 — original stdout-only drain
- ricky#94 / #98 — Node-side child_process.spawn monkeypatch
  (backstop for consumers still on older SDKs)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Extend the post-startup pipe drain to cover stderr as well as stdout.
The broker routes tracing output to stderr; under heavy fanout, stderr
fills the ~64KB kernel pipe and wedges the broker the same way stdout
did before PR #838. Rename drainBrokerStdoutAfterStartup to
drainBrokerStdioAfterStartup and add a stderr regression test that
reuses the existing flood harness.
@khaliqgant khaliqgant requested a review from willwashburn as a code owner May 11, 2026 13:57
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 11, 2026

Review Change Stack

📝 Walkthrough

Walkthrough

AgentRelayClient.spawn() now drains both broker stdout and stderr after startup; tests refactor adds a fake broker generator, a shared runner/assertion helper, two spawn tests (stdout and stderr), and a unit test for the new drain helper.

Changes

Broker Stdio Draining

Layer / File(s) Summary
Test constants & imports
packages/sdk/src/__tests__/client-stdout-drain.test.ts
Adds centralized timing constants and imports (PassThrough, sleep) used for drain assertions and extended test timeouts.
Fake broker script generator
packages/sdk/src/__tests__/client-stdout-drain.test.ts
fakeBrokerSource(stream) emits a Node HTTP broker script that logs API URL, serves session endpoints, and floods the chosen stdio stream with backpressure-aware writes.
Runner helper to spawn and assert drains
packages/sdk/src/__tests__/client-stdout-drain.test.ts
runFakeBrokerAndAssertDrains(stream) writes the fake broker into a temp dir, spawns AgentRelayClient with increased startup timeout, and waits BROKER_STDIO_DRAIN_TIMEOUT_MS to classify broker as exited or blocked.
Unit test for drain helper
packages/sdk/src/__tests__/client-stdout-drain.test.ts
Adds a unit-style test using PassThrough streams to verify drainBrokerStdioAfterStartup attaches data listeners to both stdout and stderr and leaves them unpaused.
Spawn tests for stdout and stderr
packages/sdk/src/__tests__/client-stdout-drain.test.ts
Replaces single test with two Vitest cases that call runFakeBrokerAndAssertDrains for stdout and stderr floods, using TEST_TIMEOUT_MS.
Spawn call site update
packages/sdk/src/client.ts
AgentRelayClient.spawn() now calls drainBrokerStdioAfterStartup(child) instead of the removed stdout-only helper.
Dual stdio drain implementation
packages/sdk/src/client.ts
Adds drainBrokerStdioAfterStartup(child) which attaches no-op data handlers and resumes both child.stdout and child.stderr; old drainBrokerStdoutAfterStartup() removed and new helper exposed via __clientTestInternals.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • AgentWorkforce/relay#838: Earlier iteration that drained only stdout after startup; this PR extends it to drain both stdout and stderr.

Suggested reviewers

  • willwashburn
  • barryollama

Poem

I’m a rabbit with a tiny spade,
I dig the drains the devs have made. 🐇
Both stdout and stderr sing,
No more pipes that choke the thing.
Hooray—no blocked broker parade! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 20.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: extending pipe draining to cover both stdout and stderr after broker startup.
Description check ✅ Passed The description provides comprehensive context including the problem, solution, technical rationale, and verification. It follows the template structure with Summary and implicitly covers testing.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/sdk-drain-broker-stderr

Comment @coderabbitai help to get the list of available commands and usage tips.

Drop the broker-flood chunk count from 20000 (~20MB) to 5000 (~5MB) so
the line-by-line readline parsing of stderr in the SDK startup path
never races the drain assertion timeout. Still forces ~80 backpressure
cycles against the ~64KB macOS pipe buffer — more than enough to wedge
an undrained broker.
Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 2 additional findings.

Open in Devin Review

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
packages/sdk/src/__tests__/client-stdout-drain.test.ts (1)

80-81: 💤 Low value

Consider exposing child through __clientTestInternals for type-safe test access.

Accessing the private child property directly works at runtime (TypeScript's private is compile-time only), but it bypasses type checking. For consistency with how drainBrokerStdioAfterStartup is already exposed, consider adding the child process accessor to the test internals.

♻️ Suggested approach

In client.ts, extend the test internals:

export const __clientTestInternals = {
  drainBrokerStdioAfterStartup,
  getChild: (client: AgentRelayClient) => (client as any).child as ChildProcess | null,
};

Then in test:

-    const child = client.child;
+    const child = __clientTestInternals.getChild(client);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/sdk/src/__tests__/client-stdout-drain.test.ts` around lines 80 - 81,
Add a type-safe test accessor for the broker child process by extending the
existing __clientTestInternals export: add a getChild function that accepts an
AgentRelayClient and returns the child's type (ChildProcess | null) by safely
reading the private client.child internals; update client.ts to export
__clientTestInternals = { drainBrokerStdioAfterStartup, getChild } and update
tests to call __clientTestInternals.getChild(client) instead of accessing
client.child directly to preserve compile-time type checking.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@packages/sdk/src/__tests__/client-stdout-drain.test.ts`:
- Around line 80-81: Add a type-safe test accessor for the broker child process
by extending the existing __clientTestInternals export: add a getChild function
that accepts an AgentRelayClient and returns the child's type (ChildProcess |
null) by safely reading the private client.child internals; update client.ts to
export __clientTestInternals = { drainBrokerStdioAfterStartup, getChild } and
update tests to call __clientTestInternals.getChild(client) instead of accessing
client.child directly to preserve compile-time type checking.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 957b592c-8209-4f05-8b76-2a835f7f547b

📥 Commits

Reviewing files that changed from the base of the PR and between 192b14c and 5c47ca0.

📒 Files selected for processing (2)
  • packages/sdk/src/__tests__/client-stdout-drain.test.ts
  • packages/sdk/src/client.ts

@khaliqgant khaliqgant merged commit f87277e into main May 11, 2026
41 checks passed
@khaliqgant khaliqgant deleted the fix/sdk-drain-broker-stderr branch May 11, 2026 19:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant