
feat(kitchen-sink): add load testing harness #5017

Draft
NathanFlurry wants to merge 1 commit into
counter-latency/client-protocol-polish from
counter-latency/kitchen-sink-load-harness

Conversation

@NathanFlurry
Member

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@claude
Contributor

claude Bot commented May 11, 2026

Code Review: feat(kitchen-sink): add load testing harness

Overview

This PR adds a load testing harness to the kitchen-sink example. Key changes:

  1. Refactors counter-latency.ts from a flag-based CLI to a subcommand CLI (rtt, concurrent, agent-concurrent) using node:util's parseArgs
  2. Adds two new actor implementations: tunnel-stress.ts (WebSocket echo/heartbeat) and load-test-agent.ts (SQLite-backed token streaming to simulate LLM inference)
  3. Adds build-push-kitchen-sink-local.sh for local workspace Docker builds
  4. Adds Dockerfile.local and updates CLAUDE.md with the new deploy flow

Bugs / Correctness Issues

ACTOR_STOPPED_CLOSE_CODE = 1000 reconnect detection is fragile

scripts/counter-latency.ts uses close code 1000 + reason string "hack_force_close" to detect actor restarts and trigger reconnect. Code 1000 is the generic Normal Closure code, so any peer that closes cleanly will match it, which could cause spurious reconnects. If this is meant to be a real signal, use a dedicated close code (e.g. 3000–3999 are application-reserved). The constant name literally says hack_force_close, which is a red flag.
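
A hedged sketch of what dedicated-code detection could look like (the constant value, the ws handle, and the scheduleReconnect helper are illustrative stand-ins, not code from the PR):

// Stand-ins so the sketch is self-contained; the real harness has its own.
declare const ws: WebSocket;
declare function scheduleReconnect(): void;

// Pick a code from the application ranges (3000-4999) instead of the generic
// 1000, so an ordinary clean close can never be mistaken for a forced restart.
const ACTOR_STOPPED_CLOSE_CODE = 4001;

ws.addEventListener("close", (event: CloseEvent) => {
    if (event.code === ACTOR_STOPPED_CLOSE_CODE) {
        // Only a deliberate server-side force-close triggers a reconnect.
        scheduleReconnect();
    }
});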

openWebSocket fallback branch is dead code

In both makeTunnelStressWorkload and makeAgentWorkload:

const handle = actorId
    ? client.tunnelStress.getForId(actorId)
    : client.tunnelStress.getOrCreate([key]);

actorId is always resolved (non-falsy) before openWebSocket is called, so the getOrCreate branch is unreachable. Remove it or document why it's kept.
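
If the fallback really is unreachable, the lookup collapses to a single call (a sketch under that assumption):

// Assuming actorId is always resolved before openWebSocket runs:
const handle = client.tunnelStress.getForId(actorId);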

connectionCount can go negative

In both tunnel-stress.ts and load-test-agent.ts, connectionCount is decremented in the close handler:

websocket.addEventListener("close", () => {
    c.state.connectionCount -= 1;
});

If close fires without a matching open-phase increment (e.g. on error path), the counter goes below zero. Guard with Math.max(0, c.state.connectionCount - 1) or only decrement if connectionCount > 0.
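
A guarded handler could look like this (sketch only, reusing the shapes from the snippet above):

websocket.addEventListener("close", () => {
    // Clamp at zero so an unmatched close can't drive the counter negative.
    c.state.connectionCount = Math.max(0, c.state.connectionCount - 1);
});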

Concurrent inference requests with no backpressure (load-test-agent.ts)

Each WebSocket message spawns a new concurrent inference IIFE wrapped in c.keepAwake(inference). With agent-concurrent running thousands of connections each sending on --message-interval-ms, many inferences can be active simultaneously on a single actor, all writing to SQLite. This may be intentional stress behavior, but it should be documented — an operator running this expecting "one inference at a time per actor" will be confused.
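
If a bound is ever wanted, a small per-actor cap could look like the following sketch (maxInFlight, waiters, and runInference are hypothetical names, not part of the PR):

// Illustrative only: cap how many inferences run concurrently per actor.
const maxInFlight = 4;
let inFlight = 0;
const waiters: Array<() => void> = [];

async function runInference(task: () => Promise<void>): Promise<void> {
    while (inFlight >= maxInFlight) {
        // Park this message until a running inference releases a slot.
        await new Promise<void>((resolve) => waiters.push(resolve));
    }
    inFlight += 1;
    try {
        await task();
    } finally {
        inFlight -= 1;
        waiters.shift()?.(); // wake the next queued message, if any
    }
}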


Performance Concerns

countWorkerHealth() is O(N) on every log line

function countWorkerHealth(): { pending: number; healthy: number; warning: number; ended: number } {
    for (const s of workerHealth.values()) { ... }
}

This iterates all workers on every logPrefix() call. With 1000 concurrent workers and high-frequency logging, this adds up. Prefer four plain counters (no atomics are needed in single-threaded Node) that are incremented/decremented alongside the workerHealth.set(...) calls.
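
One way to keep the counts incremental (the names below are hypothetical, not the harness code):

type WorkerHealth = "pending" | "healthy" | "warning" | "ended";

const workerHealth = new Map<number, WorkerHealth>();
const healthCounts: Record<WorkerHealth, number> = {
    pending: 0,
    healthy: 0,
    warning: 0,
    ended: 0,
};

function setWorkerHealth(worker: number, next: WorkerHealth): void {
    const prev = workerHealth.get(worker);
    if (prev !== undefined) healthCounts[prev] -= 1; // retire the old state
    workerHealth.set(worker, next);
    healthCounts[next] += 1;
}

// countWorkerHealth() then becomes a constant-time read of healthCounts.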

KV write on every received message with a shared key (tunnel-stress.ts)

await c.kv.put("counter", String(c.state.messageCount));

The key "counter" is actor-scoped but shared across all connections to that actor. Under 1000+ concurrent connections each sending messages, this is a hot single KV key. If the point is to measure KV write throughput, document that; if the point is pure WebSocket stress, remove the KV write.

Heartbeat every 1s per actor instance (tunnel-stress.ts)

A 1-second heartbeat timer plus state mutations (c.state.heartbeatCount += 1) means every actor instance makes 1 state update + 1 KV write per second just from heartbeats. At 1000 actors this is 1000 state writes/sec plus the message-reply writes — worth documenting as an explicit design point of the stress test rather than implied overhead.


Code Quality

_worker parameter in logPrefix is unused for RTT mode

logPrefix(_worker: number) uses the leading underscore to mark the parameter as intentionally unused. In rtt mode the prefix shows no per-worker info; if per-worker info is not useful, the parameter can be removed entirely.

Inconsistent config source: RUN_FOR_MS is an env var, everything else is a CLI flag

RUN_FOR_MS is read from process.env while all other configuration is now parsed from argv. This is inconsistent. Either expose --run-for-ms as a proper flag (and add it to usage), or document clearly that it's intentionally env-only.
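
If the flag route is taken, a sketch of wiring it through node:util's parseArgs (the option name mirrors the env var; the default is illustrative):

import { parseArgs } from "node:util";

const { values } = parseArgs({
    options: {
        // Hypothetical replacement for the RUN_FOR_MS env var.
        "run-for-ms": { type: "string", default: "60000" },
    },
    allowPositionals: true,
});

const runForMs = Number(values["run-for-ms"]);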

parseArgs return type annotation is verbose

The explicit ReturnType<typeof parseArgs<...>> annotations on parsed in parseRttArgs and parseConcurrentArgs are unnecessary since TypeScript can infer the type from the parseArgs(...) call directly. This makes the code quite noisy without adding clarity.

Magic WebSocket readyState numbers

ws.readyState === 1 and ws.readyState <= 1 appear multiple times. Consider a small named constant:

const WS_OPEN = 1;

any in TunnelWebSocket.addEventListener

addEventListener(type: ..., listener: (event: any) => void, ...): void;

The event: any can be tightened — at minimum MessageEvent for message, CloseEvent for close, etc. since these are test-harness interfaces used throughout the script.
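
A typed event map is one way to do that (the map below is illustrative, not the actual interface from the PR):

interface TunnelWebSocketEventMap {
    open: Event;
    message: MessageEvent;
    close: CloseEvent;
    error: Event;
}

interface TunnelWebSocket {
    addEventListener<K extends keyof TunnelWebSocketEventMap>(
        type: K,
        listener: (event: TunnelWebSocketEventMap[K]) => void,
    ): void;
}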


Build Script Notes (build-push-kitchen-sink-local.sh)

  • rm -rf "${CONTEXT_DIR}" immediately after creating it via mktemp is a TOCTOU footgun if CONTEXT_DIR is user-supplied and points at real data. The intent is clear but worth a comment.
  • The inline node <<'NODE' heredoc modifies package.json in-place. This works but is hard to test in isolation; a standalone .js helper would be easier to iterate on.
  • napi_binaries=("${ROOT_DIR}"/rivetkit-typescript/packages/rivetkit-napi/*.node) — if multiple .node files exist (e.g. from multiple builds for different platforms), they're all copied. A note about single-platform expectation would help.
  • pnpm install --no-frozen-lockfile is needed but means the produced node_modules could diverge from what CI verifies. This is an acceptable tradeoff for a local-build flow but worth noting in the script comments.

Minor / Style

  • The gradient: ... → label was changed to gradient: ... -> in the output. Presumably this avoids a Unicode arrow in terminals that don't render it; fine either way, just note that it's intentional.
  • The Dockerfile.local CMD line runs with tsx rather than pre-compiled JS, which means transpile overhead at startup. This is probably acceptable for a staging/test image.

Summary

The overall direction is solid — the CLI refactor is a clear improvement, and the new actors give useful stress-test coverage for tunnel connections and SQLite-backed inference workloads. The main items to address before merging:

  1. Fix or document the code=1000 reconnect detection (currently fragile and named hack_force_close)
  2. Remove the dead getOrCreate fallback in openWebSocket
  3. Guard connectionCount from going negative
  4. Replace the O(N) countWorkerHealth() scan with per-state counters
  5. Address the inconsistency of RUN_FOR_MS as env-only vs everything else as CLI flags

The draft status is appropriate; the items above are good to work through before the PR is opened for merge.

@MasterPtato force-pushed the counter-latency/client-protocol-polish branch from ec783c3 to 6127de0 on May 14, 2026 at 17:26
@MasterPtato force-pushed the counter-latency/kitchen-sink-load-harness branch from 5caf7f6 to 31a9bd4 on May 14, 2026 at 17:26

Labels

None yet

Projects

None yet


1 participant