
feat(kitchen-sink): add load testing harness #5017

Draft
NathanFlurry wants to merge 1 commit into
counter-latency/client-protocol-polish from
counter-latency/kitchen-sink-load-harness

Conversation

@NathanFlurry
Member

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@claude
Contributor

claude Bot commented May 11, 2026

Code Review: feat(kitchen-sink): add load testing harness

Overview

This PR adds a load testing harness to the kitchen-sink example. Key changes:

  1. Refactors counter-latency.ts from a flag-based CLI to a subcommand CLI (rtt, concurrent, agent-concurrent) using node:util's parseArgs
  2. Adds two new actor implementations: tunnel-stress.ts (WebSocket echo/heartbeat) and load-test-agent.ts (SQLite-backed token streaming to simulate LLM inference)
  3. Adds build-push-kitchen-sink-local.sh for local workspace Docker builds
  4. Adds Dockerfile.local and updates CLAUDE.md with the new deploy flow

Bugs / Correctness Issues

ACTOR_STOPPED_CLOSE_CODE = 1000 reconnect detection is fragile

scripts/counter-latency.ts uses close code 1000 + reason string "hack_force_close" to detect actor restarts and trigger reconnect. Code 1000 is the generic Normal Closure code, so any peer that closes cleanly will match it, which could cause spurious reconnects. If this is meant to be a real signal, use a dedicated close code (e.g. 3000–3999 are application-reserved). The constant name literally says hack_force_close, which is a red flag.
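
A hedged sketch of what dedicated-code detection could look like (the constant value, the ws handle, and the scheduleReconnect helper are illustrative stand-ins, not code from the PR):

// Stand-ins so the sketch is self-contained; the real harness has its own.
declare const ws: WebSocket;
declare function scheduleReconnect(): void;

// Pick a code from the application ranges (3000-4999) instead of the generic
// 1000, so an ordinary clean close can never be mistaken for a forced restart.
const ACTOR_STOPPED_CLOSE_CODE = 4001;

ws.addEventListener("close", (event: CloseEvent) => {
    if (event.code === ACTOR_STOPPED_CLOSE_CODE) {
        // Only a deliberate server-side force-close triggers a reconnect.
        scheduleReconnect();
    }
});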

openWebSocket fallback branch is dead code

In both makeTunnelStressWorkload and makeAgentWorkload:

const handle = actorId
    ? client.tunnelStress.getForId(actorId)
    : client.tunnelStress.getOrCreate([key]);

actorId is always resolved (non-falsy) before openWebSocket is called, so the getOrCreate branch is unreachable. Remove it or document why it's kept.
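
If the fallback really is unreachable, the lookup collapses to a single call (a sketch under that assumption):

// Assuming actorId is always resolved before openWebSocket runs:
const handle = client.tunnelStress.getForId(actorId);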

connectionCount can go negative

In both tunnel-stress.ts and load-test-agent.ts, connectionCount is decremented in the close handler:

websocket.addEventListener("close", () => {
    c.state.connectionCount -= 1;
});

If close fires without a matching open-phase increment (e.g. on error path), the counter goes below zero. Guard with Math.max(0, c.state.connectionCount - 1) or only decrement if connectionCount > 0.
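
A guarded handler could look like this (sketch only, reusing the shapes from the snippet above):

websocket.addEventListener("close", () => {
    // Clamp at zero so an unmatched close can't drive the counter negative.
    c.state.connectionCount = Math.max(0, c.state.connectionCount - 1);
});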

Concurrent inference requests with no backpressure (load-test-agent.ts)

Each WebSocket message spawns a new concurrent inference IIFE wrapped in c.keepAwake(inference). With agent-concurrent running thousands of connections each sending on --message-interval-ms, many inferences can be active simultaneously on a single actor, all writing to SQLite. This may be intentional stress behavior, but it should be documented — an operator running this expecting "one inference at a time per actor" will be confused.
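
If a bound is ever wanted, a small per-actor cap could look like the following sketch (maxInFlight, waiters, and runInference are hypothetical names, not part of the PR):

// Illustrative only: cap how many inferences run concurrently per actor.
const maxInFlight = 4;
let inFlight = 0;
const waiters: Array<() => void> = [];

async function runInference(task: () => Promise<void>): Promise<void> {
    while (inFlight >= maxInFlight) {
        // Park this message until a running inference releases a slot.
        await new Promise<void>((resolve) => waiters.push(resolve));
    }
    inFlight += 1;
    try {
        await task();
    } finally {
        inFlight -= 1;
        waiters.shift()?.(); // wake the next queued message, if any
    }
}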


Performance Concerns

countWorkerHealth() is O(N) on every log line

function countWorkerHealth(): { pending: number; healthy: number; warning: number; ended: number } {
    for (const s of workerHealth.values()) { ... }
}

This iterates all workers on every logPrefix() call. With 1000 concurrent workers and high-frequency logging, this adds up. Prefer four plain counters (no atomics are needed in single-threaded Node) that are incremented/decremented alongside the workerHealth.set(...) calls.
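
One way to keep the counts incremental (the names below are hypothetical, not the harness code):

type WorkerHealth = "pending" | "healthy" | "warning" | "ended";

const workerHealth = new Map<number, WorkerHealth>();
const healthCounts: Record<WorkerHealth, number> = {
    pending: 0,
    healthy: 0,
    warning: 0,
    ended: 0,
};

function setWorkerHealth(worker: number, next: WorkerHealth): void {
    const prev = workerHealth.get(worker);
    if (prev !== undefined) healthCounts[prev] -= 1; // retire the old state
    workerHealth.set(worker, next);
    healthCounts[next] += 1;
}

// countWorkerHealth() then becomes a constant-time read of healthCounts.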

KV write on every received message with a shared key (tunnel-stress.ts)

await c.kv.put("counter", String(c.state.messageCount));

The key "counter" is actor-scoped but shared across all connections to that actor. Under 1000+ concurrent connections each sending messages, this is a hot single KV key. If the point is to measure KV write throughput, document that; if the point is pure WebSocket stress, remove the KV write.

Heartbeat every 1s per actor instance (tunnel-stress.ts)

A 1-second heartbeat timer plus state mutations (c.state.heartbeatCount += 1) means every actor instance makes 1 state update + 1 KV write per second just from heartbeats. At 1000 actors this is 1000 state writes/sec plus the message-reply writes — worth documenting as an explicit design point of the stress test rather than implied overhead.


Code Quality

_worker parameter in logPrefix is unused for RTT mode

logPrefix(_worker: number) uses the leading underscore to mark the parameter as intentionally unused. In rtt mode the prefix shows no per-worker info; if per-worker info is not useful, the parameter can be removed entirely.

Inconsistent config source: RUN_FOR_MS is an env var, everything else is a CLI flag

RUN_FOR_MS is read from process.env while all other configuration is now parsed from argv. This is inconsistent. Either expose --run-for-ms as a proper flag (and add it to usage), or document clearly that it's intentionally env-only.
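
If the flag route is taken, a sketch of wiring it through node:util's parseArgs (the option name mirrors the env var; the default is illustrative):

import { parseArgs } from "node:util";

const { values } = parseArgs({
    options: {
        // Hypothetical replacement for the RUN_FOR_MS env var.
        "run-for-ms": { type: "string", default: "60000" },
    },
    allowPositionals: true,
});

const runForMs = Number(values["run-for-ms"]);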

parseArgs return type annotation is verbose

The explicit ReturnType<typeof parseArgs<...>> annotations on parsed in parseRttArgs and parseConcurrentArgs are unnecessary since TypeScript can infer the type from the parseArgs(...) call directly. This makes the code quite noisy without adding clarity.

Magic WebSocket readyState numbers

ws.readyState === 1 and ws.readyState <= 1 appear multiple times. Consider a small named constant:

const WS_OPEN = 1;

any in TunnelWebSocket.addEventListener

addEventListener(type: ..., listener: (event: any) => void, ...): void;

The event: any can be tightened — at minimum MessageEvent for message, CloseEvent for close, etc. since these are test-harness interfaces used throughout the script.
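
A typed event map is one way to do that (the map below is illustrative, not the actual interface from the PR):

interface TunnelWebSocketEventMap {
    open: Event;
    message: MessageEvent;
    close: CloseEvent;
    error: Event;
}

interface TunnelWebSocket {
    addEventListener<K extends keyof TunnelWebSocketEventMap>(
        type: K,
        listener: (event: TunnelWebSocketEventMap[K]) => void,
    ): void;
}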


Build Script Notes (build-push-kitchen-sink-local.sh)

  • rm -rf "${CONTEXT_DIR}" immediately after creating it via mktemp is a TOCTOU footgun if CONTEXT_DIR is user-supplied and points at real data. The intent is clear but worth a comment.
  • The inline node <<'NODE' heredoc modifies package.json in-place. This works but is hard to test in isolation; a standalone .js helper would be easier to iterate on.
  • napi_binaries=("${ROOT_DIR}"/rivetkit-typescript/packages/rivetkit-napi/*.node) — if multiple .node files exist (e.g. from multiple builds for different platforms), they're all copied. A note about single-platform expectation would help.
  • pnpm install --no-frozen-lockfile is needed but means the produced node_modules could diverge from what CI verifies. This is an acceptable tradeoff for a local-build flow but worth noting in the script comments.

Minor / Style

  • The gradient: ... → label was changed to gradient: ... -> in the output. Presumably this avoids a Unicode arrow in terminals that don't render it; fine either way, just note that it's intentional.
  • The Dockerfile.local CMD line runs with tsx rather than pre-compiled JS, which means transpile overhead at startup. This is probably acceptable for a staging/test image.

Summary

The overall direction is solid — the CLI refactor is a clear improvement, and the new actors give useful stress-test coverage for tunnel connections and SQLite-backed inference workloads. The main items to address before merging:

  1. Fix or document the code=1000 reconnect detection (currently fragile and named hack_force_close)
  2. Remove the dead getOrCreate fallback in openWebSocket
  3. Guard connectionCount from going negative
  4. Replace the O(N) countWorkerHealth() scan with per-state counters
  5. Address the inconsistency of RUN_FOR_MS as env-only vs everything else as CLI flags

The draft status is appropriate; the items above are good to work through before the PR is opened for merge.

@MasterPtato force-pushed the counter-latency/client-protocol-polish branch from ec783c3 to 6127de0 on May 14, 2026 at 17:26
@MasterPtato force-pushed the counter-latency/kitchen-sink-load-harness branch from 5caf7f6 to 31a9bd4 on May 14, 2026 at 17:26

Labels

None yet

Projects

None yet


1 participant