Add livekit transport opt-in alongside aiortc #119

Draft
nagar-decart wants to merge 16 commits into main from nagar-decart/livekit-transport

nagar-decart commented Apr 19, 2026

Summary

  • New transports/livekit.ts: LiveKitConnection with the same public surface as WebRTCConnection (connect/send/cleanup/getPeerConnection/websocketMessagesEmitter/setImageBase64/state)
  • WebRTCManager gains a transport?: "aiortc" | "livekit" option; default is "aiortc" (fully back-compat)
  • RealTimeClientConnectOptions threads transport through; control-WS behavior is identical for both transports, and only the media handshake differs (livekit_join → livekit_room_info → Room.connect)
  • Adds livekit-client ^2.0.0 dep
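
The dispatch described in the summary can be sketched roughly as follows. This is a minimal illustration, not the SDK's code: `AiortcTransportSketch`, `LiveKitTransportSketch`, `ManagerConfigSketch`, and `createTransport` are hypothetical stand-ins. Only the `"aiortc" | "livekit"` union and the `"aiortc"` default come from this PR.

```typescript
// Hypothetical, simplified stand-ins for the SDK's transport classes.
type TransportKind = "aiortc" | "livekit";

interface Transport {
  readonly kind: TransportKind;
  connect(): Promise<void>;
}

class AiortcTransportSketch implements Transport {
  readonly kind = "aiortc" as const;
  async connect(): Promise<void> {
    // The existing aiortc media handshake would run here.
  }
}

class LiveKitTransportSketch implements Transport {
  readonly kind = "livekit" as const;
  async connect(): Promise<void> {
    // livekit_join -> livekit_room_info -> Room.connect would run here.
  }
}

interface ManagerConfigSketch {
  transport?: TransportKind; // omitted means "aiortc"
}

// Pick the transport once, at construction time; everything after this
// point only sees the shared Transport interface.
function createTransport(config: ManagerConfigSketch): Transport {
  const kind: TransportKind = config.transport ?? "aiortc";
  return kind === "livekit" ? new LiveKitTransportSketch() : new AiortcTransportSketch();
}
```

Because the default is resolved in one place, callers that never pass `transport` keep getting the aiortc path.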

Pairs with the api branch that lands side-by-side aiortc + livekit on the inference server + bouncer + k8s helm chart; details in the plan file alongside that PR.

Test plan

  • pnpm typecheck clean
  • pnpm test — 145/145 existing unit tests pass
  • End-to-end: createRealTimeClient({ transport: "livekit" }) against a local livekit-server deployed via the api repo's just init once the api PR lands
  • E2E against prod-usw2 LiveKit SFU after infra rollout

🤖 Generated with Claude Code


Note

Medium Risk
Adds an alternate realtime media transport and threads a new transport option through connection setup, which can impact connection reliability and state handling. Default remains aiortc, but the new LiveKit handshake and dependency surface introduces integration risk with server-side support.

Overview
Adds an opt-in LiveKit-based realtime transport alongside the existing aiortc flow, selectable per realtime.connect() call via a new transport: "aiortc" | "livekit" option (defaulting to aiortc).

Implements LiveKitConnection to mirror the existing connection interface while switching the media handshake to a livekit_join → livekit_room_info control-WS exchange followed by joining/publishing to a LiveKit SFU room; WebRTCManager now instantiates the appropriate transport. The SDK test index.html is updated to allow choosing the transport, and livekit-client is added as a dependency.

Reviewed by Cursor Bugbot for commit edb61bd. Bugbot is set up for automated code reviews on this repo.

Side-by-side WebRTC transport support for the inference server's new
livekit path. aiortc stays the default and is fully back-compat.

- packages/sdk/src/realtime/transports/livekit.ts: new
  LiveKitConnection. Public surface (connect/send/cleanup/
  getPeerConnection/websocketMessagesEmitter/setImageBase64/state)
  matches WebRTCConnection so WebRTCManager can swap implementations.
  Control WS is identical (prompt / set_image / session_id / tick acks);
  the only difference is the media handshake (livekit_join →
  livekit_room_info, then Room.connect + publishTrack).

- packages/sdk/src/realtime/transports/index.ts: shared TransportKind
  type + re-exports.

- packages/sdk/src/realtime/webrtc-manager.ts: gains an optional
  transport: "aiortc" | "livekit" field in WebRTCConfig. The constructor
  dispatches to LiveKitConnection when opted in, WebRTCConnection
  otherwise. All manager state machine logic (reconnect, buffer, emit)
  is transport-agnostic.

- packages/sdk/src/realtime/client.ts: RealTimeClientConnectOptions now
  accepts `transport`; it's threaded into the manager config.

- package.json: adds livekit-client ^2.0.0.

Typecheck passes; all 145 existing unit tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
rejectConnect = reject;
});
connectAbort.catch(() => {});
this.connectionReject = (error) => rejectConnect(error);

connectAbort promise created but never raced

High Severity

The connectAbort promise is created and this.connectionReject is wired to reject it, but unlike WebRTCConnection (which uses Promise.race([..., connectAbort]) for every phase), the LiveKitConnection never races connectAbort against openControlWs, requestRoomInfo, joinRoom, or sendInitialPrompt. If the control WebSocket closes during these operations, the rejection is silently swallowed by connectAbort.catch(() => {}), and the connection flow won't abort until individual timeouts expire (up to 30 seconds).

Reviewed by Cursor Bugbot for commit ac1e2f0.
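
For reference, the racing pattern the comment above refers to could look roughly like this. It is a sketch under assumptions: `createConnectAbort`, `runPhases`, and the `Phase` type are hypothetical names, and the real phases (openControlWs, requestRoomInfo, joinRoom, sendInitialPrompt) are represented only as opaque async functions.

```typescript
// Hypothetical sketch; the names below are illustrative, not the SDK's.
type Phase = () => Promise<void>;

interface ConnectAbort {
  abort: Promise<never>;
  reject: (error: Error) => void;
}

function createConnectAbort(): ConnectAbort {
  let reject: (error: Error) => void = () => {};
  const abort = new Promise<never>((_resolve, rej) => {
    reject = rej;
  });
  // Swallow the rejection on this side branch only, so an abort that fires
  // while nothing is racing does not surface as an unhandled rejection.
  abort.catch(() => {});
  return { abort, reject };
}

async function runPhases(phases: Phase[], abort: Promise<never>): Promise<void> {
  for (const phase of phases) {
    // Racing every phase means a control-WS close rejects the connect flow
    // immediately instead of waiting for that phase's own timeout.
    await Promise.race([phase(), abort]);
  }
}
```

The empty `catch` only silences the side branch; each `Promise.race` call still observes the rejection, which is the part the review says is missing in the LiveKit path.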

// Phase 3 — optional: send initial prompt over control WS.
if (this.callbacks.initialPrompt) {
await this.sendInitialPrompt(this.callbacks.initialPrompt);
}

Initial image silently dropped in LiveKit transport

High Severity

The initialImage field is declared in LiveKitCallbacks but never sent during connect(). The aiortc transport sends the initial image via setImageBase64 before the media handshake, but the LiveKit transport only checks this.callbacks.initialPrompt and ignores this.callbacks.initialImage. Users providing an initialImage with the LiveKit transport will have it silently discarded.

Reviewed by Cursor Bugbot for commit ac1e2f0.

type: "prompt",
prompt: prompt.text,
enhance: prompt.enhance ?? false,
} as unknown as OutgoingWebRTCMessage;

Wrong field name and default in sendInitialPrompt

Medium Severity

The sendInitialPrompt method sends enhance with a default of false, but the server protocol (defined in types.ts as PromptMessage) expects the field enhance_prompt, and the aiortc transport defaults to true. This means the server won't recognize the enhance flag, and behavior will differ from the aiortc transport even when the same options are passed.

Reviewed by Cursor Bugbot for commit ac1e2f0.
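
A message builder consistent with the comment above might look like the sketch below. The `enhance_prompt` field name and the `true` default are taken from the review text. `PromptMessageSketch` and `buildPromptMessage` are hypothetical, and the real `PromptMessage` type in types.ts may carry other fields.

```typescript
// Hypothetical shape; only `enhance_prompt` and its default of `true`
// are taken from the review comment.
interface PromptMessageSketch {
  type: "prompt";
  prompt: string;
  enhance_prompt: boolean;
}

function buildPromptMessage(prompt: { text: string; enhance?: boolean }): PromptMessageSketch {
  return {
    type: "prompt",
    prompt: prompt.text,
    // Default to true so both transports behave the same for equal options.
    enhance_prompt: prompt.enhance ?? true,
  };
}
```

Typing the return value also removes the need for an `as unknown as` cast, so a misspelled field would fail the typecheck.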

Comment thread packages/sdk/src/realtime/transports/livekit.ts
index.html now has aiortc | livekit radios that feed
realtime.connect({ transport }), so the dev demo at sdk.decart.local
can flip between the two transports without a code change. Default
stays aiortc so existing sanity tests are unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 4 total unresolved issues (including 3 from previous reviews).

Reviewed by Cursor Bugbot for commit edb61bd.

this.websocketMessagesEmitter.emit("generationTick", typed);
break;
}
}

Server error messages silently dropped after initial handshake

Medium Severity

handleControlMessage has no case for error type messages in its switch statement. After requestRoomInfo() completes and pendingRoomInfoResolvers is empty, server-sent error messages (e.g. insufficient credits, session rejected) are silently ignored. The aiortc transport's handleSignalingMessage always calls callbacks.onError for error messages, ensuring the SDK user is notified.

Reviewed by Cursor Bugbot for commit edb61bd.
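
One way to cover the missing case is sketched below. The message shapes are assumptions: the `error` payload field and the `generation_tick` wire name are hypothetical, as are `ControlMessageSketch` and `ControlCallbacksSketch`. Only the idea of always forwarding server errors to `onError` comes from the comment.

```typescript
// Hypothetical message and callback shapes, for illustration only.
type ControlMessageSketch =
  | { type: "error"; error: string }
  | { type: "generation_tick" };

interface ControlCallbacksSketch {
  onError: (error: Error) => void;
  onGenerationTick: () => void;
}

function handleControlMessage(msg: ControlMessageSketch, callbacks: ControlCallbacksSketch): void {
  switch (msg.type) {
    case "generation_tick":
      callbacks.onGenerationTick();
      break;
    case "error":
      // Forward every server error, including ones that arrive after the
      // handshake, when no room-info resolver is pending any more.
      callbacks.onError(new Error(msg.error));
      break;
  }
}
```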

nagar-decart marked this pull request as draft April 19, 2026 14:38
nagar-decart and others added 14 commits April 20, 2026 09:36
Inference server gained an opt-in periodic `{"type": "server_metrics"}`
WS emission (DecartAI/api PR forthcoming) that the webrtc-bench tool
subscribes to for per-session fps / latency / queue-depth numbers.
Surface it through the SDK so consumers can do:

    rtClient.on("serverMetrics", (msg) => ...)

Changes:
- types.ts: new ServerMetricsMessage type; added to IncomingWebRTCMessage.
- webrtc-connection.ts (aiortc): parse `type: "server_metrics"` and emit
  on the internal websocketMessagesEmitter.
- transports/livekit.ts: same, inside handleControlMessage switch.
- client.ts: add `serverMetrics` to public Events, wire the listener so
  the internal emitter fans out to the public RealTimeClient.on surface.

Default off — the server only emits when the client's realtime URL has
`?emit_server_metrics=1`. Normal SDK consumers see nothing unless they
explicitly opt in.
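
As an illustration of the opt-in, the query flag could be appended like this. The helper name `withServerMetrics` and the example URL are hypothetical, and how the realtime URL is handed to the SDK is not shown in this commit. Only the `emit_server_metrics=1` parameter comes from the text above.

```typescript
// Hypothetical helper: add the opt-in flag to a realtime WS URL while
// leaving any existing query parameters in place.
function withServerMetrics(realtimeUrl: string): string {
  const url = new URL(realtimeUrl);
  url.searchParams.set("emit_server_metrics", "1");
  return url.toString();
}
```

Using `searchParams.set` rather than string concatenation keeps the call idempotent, so applying it twice does not duplicate the flag.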

Typecheck passes; 145/145 unit tests still green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Forwards the inference server's E2E pixel-latency handshake (message
type "marker_config") to SDK consumers. Symmetric with serverMetrics —
opt-in via ?pixel_latency=1 on the realtime WS URL.

The webrtc-bench tool uses this to align its PixelMarkerReader's search
window with the server's actual stamp dimensions (which can differ from
the client stamp dims when the server transcodes). Normal consumers
ignore the event.

- types.ts: MarkerConfigMessage + add to IncomingWebRTCMessage union.
- webrtc-connection.ts + transports/livekit.ts: parse type == "marker_config"
  and emit on the transport's websocketMessagesEmitter.
- client.ts: expose as a public markerConfig event on RealTimeClient,
  via the same emitOrBuffer path as serverMetrics.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
E2E pixel-latency no longer negotiates stamp dimensions between client
and server — both sides use a fixed protocol and auto-detect the
received scale. The marker_config WS message is gone, so drop the
MarkerConfigMessage type and the event plumbing across client.ts,
webrtc-connection.ts, transports/livekit.ts, and types.ts.

Reverts the prior markerConfig addition on this branch; the webrtc-bench
tool in api#1095 handles scale detection inside its PixelMarkerReader.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rate

Two fixes that let non-aiortc transports see the same `stats` event stream
and that keep the reported outbound bitrate sensible under simulcast:

1. Transport-agnostic stats source.

   Introduce `StatsProvider`: `{ getStats(): Promise<RTCStatsReport> }`.
   `RTCPeerConnection` already satisfies it (aiortc path, back-compat);
   LiveKitConnection now supplies an aggregator that walks every local
   and remote track in the Room, calls `track.getRTCStatsReport()`, and
   merges the per-track reports into one RTCStatsReport-shaped Map.
   That's the minimum surface `WebRTCStatsCollector.parse()` needs — it
   iterates with `.forEach` and keys off `report.type`.

   Before: LiveKitConnection.getPeerConnection() returned null, so the
   SDK never started its stats collector for livekit sessions and no
   `stats` events fired. Now livekit sessions emit stats on the same
   cadence (and with the same payload shape) as aiortc.

   Client code (`startStatsCollection` / `handleConnectionStateChange`)
   now consults `manager.getStatsProvider()` instead of
   `manager.getPeerConnection()`. The identity check (so we don't
   restart the collector on every state change) still works because
   both the provider and the PC are stable references per connection.

2. Simulcast-safe outbound bitrate.

   Simulcast emits one `outbound-rtp` report per spatial layer (3 layers
   is typical). The parser used to overwrite `outboundVideo` with
   whichever layer `forEach` visited last — each layer has its own
   `bytesSent` counter, so across ticks the "last visited" layer would
   alternate and `bytesSent - prevBytesSentVideo` went violently
   negative. We saw `bitrateOutKbps` down to -6589 in bench results.

   Accumulate `bytesSent` + `packetsSent` across every outbound-rtp
   video report; compute the bitrate once, after the forEach, against
   the summed total. Also clamp the result to `Math.max(0, ...)` since
   `bytesSent` can transiently drop when tracks are added/removed
   mid-session (new simulcast layer ramping up, publisher swap).

   For scalar fields (resolution, fps, qualityLimitationReason), pick
   the highest-resolution active layer so reported frame dimensions
   match what's on the wire.

Verified against staging: 3-region x 2-transport smoke produces 0
negative `bitrateOutKbps` samples and livekit scenarios now report
bitrate/fps/rtt/jitter/resolution alongside aiortc.
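
The simulcast-safe accumulation from point 2 can be sketched as below. This is a simplified model rather than the parser itself: `StatSketch`, `sumOutboundVideoBytes`, and `bitrateKbps` are hypothetical names, and a plain `Map` stands in for `RTCStatsReport`.

```typescript
// Minimal stand-in for the entries of an RTCStatsReport.
interface StatSketch {
  type: string;
  kind?: string;
  bytesSent?: number;
}

// Sum bytesSent over every outbound-rtp video report (one per simulcast
// layer) instead of keeping whichever layer forEach happened to visit last.
function sumOutboundVideoBytes(report: Map<string, StatSketch>): number {
  let total = 0;
  report.forEach((stat) => {
    if (stat.type === "outbound-rtp" && stat.kind === "video") {
      total += stat.bytesSent ?? 0;
    }
  });
  return total;
}

// Compute the rate once, from the summed totals. Bits per millisecond is
// numerically equal to kilobits per second.
function bitrateKbps(totalBytes: number, prevTotalBytes: number, elapsedMs: number): number {
  if (elapsedMs <= 0) return 0;
  const deltaBits = (totalBytes - prevTotalBytes) * 8;
  // Clamp: the summed counter can drop when layers or tracks come and go.
  return Math.max(0, deltaBits / elapsedMs);
}
```

With three layers at 1000, 2000, and 3000 bytes over one second, the summed delta is 6000 bytes, which gives 48 kbps. A drop in the summed counter yields 0 rather than a negative rate.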

Bench callers (and presumably other stats consumers) need to know which
ICE candidate path the current session is using — relayed TURN vs
direct UDP, the local/remote IPs and port, the transport protocol.
That signal disappeared when an earlier refactor projected the parser's
output down to just `currentRoundTripTime` + `availableOutgoingBitrate`
on `connection`.

Restore it:

- `WebRTCStats.connection.selectedCandidatePairs: Array<{ local, remote }>`
  exposing { candidateType, address, port, protocol } per side.
- Parser now collects `localCandidateId` / `remoteCandidateId` from
  succeeded candidate-pair reports and, after the main forEach, looks
  each ID up in rawStats to produce the resolved pair (rawStats entry
  order isn't guaranteed — the pair may appear before its referenced
  candidates).
- Handles both the older `ip` and newer `address` fields on
  `local-candidate` / `remote-candidate` reports.

Net effect: bench's `SdkStatsCollector.onStats` (which already
defensively reads `stats.connection.selectedCandidatePairs`) will now
populate `iceCandidate` for every session. Before this change, that
field was always undefined under the SDK transport, so every bench
run logged `iceCandidate: None` and diagnosing relay vs direct
sessions was impossible.
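
The two-pass lookup described above could be sketched like this. The types and the function name `resolveSelectedPairs` are hypothetical simplifications, and a `Map` stands in for the raw stats report. The `ip` / `address` fallback and the "collect IDs first, resolve afterwards" order follow the commit text.

```typescript
interface CandidatePairStat {
  type: "candidate-pair";
  state: string;
  localCandidateId: string;
  remoteCandidateId: string;
}

interface CandidateStat {
  type: "local-candidate" | "remote-candidate";
  candidateType: string;
  address?: string; // newer field
  ip?: string; // older field
  port: number;
  protocol: string;
}

type RawStat = CandidatePairStat | CandidateStat;

interface CandidateSide {
  candidateType: string;
  address: string | null;
  port: number;
  protocol: string;
}

function toSide(stat: RawStat | undefined): CandidateSide | null {
  if (!stat || stat.type === "candidate-pair") return null;
  return {
    candidateType: stat.candidateType,
    address: stat.address ?? stat.ip ?? null,
    port: stat.port,
    protocol: stat.protocol,
  };
}

function resolveSelectedPairs(rawStats: Map<string, RawStat>): Array<{ local: CandidateSide; remote: CandidateSide }> {
  // Pass 1: remember the candidate IDs of every succeeded pair.
  const ids: Array<{ local: string; remote: string }> = [];
  rawStats.forEach((stat) => {
    if (stat.type === "candidate-pair" && stat.state === "succeeded") {
      ids.push({ local: stat.localCandidateId, remote: stat.remoteCandidateId });
    }
  });
  // Pass 2: resolve IDs only after the whole report has been seen, since a
  // pair may be listed before the candidates it references.
  const pairs: Array<{ local: CandidateSide; remote: CandidateSide }> = [];
  for (const id of ids) {
    const local = toSide(rawStats.get(id.local));
    const remote = toSide(rawStats.get(id.remote));
    if (local && remote) pairs.push({ local, remote });
  }
  return pairs;
}
```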

Consumers (benchmark/observability) need the full set of fields that
the WebRTC spec exposes via `RTCInboundRtpStreamStats` /
`RTCOutboundRtpStreamStats` / `RemoteInboundRtpStreamStats`. The SDK's
parser previously projected those down to a small curated set
(bitrate, fps, jitter, freezes) and dropped everything diagnostic —
so downstream code that tried to read e.g. `stats.video.avgJitterBufferMs`
silently got undefined for months.

Restored fields (inbound video):
- framesReceived, keyFramesDecoded
- nackCount, nackCountDelta, pliCount, firCount
- avgDecodeTimeMs (totalDecodeTime / framesDecoded)
- avgProcessingDelayMs (totalProcessingDelay / framesDecoded)
- avgJitterBufferMs (jitterBufferDelay / jitterBufferEmittedCount)
- avgInterFrameDelayMs (totalInterFrameDelay / framesDecoded)
- interFrameDelayVarianceMs (σ from total+totalSquared — tells you
  how much the decoder's inter-frame arrival is jittering)
- jitterBufferTargetDelayMs, jitterBufferMinimumDelayMs (current
  target vs minimum buffer depth — answers "is Chrome running a
  deep adaptive buffer?")
- decoderImplementation

Restored fields (outbound video):
- targetBitrateKbps (BWE's current target — separate from the
  actual-bytes-sent-derived `bitrate` field)
- avgEncodeTimeMs, avgPacketSendDelayMs, avgQp
- nackCount, pliCount, firCount (received from remote — recovery
  request counters)
- retransmittedBytesSent, retransmittedPacketsSent
- encoderImplementation

New block:
- `remoteInbound { fractionLost, jitter, roundTripTime }` from the
  remote-inbound-rtp report. Tells you "what does the remote side
  think about its reception of our outbound" — independent of our
  own observations.

Simulcast aggregation unchanged: the outbound-rtp block still
accumulates per-spatial-layer totals for bytesSent/packetsSent/retransmit
counters, and picks scalar fields (resolution, fps, quality-limit,
targetBitrate, avgEncodeTime, encoderImplementation) from the
highest-resolution layer.

All derived averages return null instead of 0 when the denominator is
0 (before any frames decode). Avoids the ambiguity of `avg = 0` meaning
either "genuinely instant" or "no samples yet".
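
The null-on-zero-denominator rule and the σ derivation can be sketched as follows. The helper names are hypothetical. The sketch assumes the underlying WebRTC counters are reported in seconds, hence the scale factor of 1000 for millisecond outputs, and it uses the usual sum / sum-of-squares identity for the standard deviation.

```typescript
// Return null, not 0, when there are no samples yet.
function safeAverage(total: number, count: number, scale: number = 1): number | null {
  return count > 0 ? (total / count) * scale : null;
}

// Standard deviation of inter-frame delay in ms, from the running sum and
// running sum of squares (both assumed to be in seconds / seconds squared).
function interFrameDelayStdDevMs(
  totalDelay: number,
  totalSquaredDelay: number,
  frames: number,
): number | null {
  if (frames <= 0) return null;
  const variance = (totalSquaredDelay - (totalDelay * totalDelay) / frames) / frames;
  // Guard against a tiny negative value caused by floating-point rounding.
  return Math.sqrt(Math.max(0, variance)) * 1000;
}
```

For example, two frames with delays of 0.25 s and 0.75 s give a total of 1 and a squared total of 0.625, so the variance is 0.0625 and the result is 250 ms.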

Unblocks bench-side diagnosis of bimodal session behavior: the jitter
buffer depth + inter-frame delay variance + targetBitrate signals,
together, let you tell whether a bad session is running with a deep
receive buffer, irregular decoder input timing, or a BWE that didn't
adapt — each of which points to a different root cause.
…x bitrate)

Callers (benchmark tool) need per-session control over the client-side
livekit publisher. The simulcast flag and maxBitrate directly affect
how the SFU routes a client's uplink — until now both were hardcoded
(simulcast=true, no explicit maxBitrate → Chrome BWE picks the rate).

New `realtime.connect` options:
- `livekitPublishSimulcast`   — forwarded to `publishTrack(simulcast)`
- `livekitPublishMaxBitrateKbps` — forwarded as
  `videoEncoding.maxBitrate` (kbps → bps) on publishTrack

Plumbing: RealTimeClient schema → WebRTCManager.config → LiveKitCallbacks
→ publishTrack opts. Aiortc ignores both.

Log: LiveKitConnection.joinRoom now emits a single info log with the
effective publish config (simulcast + maxBitrate) at connect time so
bench logs can be grepped to confirm the values that actually took
effect.
…tiveStream + dynacast

- publishMaxBitrateKbps: undefined → 2500 kbps (matches server default);
  pass `null` to explicitly opt out and let Chrome BWE run uncapped.
  Three-state semantic preserved end-to-end (zod schema → WebRTCManager
  → LiveKitConnection).
- adaptiveStream + dynacast: new configurable callback fields on
  LiveKitConnection, plumbed through WebRTCConfig as
  `livekitAdaptiveStream` / `livekitDynacast`. Both still default to
  `false` (unchanged behavior). Primary consumer is webrtc-bench, which
  sweeps these for LiveKit quality experiments without forking the SDK.
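
The three-state semantic for the bitrate cap can be sketched as below. `resolveMaxBitrateBps` and the constant name are hypothetical. The 2500 kbps default, the meaning of `null`, and the kbps to bps conversion come from the commit messages on this branch.

```typescript
const DEFAULT_PUBLISH_MAX_BITRATE_KBPS = 2500;

// undefined -> use the default cap
// null      -> explicit opt-out, no cap (returns undefined)
// number    -> use the caller's cap
function resolveMaxBitrateBps(optionKbps: number | null | undefined): number | undefined {
  if (optionKbps === null) return undefined;
  const kbps = optionKbps ?? DEFAULT_PUBLISH_MAX_BITRATE_KBPS;
  return kbps * 1000; // kbps -> bps
}
```

Checking `null` before applying `??` matters here, because `??` alone would treat `null` and `undefined` the same and collapse the opt-out into the default.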

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Request camera in portrait (swapped w/h + facingMode: user) on mobile
- Pass createConsoleLogger('info') so LiveKit logs actually print

Made-with: Cursor
- createDemoLogger forwards LiveKit/SDK logs to on-page Console Logs + DevTools
- Detect mobile via min viewport edge (landscape phones) and ideal constraints
- Retry getUserMedia with looser portrait constraints on over-constraint failure

Made-with: Cursor
…rt log

- Replace fragile viewport-size heuristic with touch + coarse-pointer check
  so landscape phones stay in portrait mode and laptops stay in landscape.
- Log "WebRTC transport selected" in WebRTCManager constructor so consumers
  can verify the logger pipeline regardless of transport or handshake outcome.

Made-with: Cursor
…xFramerate, degradationPreference)

Three new options surface livekit-client's publishTrack capabilities
we previously hid behind hardcoded defaults:

  livekitPublishCodec           — vp8 | vp9 | h264 | av1
                                  Pin client-side publish codec. Without
                                  this, the browser chose codec on its
                                  own (typically VP8), making "H264
                                  variant" tests asymmetric (H264
                                  server-out, VP8/VP9 client-in).
  livekitPublishMaxFramerate    — default 30 (was hardcoded)
                                  Symmetric with existing serverMaxFramerate.
  livekitDegradationPreference  — balanced | maintain-framerate | maintain-resolution
                                  Under bandwidth pressure, tells
                                  livekit-client what to sacrifice.
                                  For interactive video, maintain-framerate
                                  is usually best (frozen sharp pictures
                                  feel worse than blurry motion).

Same plumbing pattern as the existing livekitPublishSimulcast /
livekitPublishMaxBitrateKbps / livekitAdaptiveStream / livekitDynacast
options:
  - LiveKitCallbacks (transports/livekit.ts)
  - WebRTCConfig (webrtc-manager.ts)
  - zod schema (client.ts)
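
A rough sketch of how the three new options might be normalized before they reach the publisher. Everything here is illustrative: `ConnectOptionsSketch`, `PublishSettingsSketch`, and `toPublishSettings` are hypothetical, and the flat output shape is not livekit-client's actual publish options type. The option names, the allowed values, and the default framerate of 30 come from the commit message.

```typescript
type PublishCodecSketch = "vp8" | "vp9" | "h264" | "av1";
type DegradationPreferenceSketch = "balanced" | "maintain-framerate" | "maintain-resolution";

interface ConnectOptionsSketch {
  livekitPublishCodec?: PublishCodecSketch;
  livekitPublishMaxFramerate?: number;
  livekitDegradationPreference?: DegradationPreferenceSketch;
}

// Hypothetical normalized settings; mapping these onto publishTrack
// options is left out of this sketch.
interface PublishSettingsSketch {
  videoCodec?: PublishCodecSketch;
  maxFramerate: number;
  degradationPreference?: DegradationPreferenceSketch;
}

function toPublishSettings(opts: ConnectOptionsSketch): PublishSettingsSketch {
  const settings: PublishSettingsSketch = {
    maxFramerate: opts.livekitPublishMaxFramerate ?? 30,
  };
  // Only pin a codec or degradation preference when the caller asked for
  // one, so unset options keep the previous behavior.
  if (opts.livekitPublishCodec) settings.videoCodec = opts.livekitPublishCodec;
  if (opts.livekitDegradationPreference) {
    settings.degradationPreference = opts.livekitDegradationPreference;
  }
  return settings;
}
```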

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>