FEAT: Realtime streaming session support and server-side barge-in attack #1766

Open
adrian-gavrila wants to merge 25 commits into microsoft:main from adrian-gavrila:adrian-gavrila/realtime-server-vad

Conversation

@adrian-gavrila

Description

Adds persistent streaming session support to OpenAIRealtimeTarget and introduces BargeInAttack, a streaming attack that leverages server-side VAD to detect and exploit barge-in (interruption) behavior. Previously the target only supported single-turn fire-and-forget audio exchanges; this PR adds the transport primitives needed for multi-turn streaming sessions with incremental audio push, event subscription, and mid-session response requests.

When the server detects new user speech while the assistant is still responding, the in-flight response is automatically interrupted and the conversation history is truncated to match what was actually delivered.

Key additions:

  • OpenAIRealtimeTarget streaming primitives: connect_async, push_audio_chunk_async, insert_user_audio_async, subscribe_events_async, request_response_async, send_streaming_session_config_async. These expose transport-level operations over a persistent WebSocket connection.
  • _RealtimeEventDispatcher — ABC that owns a realtime connection's event stream, routes provider-specific events to the active turn, and fires an on_user_audio_committed callback when server VAD finalizes a turn. Provider-specific routing is isolated to _route_event / _cancel abstract methods.
  • BargeInAttack — streaming attack that pushes audio chunks into a persistent session, applies configured converters on each server-committed turn (convert-on-commit), requests responses, and tracks interruptions. Per-turn Message pairs are persisted to CentralMemory with prompt_metadata["interrupted"] = True on interrupted turns.
  • ServerVadConfig / RealtimeTargetResult — shared types for configuring server VAD and representing turn results (audio, transcripts, interruption flag).
  • PromptNormalizer.convert_audio_async — applies audio converter configurations to raw PCM bytes for streaming attacks that hold audio mid-turn rather than a Message.

The target exposes only transport primitives; all attack logic (buffering, convert-on-commit dance, interruption signaling) lives in BargeInAttack.
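
To make the dispatcher's split of responsibilities concrete, here is a minimal sketch of the pattern described above: a base class that owns the dispatch loop and the commit callback, with provider-specific routing and cancellation left abstract. This is not the PR's code. The class names, the string "kinds" returned by _route_event, and the method signatures are all assumptions made for illustration; the event type strings follow OpenAI Realtime naming as I understand it.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any, Awaitable, Callable, Dict, List, Tuple


class EventDispatcherSketch(ABC):
    """Hypothetical stand-in for _RealtimeEventDispatcher (signatures are assumed)."""

    def __init__(self, on_user_audio_committed: Callable[[], Awaitable[None]]) -> None:
        self._on_user_audio_committed = on_user_audio_committed
        self.response_active = False
        self.interrupted = False

    async def dispatch(self, event: Dict[str, Any]) -> None:
        # The subclass decides what a raw provider event means.
        kind = self._route_event(event)
        if kind == "committed":
            # Server VAD finalized a user turn.
            await self._on_user_audio_committed()
        elif kind == "response_started":
            self.response_active = True
        elif kind == "response_done":
            self.response_active = False
        elif kind == "speech_started" and self.response_active:
            # New user speech while the assistant is responding: barge-in.
            self.interrupted = True
            self.response_active = False
            await self._cancel()

    @abstractmethod
    def _route_event(self, event: Dict[str, Any]) -> str:
        """Map a provider event to a generic kind (assumed return type)."""

    @abstractmethod
    async def _cancel(self) -> None:
        """Cancel the in-flight response in a provider-specific way."""


class FakeProviderDispatcher(EventDispatcherSketch):
    """Illustrative provider binding; it only counts cancellations."""

    _KINDS = {
        "input_audio_buffer.committed": "committed",
        "input_audio_buffer.speech_started": "speech_started",
        "response.created": "response_started",
        "response.done": "response_done",
    }

    def __init__(self, on_user_audio_committed: Callable[[], Awaitable[None]]) -> None:
        super().__init__(on_user_audio_committed)
        self.cancel_calls = 0

    def _route_event(self, event: Dict[str, Any]) -> str:
        return self._KINDS.get(event.get("type", ""), "ignored")

    async def _cancel(self) -> None:
        self.cancel_calls += 1


async def demo() -> Tuple[List[str], bool, int]:
    commits: List[str] = []

    async def on_commit() -> None:
        commits.append("turn")

    dispatcher = FakeProviderDispatcher(on_commit)
    for event_type in (
        "input_audio_buffer.committed",
        "response.created",
        "input_audio_buffer.speech_started",
    ):
        await dispatcher.dispatch({"type": event_type})
    return commits, dispatcher.interrupted, dispatcher.cancel_calls
```

Running asyncio.run(demo()) feeds one committed turn, one response start, and one speech-start event through the dispatcher, so the callback fires once and the in-flight response is cancelled once.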

Tests and Documentation

  • 82 unit tests across 3 test files covering: event dispatch and routing, turn lifecycle, interruption detection, converter application, error paths, multi-turn connection reuse, and the full attack lifecycle.
  • Coverage: 98% on realtime_audio.py, 72% on openai_realtime_target.py (uncovered lines are pre-existing code paths, not new additions).
  • Notebook: doc/code/executor/attack/barge_in_attack.py (jupytext py:percent format) demonstrates the attack against a live OpenAI Realtime API endpoint with server VAD. Ran successfully against gpt-4o-realtime-preview — outputs cleared for CI (requires live credentials).

adrian-gavrila and others added 16 commits May 14, 2026 13:07
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…t API

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ardown

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…nc rename, Optional→union

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…tion)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…me-server-vad

# Conflicts:
#	pyrit/prompt_target/openai/openai_realtime_target.py
@hannahwestra25 hannahwestra25 self-assigned this May 21, 2026
Comment thread pyrit/executor/attack/streaming/barge_in.py
Comment thread pyrit/executor/attack/streaming/barge_in.py Outdated
Comment thread pyrit/prompt_normalizer/prompt_normalizer.py Outdated
Comment thread pyrit/executor/attack/streaming/barge_in.py
Comment thread doc/code/executor/attack/barge_in_attack.py Outdated
Comment thread pyrit/executor/attack/streaming/barge_in.py Outdated
Comment thread doc/code/executor/attack/barge_in_attack.py
adrian-gavrila and others added 6 commits May 22, 2026 12:36
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…imitive

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… inline drive_response

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Comment thread doc/code/executor/attack/barge_in_attack.py Outdated
Comment thread pyrit/executor/attack/streaming/barge_in.py Outdated
raw_buffer: bytearray = field(default_factory=bytearray)
turn_lock: asyncio.Lock = field(default_factory=asyncio.Lock)
last_assistant_message: Message | None = None
executed_turns: int = 0

I think last_assistant_message and executed_turns should be in the context instead, so the state only contains variables related to the async functionality.
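
A rough sketch of what that split could look like, assuming two separate dataclasses. The class names and the record_turn helper are hypothetical, and a plain str stands in for the PR's Message type so the snippet is self-contained.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass, field


@dataclass
class StreamingContextSketch:
    """Hypothetical attack context: results and counters, no async machinery."""

    last_assistant_message: str | None = None  # the PR uses Message | None
    executed_turns: int = 0


@dataclass
class StreamingStateSketch:
    """Hypothetical session state: only what the async plumbing needs."""

    raw_buffer: bytearray = field(default_factory=bytearray)
    turn_lock: asyncio.Lock = field(default_factory=asyncio.Lock)


async def record_turn(
    state: StreamingStateSketch, context: StreamingContextSketch, reply: str
) -> None:
    # The lock guards the buffer swap; the outcome lands on the context.
    async with state.turn_lock:
        state.raw_buffer.clear()
        context.last_assistant_message = reply
        context.executed_turns += 1


async def demo() -> tuple[bytes, str | None, int]:
    # Build the state inside the running loop so the lock belongs to it.
    state, context = StreamingStateSketch(), StreamingContextSketch()
    state.raw_buffer.extend(b"abc")
    await record_turn(state, context, "hi")
    return bytes(state.raw_buffer), context.last_assistant_message, context.executed_turns
```

With this shape, anything that reports on the attack reads only the context, and the state object can be thrown away with the session.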

#: Maximum time to wait after the chunk source exhausts for any in-flight VAD-committed
#: turn to finish (commit → convert → response.create → response.done → persist). Acts as
#: a safety cap; the attack returns as soon as the last turn actually completes.
_MAX_POST_STREAM_WAIT_SECONDS = 30.0

is it valuable to have this be configurable?
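
If it does become configurable, one low-cost option is a keyword argument that defaults to the current constant. The sketch below assumes the "last turn finished" signal is an asyncio.Event; the function name and that event are hypothetical, and only the 30.0 default comes from the diff.

```python
import asyncio

# Default mirrors _MAX_POST_STREAM_WAIT_SECONDS from the diff.
DEFAULT_MAX_POST_STREAM_WAIT_SECONDS = 30.0


async def wait_for_last_turn(
    turn_done: asyncio.Event,
    *,
    max_wait_seconds: float = DEFAULT_MAX_POST_STREAM_WAIT_SECONDS,
) -> bool:
    """Return True if the in-flight turn finished, False if the safety cap was hit."""
    try:
        # Returns as soon as the event is set; the timeout is only a cap.
        await asyncio.wait_for(turn_done.wait(), timeout=max_wait_seconds)
        return True
    except asyncio.TimeoutError:
        return False


async def demo() -> tuple:
    done = asyncio.Event()
    hit_cap = await wait_for_last_turn(done, max_wait_seconds=0.01)
    done.set()
    finished = await wait_for_last_turn(done, max_wait_seconds=0.01)
    return hit_cap, finished
```

Callers that never pass max_wait_seconds keep the existing behavior, while tests can use a short cap instead of patching the module constant.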


logger = logging.getLogger(__name__)

_REALTIME_SAMPLE_RATE_HZ = 24000

This is specific to the OpenAI realtime target, so we should probably make it configurable.
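
One possible shape for this, sketched as a small value object that a target could expose and the attack could read. The class and method names are hypothetical; the 24000 default is the constant from the diff, while 16-bit mono PCM is an assumption on my part.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioFormatSketch:
    """Hypothetical per-target audio format used for byte/duration math."""

    sample_rate_hz: int = 24_000  # value of _REALTIME_SAMPLE_RATE_HZ in this PR
    bytes_per_sample: int = 2     # assumption: 16-bit PCM
    channels: int = 1             # assumption: mono

    def bytes_for_ms(self, duration_ms: int) -> int:
        """Number of PCM bytes that hold duration_ms of audio."""
        frames = self.sample_rate_hz * duration_ms // 1000
        return frames * self.bytes_per_sample * self.channels

    def duration_ms(self, num_bytes: int) -> int:
        """Duration in milliseconds of num_bytes of PCM audio."""
        frames = num_bytes // (self.bytes_per_sample * self.channels)
        return frames * 1000 // self.sample_rate_hz


default_format = AudioFormatSketch()
narrowband = AudioFormatSketch(sample_rate_hz=16_000)
```

Under these assumptions a 100 ms chunk is 4800 bytes at the default rate and 3200 bytes at 16 kHz, so chunk sizing follows the target rather than a module-level constant.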

@hannahwestra25

I'm a little concerned with how this attack deviates from the other attacks in that it doesn't use the send_prompt_async workflow. The attack is doing a lot of plumbing that other attacks delegate to send_async — audio normalization (bypasses the pipeline), the swap + response trigger, message construction, memory persistence. That makes it inconsistent with the standard "build prompt → send_prompt_async → response" workflow and tightly couples it to RealtimeTarget internals.

Could we push the streaming session work into RealtimeTarget.send_async so the attack just does:

  • Connect + subscribe
  • Push chunks
  • On VAD commit → build a SeedPromptGroup from the snapshot and call send_prompt_async (target handles normalize + swap + response + persist)
  • Cleanup

Streaming-specific behavior still lives somewhere, but it belongs in the target that owns the WebSocket, not leaked into every streaming attack. This would also let AudioStreamNormalizer live inside the target as an implementation detail rather than something the attack reaches across the boundary to use.
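
To illustrate the shape I have in mind, here is a toy version of that flow. Everything in it is hypothetical: FakeRealtimeTargetSketch is not PyRIT's target, its method names only loosely echo the ones in this PR, VAD is simulated by committing every N chunks, and raw bytes are passed where the real code would build a SeedPromptGroup.

```python
from __future__ import annotations

import asyncio


class FakeRealtimeTargetSketch:
    """Hypothetical target that owns the session, normalization, and persistence."""

    def __init__(self, commit_every: int = 2) -> None:
        self.connected = False
        self.persisted: list[tuple[bytes, str]] = []
        self._buffer = bytearray()
        self._chunks_since_commit = 0
        self._commit_every = commit_every

    async def connect(self) -> None:
        self.connected = True

    async def push_chunk(self, chunk: bytes) -> bytes | None:
        """Buffer a chunk; return a snapshot when the simulated VAD commits a turn."""
        self._buffer.extend(chunk)
        self._chunks_since_commit += 1
        if self._chunks_since_commit < self._commit_every:
            return None
        snapshot = bytes(self._buffer)
        self._buffer.clear()
        self._chunks_since_commit = 0
        return snapshot

    async def send_prompt(self, audio: bytes) -> str:
        """Stand-in for normalize + swap + response + persist inside the target."""
        reply = f"reply to {len(audio)} bytes"
        self.persisted.append((audio, reply))
        return reply

    async def close(self) -> None:
        self.connected = False


async def run_streaming_attack(
    target: FakeRealtimeTargetSketch, chunks: list[bytes]
) -> list[str]:
    """The attack only connects, pushes chunks, sends on commit, and cleans up."""
    replies: list[str] = []
    await target.connect()
    try:
        for chunk in chunks:
            snapshot = await target.push_chunk(chunk)
            if snapshot is not None:
                replies.append(await target.send_prompt(snapshot))
    finally:
        await target.close()
    return replies


async def demo() -> tuple[list[str], bool, int]:
    target = FakeRealtimeTargetSketch(commit_every=2)
    replies = await run_streaming_attack(target, [b"ab", b"cd", b"ef"])
    return replies, target.connected, len(target.persisted)
```

The attack loop here never touches converters or memory; those sit behind send_prompt, which is the boundary I am suggesting.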

adrian-gavrila and others added 3 commits May 26, 2026 18:02
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>