Description
InterruptibleTTSService._handle_interruption() disconnects and reconnects the WebSocket only when _bot_speaking is True. However, an interruption can arrive before BotStartedSpeakingFrame is emitted, while TTS audio is still being synthesized or is in transit to the output transport. In that case _bot_speaking is False, the WebSocket is not disconnected, and the TTS server keeps sending audio that is played after the interruption.
This is a regression from the fix in #950 / PR #1272, which was later generalized into InterruptibleTTSService. The _bot_speaking guard was added as an optimization but introduces a race condition.
Root Cause
In tts_service.py lines 1429-1433:
```python
async def _handle_interruption(self, frame: InterruptionFrame, direction: FrameDirection):
    await super()._handle_interruption(frame, direction)
    if self._bot_speaking:  # <-- This guard is the problem
        await self._disconnect()
        await self._connect()
```

_bot_speaking is set to True only when BotStartedSpeakingFrame is received (line 1444), which happens when audio reaches the output transport and starts playing. If the user interrupts before that point, the guard fails and the WebSocket stays connected.
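To make the race concrete, here is a minimal, self-contained model of the guard. It does not use Pipecat: the class GuardedTTSModel, its generation counter, and the method names are hypothetical stand-ins that only mimic the _bot_speaking logic quoted above, with a counter bump standing in for _disconnect() followed by _connect().

```python
from dataclasses import dataclass


@dataclass
class GuardedTTSModel:
    """Hypothetical stand-in for the guarded interruption logic (not Pipecat code)."""

    bot_speaking: bool = False
    generation: int = 0  # bumped on every simulated disconnect/reconnect cycle

    def on_bot_started_speaking(self) -> None:
        # Mirrors _bot_speaking = True when BotStartedSpeakingFrame arrives.
        self.bot_speaking = True

    def on_bot_stopped_speaking(self) -> None:
        self.bot_speaking = False

    def handle_interruption(self) -> bool:
        """Return True if the connection was reset (in-flight audio discarded)."""
        if self.bot_speaking:  # the guard under discussion
            self.generation += 1  # stands in for _disconnect() + _connect()
            return True
        return False


# Interruption after playback started: the socket is reset.
late = GuardedTTSModel()
late.on_bot_started_speaking()
print(late.handle_interruption())  # True

# Interruption while audio is still being synthesized or in transit:
# nothing is reset, so audio already requested from the server still plays.
early = GuardedTTSModel()
print(early.handle_interruption())  # False
```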
Affected Services
All TTS services inheriting from InterruptibleTTSService that don't override on_audio_context_interrupted():
- Fish Audio (fish/tts.py)
- LMNT (lmnt/tts.py)
- Neuphonic (neuphonic/tts.py)
- Sarvam (sarvam/tts.py)
Services NOT Affected
These services correctly override on_audio_context_interrupted() to cancel server-side synthesis regardless of _bot_speaking state:
- ElevenLabs: sends a close_context message via the WebSocket
- Rime: sends a clear message to the server
- Deepgram: sends {"type": "Clear"} to the server
Steps to Reproduce
- Configure a pipeline with any affected TTS service (e.g., Fish Audio) and allow_interruptions=True.
- Send a prompt that triggers a multi-sentence LLM response.
- Speak (triggering UserStartedSpeakingFrame) before the first sentence's TTS audio starts playing (before BotStartedSpeakingFrame).
- Observe that InterruptionFrame fires, but the TTS audio still arrives and plays afterward.
Log Evidence
```
49.216 [OUTPUT] UserStartedSpeakingFrame DOWNSTREAM   ← User interrupts
49.237 [OUTPUT] InterruptionFrame DOWNSTREAM          ← Interruption fires
49.239 on_assistant_turn_stopped                      ← LLM response complete
49.419 [OUTPUT] BotStartedSpeakingFrame DOWNSTREAM    ← Audio plays AFTER interruption (bug)
```
Note: BotStartedSpeakingFrame (49.419) occurs after InterruptionFrame (49.237), meaning _bot_speaking was False when _handle_interruption ran.
Suggested Fix
Option A: Remove the _bot_speaking guard and always disconnect/reconnect on interruption:

```python
async def _handle_interruption(self, frame: InterruptionFrame, direction: FrameDirection):
    await super()._handle_interruption(frame, direction)
    # No _bot_speaking guard: always reset the socket so in-flight synthesis is dropped.
    await self._disconnect()
    await self._connect()
```

Option B: Override on_audio_context_interrupted() in each affected service (similar to ElevenLabs/Rime/Deepgram) to cancel server-side synthesis. This hook is called from super()._handle_interruption() regardless of _bot_speaking.
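Both options can be compared in a small asyncio model. Everything here is hypothetical: FakeSocket, BaseTTSModel, OptionATTS and OptionBTTS stand in for Pipecat's classes, the hook only mirrors the name on_audio_context_interrupted(), and the {"type": "Clear"} payload is borrowed from the Deepgram description above rather than any Fish Audio, LMNT, Neuphonic or Sarvam protocol.

```python
import asyncio
import json


class FakeSocket:
    """Hypothetical WebSocket stand-in that records what happened to it."""

    def __init__(self) -> None:
        self.sent: list[str] = []
        self.open = True

    async def send(self, message: str) -> None:
        self.sent.append(message)

    async def close(self) -> None:
        self.open = False


class BaseTTSModel:
    """Mimics the shape of InterruptibleTTSService; not Pipecat code."""

    def __init__(self) -> None:
        self.bot_speaking = False  # never set here: we interrupt before playback
        self.connections = 0
        self.ws: FakeSocket | None = None

    async def _connect(self) -> None:
        self.ws = FakeSocket()
        self.connections += 1

    async def _disconnect(self) -> None:
        if self.ws is not None:
            await self.ws.close()
            self.ws = None

    async def on_audio_context_interrupted(self) -> None:
        """Hook called unconditionally from the base interruption path."""

    async def _handle_interruption(self) -> None:
        await self.on_audio_context_interrupted()


class OptionATTS(BaseTTSModel):
    async def _handle_interruption(self) -> None:
        await super()._handle_interruption()
        # Option A: no bot_speaking guard, always reset the socket.
        await self._disconnect()
        await self._connect()


class OptionBTTS(BaseTTSModel):
    async def on_audio_context_interrupted(self) -> None:
        # Option B: ask the server to drop pending synthesis, keep the socket.
        if self.ws is not None:
            await self.ws.send(json.dumps({"type": "Clear"}))


async def demo():
    a, b = OptionATTS(), OptionBTTS()
    await a._connect()
    await b._connect()
    old_a_ws = a.ws
    # Interrupt before any BotStartedSpeakingFrame: bot_speaking is still False.
    await a._handle_interruption()
    await b._handle_interruption()
    return a, b, old_a_ws


a, b, old_a_ws = asyncio.run(demo())
print(a.connections, old_a_ws.open)  # 2 False
print(b.connections, b.ws.sent)      # 1 ['{"type": "Clear"}']
```

In this model Option A does not depend on any provider-specific message but pays for a reconnect on every interruption, while Option B keeps the connection open but needs each provider to offer some way of cancelling pending synthesis.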
Environment
- Pipecat version: 0.0.dev8273
- TTS: Fish Audio (FishAudioTTSService)
- Transport: Twilio Media Streams (WebSocket) via Daily
- Python: 3.12