TurnAnalyzerUserTurnStopStrategy ignores InterimTranscriptionFrame, causing deadlock when STT delays finalized transcriptions #3988

## Summary

`TurnAnalyzerUserTurnStopStrategy` only uses finalized `TranscriptionFrame`s to set `_text` and ignores `InterimTranscriptionFrame`s. When an STT service holds back the finalized transcription of a short utterance until more speech arrives, the stop strategy cannot trigger: VAD stop sets `_turn_complete`, but `_text` stays empty until the delayed finalized transcription shows up.

This causes a deadlock where the bot goes silent for as long as the user stays quiet.

## Reproduction scenario

1. User says "I need help with my order" → turn completes → LLM triggers
2. User says "Billing." (short word) while LLM is generating → `InterimTranscriptionFrame("Billing")` arrives
3. `TranscriptionUserTurnStartStrategy` fires (from the interim) → new turn starts, LLM interrupted
4. VAD stop fires ~0.4s later → `TurnAnalyzerUserTurnStopStrategy` sets `_turn_complete = True`
5. But `_text` is still `""` because `_handle_transcription` only processes `TranscriptionFrame`:

```python
# turn_analyzer_user_turn_stop_strategy.py line 106
elif isinstance(frame, TranscriptionFrame):
    await self._handle_transcription(frame)
```

6. `_maybe_trigger_user_turn_stopped()` returns early:

```python
# line 213
if not self._text or not self._turn_complete:
    return
```

7. The STT holds the finalized transcription for "Billing." until the user's next utterance, which may come 15+ seconds later
8. Bot sits silent until the user speaks again
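To make the steps above easy to replay, here is a minimal, self-contained model of the gating logic. It is not pipecat code: the frame classes are stand-ins that only mirror the class names, `VADStoppedFrame` is a hypothetical marker for "VAD stop fired", and the real methods are async while this sketch is synchronous. It also simplifies by treating VAD stop as the turn analyzer declaring the turn complete.

```python
from dataclasses import dataclass


@dataclass
class TextFrame:
    text: str


class TranscriptionFrame(TextFrame):
    """Stand-in for a finalized transcription."""


class InterimTranscriptionFrame(TextFrame):
    """Stand-in for an interim (partial) transcription."""


class VADStoppedFrame:
    """Hypothetical marker: VAD reports that the user stopped speaking."""


class ToyStopStrategy:
    """Models only the gating logic quoted above; not pipecat's class."""

    def __init__(self) -> None:
        self._text = ""
        self._turn_complete = False
        self.stopped = False

    def process_frame(self, frame: object) -> None:
        if isinstance(frame, VADStoppedFrame):
            # Simplification: VAD stop is treated as "turn complete".
            self._turn_complete = True
            self._maybe_trigger_user_turn_stopped()
        elif isinstance(frame, TranscriptionFrame):
            # Only finalized transcriptions populate _text.
            text = frame.text.strip()
            if text:
                self._text = text
                self._maybe_trigger_user_turn_stopped()
        # An InterimTranscriptionFrame matches neither branch and is dropped.

    def _maybe_trigger_user_turn_stopped(self) -> None:
        # Same early return as the quoted library code.
        if not self._text or not self._turn_complete:
            return
        self.stopped = True


strategy = ToyStopStrategy()
strategy.process_frame(InterimTranscriptionFrame("Billing"))
strategy.process_frame(VADStoppedFrame())
print(strategy.stopped)  # False: the turn is complete, but _text is still empty
```

In this model only a later `TranscriptionFrame` (for example `"Billing. Hello?"`) flips `stopped` to `True`, which matches step 7.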

## Root cause

`InterimTranscriptionFrame` and `TranscriptionFrame` are separate classes (both inherit from `TextFrame`), so `isinstance(frame, TranscriptionFrame)` doesn't match interims. The stop strategy has no way to populate `_text` from interim transcriptions.
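A quick illustration of the `isinstance` mismatch, using stand-in classes that mirror only the sibling relationship described above (the real pipecat frames carry more fields, such as user and timestamp information):

```python
from dataclasses import dataclass


@dataclass
class TextFrame:
    text: str


# Sibling subclasses: neither inherits from the other.
class TranscriptionFrame(TextFrame):
    pass


class InterimTranscriptionFrame(TextFrame):
    pass


interim = InterimTranscriptionFrame("Billing")
print(isinstance(interim, TranscriptionFrame))  # False
print(isinstance(interim, TextFrame))  # True
print(isinstance(interim, (TranscriptionFrame, InterimTranscriptionFrame)))  # True
```

Matching on the tuple (or on a shared base) is what lets a handler see both kinds of frame.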

## STT behavior context

Some STT services don't finalize short utterances immediately. A single word like "Billing." may only produce an `InterimTranscriptionFrame`, with the finalized `TranscriptionFrame` arriving only when the user speaks again (e.g., "Billing. Hello?" as a single finalized transcription 15+ seconds later).
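The finalization pattern we observe can be sketched with a toy STT. The "finalize only when a second utterance arrives" rule is a hypothetical simplification of what we see, not the documented behavior of any particular STT service, and the frame classes are stand-ins rather than pipecat's.

```python
from dataclasses import dataclass


@dataclass
class TextFrame:
    text: str


class TranscriptionFrame(TextFrame):
    pass


class InterimTranscriptionFrame(TextFrame):
    pass


class DelayedFinalizationSTT:
    """Toy STT: emits interims immediately, but holds the final transcription
    of an utterance until another utterance arrives (hypothetical policy)."""

    def __init__(self) -> None:
        self._pending: list[str] = []

    def on_utterance(self, text: str) -> list[TextFrame]:
        frames: list[TextFrame] = [InterimTranscriptionFrame(text)]
        self._pending.append(text)
        if len(self._pending) > 1:
            # The held text is finalized together with the new utterance.
            frames.append(TranscriptionFrame(" ".join(self._pending)))
            self._pending.clear()
        return frames


stt = DelayedFinalizationSTT()
first = stt.on_utterance("Billing.")  # only an interim frame
second = stt.on_utterance("Hello?")  # interim, then final "Billing. Hello?"
```

With this pattern, the only text available when the first turn ends is the interim frame.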

This may also be an STT configuration issue on our side — any guidance on expected STT finalization behavior would be helpful.

## Suggested fix

Have `TurnAnalyzerUserTurnStopStrategy.process_frame` also handle `InterimTranscriptionFrame` to set `_text`, similar to how it handles `TranscriptionFrame`:

```python
elif isinstance(frame, InterimTranscriptionFrame):
    text = frame.text.strip()
    if text:
        self._text = text
        await self._maybe_trigger_user_turn_stopped()
        # Fallback path for no-VAD-stop scenario
        if not self._vad_user_speaking and self._vad_stopped_time is None:
            self._turn_complete = True
            ...
```
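To check the idea end to end, here is the toy model again with the suggested change applied. As before, the classes are stand-ins (not pipecat's), `VADStoppedFrame` is a hypothetical marker, and the no-VAD-stop fallback from the snippet above is left out. The extra `stopped_text` guard, which makes the toy fire only once, is our own addition.

```python
from dataclasses import dataclass


@dataclass
class TextFrame:
    text: str


class TranscriptionFrame(TextFrame):
    """Stand-in for a finalized transcription."""


class InterimTranscriptionFrame(TextFrame):
    """Stand-in for an interim (partial) transcription."""


class VADStoppedFrame:
    """Hypothetical marker: VAD reports that the user stopped speaking."""


class PatchedToyStopStrategy:
    """Toy model of the proposed fix; not pipecat's class."""

    def __init__(self) -> None:
        self._text = ""
        self._turn_complete = False
        self.stopped_text = ""  # text the turn was closed with; "" if not yet

    def process_frame(self, frame: object) -> None:
        if isinstance(frame, VADStoppedFrame):
            self._turn_complete = True
            self._maybe_trigger_user_turn_stopped()
        elif isinstance(frame, (TranscriptionFrame, InterimTranscriptionFrame)):
            # Interim text now also counts, so a short utterance whose final
            # transcription is delayed can still close the turn.
            text = frame.text.strip()
            if text:
                self._text = text
                self._maybe_trigger_user_turn_stopped()

    def _maybe_trigger_user_turn_stopped(self) -> None:
        # Toy-only guard: trigger at most once.
        if not self._text or not self._turn_complete or self.stopped_text:
            return
        self.stopped_text = self._text


strategy = PatchedToyStopStrategy()
strategy.process_frame(InterimTranscriptionFrame("Billing"))
strategy.process_frame(VADStoppedFrame())
print(strategy.stopped_text)  # Billing
```

One trade-off: interim text can be less accurate than the final transcription, and the late `"Billing. Hello?"` final still has to be reconciled with a turn that has already closed.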

## Workaround

We currently work around this by setting `TranscriptionUserTurnStartStrategy(enable_interruptions=False)` and increasing VAD `stop_secs` to give users more time to finish multi-word answers within a single turn.

## Environment

- pipecat 0.0.104
- Turn analyzer: `LocalSmartTurnAnalyzerV3` with `SmartTurnParams(stop_secs=2)`
