What's the recommended way to capture full call transcripts + per-turn latencies from a pipecat pipeline? #3977

@HamzaFouad

pipecat version

No response

Python version

No response

Operating System

No response

Question

hey everyone,

I am building an analytics layer on top of a pipecat voice bot and trying to figure out the cleanest way to capture:

  1. full transcript: user utterances and agent responses, each with a start_ms / end_ms relative to the call start
  2. tool calls: which function was called, with what args, what it returned, and when
  3. per-turn latencies: STT latency, LLM TTFB, TTS synthesis time, and e2e turn latency (user stops speaking → bot starts speaking)
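
To make the target concrete, here is a rough sketch of the record I would like to end up with per call. It is plain Python, not pipecat code: every class and field name (`Utterance`, `ToolCall`, `TurnLatency`, `CallRecord`, `user_stopped_ms`, ...) is hypothetical and chosen only for illustration, and the sample values are made up.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Any, Optional


@dataclass
class Utterance:
    role: str       # "user" or "agent"
    text: str
    start_ms: int   # milliseconds since call start
    end_ms: int


@dataclass
class ToolCall:
    name: str
    args: dict[str, Any]
    result: Any
    start_ms: int
    end_ms: int


@dataclass
class TurnLatency:
    user_stopped_ms: int               # when the user stopped speaking
    bot_started_ms: int                # when the bot started speaking
    stt_ms: Optional[int] = None       # STT latency, if known
    llm_ttfb_ms: Optional[int] = None  # LLM time to first byte, if known
    tts_ms: Optional[int] = None       # TTS synthesis time, if known

    @property
    def e2e_ms(self) -> int:
        # e2e turn latency: user stops speaking -> bot starts speaking
        return self.bot_started_ms - self.user_stopped_ms


@dataclass
class CallRecord:
    transcript: list[Utterance] = field(default_factory=list)
    tool_calls: list[ToolCall] = field(default_factory=list)
    turns: list[TurnLatency] = field(default_factory=list)


# Made-up sample data for a single turn.
record = CallRecord()
record.transcript.append(Utterance("user", "what's the weather in Cairo?", 1200, 2900))
record.tool_calls.append(
    ToolCall("get_weather", {"city": "Cairo"}, {"temp_c": 31}, 3300, 3650)
)
record.transcript.append(Utterance("agent", "It's 31 degrees in Cairo.", 4100, 5800))
record.turns.append(
    TurnLatency(user_stopped_ms=2900, bot_started_ms=4100,
                stt_ms=180, llm_ttfb_ms=420, tts_ms=260)
)

print(record.turns[0].e2e_ms)
print(json.dumps(asdict(record), indent=2))
```

The open question is how to fill a structure like this reliably from inside the pipeline; the shape itself is not the hard part.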

What I've tried

I first tried adding a custom Observer that watches the whole pipeline, but I didn't make much progress and it became very complicated. I then implemented a custom FrameProcessor that intercepts frames to capture the LLM output, transcriptions, and so on, but that also got complicated very quickly, mainly because of race conditions between frames and similar ordering problems.

Context

I am looking for recommended examples, patterns, or existing integrations. Any pointers would be really appreciated.
