pipecat version
No response
Python version
No response
Operating System
No response
Question
Hey everyone,
I am building an analytics layer on top of a pipecat voice bot and am trying to figure out the cleanest way to capture:
- full transcript: user utterances and agent responses, each with a start_ms / end_ms relative to the call start
- tool calls: which function was called, with what args, what it returned, and when
- per-turn latencies: things like STT latency, LLM TTFB, TTS synthesis time, and end-to-end turn latency (user stops speaking → bot starts speaking)
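To make the three items above concrete, here is a rough sketch of the record shapes I have in mind. Every name here (Utterance, ToolCall, TurnLatency, CallRecord and their fields) is hypothetical and not part of pipecat; it only illustrates the target output, not how to collect it.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Utterance:
    # Hypothetical transcript entry; times are offsets from call start.
    role: str  # "user" or "agent"
    text: str
    start_ms: int
    end_ms: int


@dataclass
class ToolCall:
    # Hypothetical record of one function call made by the LLM.
    name: str
    args: dict[str, Any]
    result: Any
    start_ms: int
    end_ms: int


@dataclass
class TurnLatency:
    # Hypothetical per-turn timing; component latencies may be unknown.
    user_stopped_ms: int
    bot_started_ms: int
    stt_ms: Optional[int] = None
    llm_ttfb_ms: Optional[int] = None
    tts_ms: Optional[int] = None

    @property
    def e2e_ms(self) -> int:
        # End-to-end turn latency: user stops speaking -> bot starts speaking.
        return self.bot_started_ms - self.user_stopped_ms


@dataclass
class CallRecord:
    # Everything captured for a single call.
    transcript: list[Utterance] = field(default_factory=list)
    tool_calls: list[ToolCall] = field(default_factory=list)
    turns: list[TurnLatency] = field(default_factory=list)
```

A turn would then be built as, for example, `TurnLatency(user_stopped_ms=1200, bot_started_ms=1950)`, with `e2e_ms` derived from the two timestamps rather than stored separately.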
What I've tried
I first tried adding a custom Observer that watches the whole pipeline, but I didn't make much progress and it became very complicated. I then implemented a custom FrameProcessor that intercepts frames to capture the LLM's generated text and so on, but that also got complicated very quickly, especially because of race conditions between frames and similar ordering problems.
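For reference, the direction that seems least fragile to me is to stop relying on the order in which frames arrive: stamp each event with a timestamp where it happens, append it to a log, and derive latencies afterwards from the sorted log. Below is a minimal sketch of that idea only. EventLog, Event, and the event kinds "user_stopped" and "bot_started" are names I made up for illustration; none of this uses pipecat's API, and wiring it to real frames is exactly the part I am unsure about.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str  # hypothetical kinds: "user_stopped", "bot_started"
    t_ms: int  # offset from call start, in milliseconds


class EventLog:
    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._t0 = clock()  # call start
        self._events: list[Event] = []

    def record(self, kind: str) -> Event:
        # Stamp the event with the current offset from call start.
        t_ms = int((self._clock() - self._t0) * 1000)
        return self.add(kind, t_ms)

    def add(self, kind: str, t_ms: int) -> Event:
        # Append an event whose timestamp was captured elsewhere.
        event = Event(kind, t_ms)
        self._events.append(event)
        return event

    def e2e_latencies(self) -> list[int]:
        # Sort by timestamp so arrival order does not matter, then pair
        # each "user_stopped" with the next "bot_started".
        latencies = []
        stopped_at = None
        for event in sorted(self._events, key=lambda e: e.t_ms):
            if event.kind == "user_stopped":
                stopped_at = event.t_ms
            elif event.kind == "bot_started" and stopped_at is not None:
                latencies.append(event.t_ms - stopped_at)
                stopped_at = None
        return latencies
```

With this shape, events can be appended from different places in any order, and the pairing is done once at the end of the call instead of inside the frame handlers.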
Context
Any recommendations for examples, patterns, or existing integrations would be really appreciated.