
Configure realtime audio buffer frame / polling rate? #17

@LumineBot

Description

Hello, and first of all, thank you for your amazing work on this library (and on Whisper.net too!)

I have a question about realtime transcription. So far I can use EchoSharp to transcribe audio in real time by handling the RealtimeSegmentRecognizing and RealtimeSegmentRecognized transcription events.

But there is a slight problem: RealtimeSegmentRecognizing results are noticeably less accurate. That seems reasonable and expected, since recognition of the segment has not finished yet, but my guess is that the audio buffer being processed is too short. It looks like it tries to recognize roughly 50 ms of audio at a time, in which not even a quarter of a word has been spoken.

Of course this is only based on rough observation, but the practical result is that displaying the text of the latest of these events does not look very good. I'm also using SileroVadDetector, but I'm not familiar with how its detection works internally.

I think this could be improved if I could somehow throttle the processing rate, in other words make each audio chunk bigger. That would add some latency, but at least the output would look nicer. I could put a Task.Delay between my TranscribeAsync calls, but I don't think that is the right way to do it.
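For illustration only: one way to throttle partial recognition without a fixed Task.Delay is to run it based on how much new audio has accumulated in the current segment, not on wall-clock time. The sketch below is library-agnostic and does not use EchoSharp's API. PartialThrottle, push, end_segment and the 500 ms threshold are hypothetical, and it assumes 16 kHz mono float samples (the rate Whisper models expect).

```python
from typing import List, Optional, Sequence


class PartialThrottle:
    """Hypothetical helper (not part of EchoSharp): decides when an in-progress
    speech segment has grown enough to be worth a new partial recognition pass."""

    def __init__(self, sample_rate: int = 16_000, min_new_ms: int = 500):
        self.sample_rate = sample_rate
        # Number of new samples that must arrive before the next partial pass.
        self.min_new_samples = sample_rate * min_new_ms // 1000
        self.segment: List[float] = []
        self._last_emit_len = 0

    def push(self, chunk: Sequence[float]) -> Optional[List[float]]:
        """Append an incoming audio chunk. Return a copy of the whole segment so
        far if enough new audio arrived since the last partial pass, else None."""
        self.segment.extend(chunk)
        if len(self.segment) - self._last_emit_len >= self.min_new_samples:
            self._last_emit_len = len(self.segment)
            return list(self.segment)
        return None

    def end_segment(self) -> List[float]:
        """Call when the VAD closes the segment: return all buffered audio for
        the final recognition pass and reset for the next segment."""
        final = list(self.segment)
        self.segment.clear()
        self._last_emit_len = 0
        return final
```

With 50 ms chunks (800 samples at 16 kHz) and min_new_ms=500, a partial pass runs on every tenth chunk, each time over the whole segment so far, so later partial results see more context. The trade-off is up to about half a second of extra delay before each partial update.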

Any advice would be highly appreciated. Thank you in advance!
