Configure realtime audio buffer frame / polling rate?

Hello, and first of all, thank you to the author for the amazing work on this library (and on whisper.net too!).

I have a question regarding realtime transcription. So far I can use EchoSharp to transcribe audio in real time by handling the `RealtimeSegmentRecognizing` and `RealtimeSegmentRecognized` event types.

But there's a slight problem: `RealtimeSegmentRecognizing` results are noticeably less accurate. That sounds reasonable to me, since recognition of the segment is not yet complete, but my assumption is that the real cause is an audio buffer that is too short. It feels like it tries to recognize a buffer of around 50 ms, in which not even a quarter of a word has been spoken.

Of course, this is only based on rough observation, but the result is that displaying the text from the latest of these events does not look very good. I'm also using `SileroVadDetector`, but I'm not familiar with the inner workings of the detection.

I feel like this could be better if I could somehow throttle the processing rate, or in other words make each audio chunk a bit bigger. There would be more delay, but at least the output would come out nicer. I could put a `Task.Delay` between my `TranscribeAsync` calls, but I don't think that is the right way to do it.

Any advice would be highly appreciated. Thank you in advance!

Configure realtime audio buffer frame / polling rate? #17