Generates SRT subtitles from MP3/WAV audio using Whisper or Faster-Whisper backends. Optimized for developer use.
- Backends & Models: `--backend whisper` (OpenAI) or `faster-whisper`. Supports various model sizes (e.g., `tiny` to `large-v3`).
- Performance: `--device cpu` or `cuda`. `--test` mode uses 3-min audio clips (auto-created if they don't exist).
- Output Quality: Segments split into sentences, proportional timing, 80-char/2-line SRT formatting.
- Efficiency: Caches models and skips redundant test clip creation. `faster-whisper` shows duration-based progress.
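
The "proportional timing" behavior above can be sketched as follows: a transcribed segment is split into sentences, and each sentence receives a share of the segment's duration proportional to its character count. This is a hypothetical helper for illustration, not the actual code in `main.py`:

```python
import re

def split_proportionally(text, start, end):
    """Split a transcript segment into sentences and assign each a
    time slice proportional to its share of the segment's characters."""
    # Naive sentence split on terminal punctuation (illustrative heuristic).
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    total_chars = sum(len(s) for s in sentences)
    duration = end - start
    cues, cursor = [], start
    for s in sentences:
        slice_len = duration * len(s) / total_chars
        cues.append((round(cursor, 3), round(cursor + slice_len, 3), s))
        cursor += slice_len
    return cues
```

For example, `split_proportionally("Hello there. How are you?", 0.0, 5.0)` yields two cues of 2.5 s each, since the two sentences have equal character counts.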
- Prerequisites: Python 3.x, FFmpeg (in PATH, or update `main.py`).
- Environment: `python -m venv venv && source venv/bin/activate` (or `venv\Scripts\activate` on Windows).
- Dependencies: `pip install -r requirements.txt`.
- Place audio files (MP3/WAV) in `input/`.
- Run `python main.py [OPTIONS]`.
- Subtitles are written to `output/`.
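
For reference, the SRT files written to `output/` consist of numbered cue blocks with `HH:MM:SS,mmm` timestamps. A minimal formatter (a hypothetical helper, not taken from `main.py`) looks like:

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def srt_block(index, start, end, text):
    """Render one numbered SRT cue block."""
    return f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
```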
| Argument | Choices | Default | Description |
|---|---|---|---|
| `--test` | | | Use 3-min test clip. |
| `--backend` | `whisper`, `faster-whisper` | `whisper` | Transcription backend. |
| `--model` | (backend-dependent) | `large` | Model size (e.g., `tiny`, `large`, `large-v3`). |
| `--device` | `cpu`, `cuda` | `cpu` | Execution device. |
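
The table above maps onto an `argparse` interface like the one below. This is a sketch of the CLI implied by the table, not the actual parser in `main.py`:

```python
import argparse

def build_parser():
    """Argument parser mirroring the options table above."""
    p = argparse.ArgumentParser(description="Generate SRT subtitles from audio.")
    p.add_argument("--test", action="store_true",
                   help="Use 3-min test clip.")
    p.add_argument("--backend", choices=["whisper", "faster-whisper"],
                   default="whisper", help="Transcription backend.")
    p.add_argument("--model", default="large",
                   help="Model size (e.g., tiny, large, large-v3).")
    p.add_argument("--device", choices=["cpu", "cuda"], default="cpu",
                   help="Execution device.")
    return p
```

With no arguments, the defaults from the table apply (`whisper` backend, `large` model, `cpu` device).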
- Refer to backend docs for specific model availability.
- For the full dependency list, see `requirements.txt`.
- Handles Whisper model checksum errors by offering to clear the model cache.
- Ensure your CUDA environment is correctly configured when using `--device cuda`.