OmniTranscripts is a self-hostable transcription engine that turns any local or remote audio/video into clean, timestamped transcripts via a Go library or HTTP API. OmniTranscripts exists because most transcription tools are either SaaS-only, YouTube-only, or not designed to fit real automation workflows.
- Local audio/video files (
.mp4,.mp3,.wav) - Multipart file uploads
- Direct media URLs
- YouTube (including Shorts)
- TikTok videos
- Instagram Reels (public)
- 1000+ additional platforms via yt-dlp
Built in Go. Powered by FFmpeg and Whisper, with a single, deterministic pipeline. Designed for production pipelines.
- Multi-Source Ingestion: Local files, file uploads, direct URLs, and 1000+ platforms
- Single Pipeline: Same FFmpeg → Whisper flow regardless of source
- Go Library + HTTP API: Embed or deploy
- Sync + Async Processing: Short jobs return immediately, long jobs queue
- Multiple Outputs: TXT, SRT, VTT, JSON, TSV
- Production-Oriented: Size limits, validation, structured errors, webhooks
- Self-Hostable: Docker, Encore.dev, any cloud
|
Get up and running in minutes. Read guide |
Complete reference with request & response examples. Explore |
Deep dive into the system design and patterns. Understand |
Production-ready deployment playbooks. Deploy |
|
Local setup, workflows, and contributor tooling. Build |
Quick fixes for common pitfalls and errors. Fix |
Guidelines for issues, pull requests, and reviews. Join |
Track version history and notable updates. Review |
# Run with Docker (with local media mount)
docker run -d \
--name omnitranscripts \
-p 3000:3000 \
-e API_KEY=your-api-key-here \
-v $(pwd)/media:/media \
wilmoore/omnitranscripts:latest
# Transcribe a URL
curl -X POST http://localhost:3000/transcribe \
-H "Authorization: Bearer your-api-key-here" \
-H "Content-Type: application/json" \
-d '{"url": "https://www.youtube.com/shorts/VIDEO_ID"}'
# Upload a local file
curl -X POST http://localhost:3000/transcribe \
-H "Authorization: Bearer your-api-key-here" \
-F "file=@./media/video.mp4"# 1. Install dependencies
# macOS
brew install go ffmpeg
pip install openai-whisper
# Ubuntu/Debian
sudo apt install golang-go ffmpeg python3-pip
pip install openai-whisper
# 2. Clone and run
git clone https://github.com/wilmoore/omnitranscripts.git
cd omnitranscripts
cp .env.example .env # Edit with your settings
make dev# Deploy to production in one command
curl -L https://encore.dev/install.sh | bash
encore deploy --env productionimport "omnitranscripts/engine"
// Local file
result, err := engine.Transcribe(
"/path/to/recording.mp4",
"job-local-001",
engine.DefaultOptions(),
)
// URL (YouTube, Vimeo, SoundCloud, direct media URLs, etc.)
result, err := engine.Transcribe(
"https://example.com/audio.mp3",
"job-url-002",
engine.DefaultOptions(),
)
// Handle errors
if err != nil {
var tErr *engine.TranscriptionError
if errors.As(err, &tErr) {
fmt.Printf("Failed at stage %s: %s\n", tErr.Stage, tErr.Message)
}
return err
}
fmt.Println(result.Transcript)
for _, seg := range result.Segments {
fmt.Printf("[%0.1fs - %0.1fs] %s\n", seg.Start, seg.End, seg.Text)
}Local files bypass the download stage and go directly through FFmpeg → Whisper.
Transcribe media from any supported URL.
Request:
{
"url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
}Response (Short Media):
{
"transcript": "Never gonna give you up, never gonna let you down...",
"segments": [
{
"start": 0.0,
"end": 3.5,
"text": "Never gonna give you up"
}
]
}Response (Long Media):
{
"job_id": "123e4567-e89b-12d3-a456-426614174000"
}Get the status and result of a transcription job.
Response:
{
"id": "123e4567-e89b-12d3-a456-426614174000",
"status": "complete",
"transcript": "Never gonna give you up...",
"segments": [...],
"created_at": "2024-01-01T12:00:00Z",
"completed_at": "2024-01-01T12:02:30Z"
}Possible statuses: pending, running, complete, error
Health check endpoint (no authentication required).
Complete API documentation: docs/api.md
For real-world usage patterns (short-form video, Docker, async jobs), see the examples/ directory.
# Transcribe from URL
curl -X POST http://localhost:3000/transcribe \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{"url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"}'
# Transcribe a local file (file upload)
curl -X POST http://localhost:3000/transcribe \
-H "Authorization: Bearer YOUR_API_KEY" \
-F "file=@/path/to/recording.mp4"
# Transcribe a podcast URL
curl -X POST http://localhost:3000/transcribe \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com/podcast-episode.mp3"}'
# Check job status
curl -X GET http://localhost:3000/transcribe/YOUR_JOB_ID \
-H "Authorization: Bearer YOUR_API_KEY"const response = await fetch('http://localhost:3000/transcribe', {
method: 'POST',
headers: {
'Authorization': 'Bearer YOUR_API_KEY',
'Content-Type': 'application/json'
},
body: JSON.stringify({
url: 'https://www.youtube.com/watch?v=dQw4w9WgXcQ'
})
});
const result = await response.json();OmniTranscripts works with short-form video platforms out of the box.
YouTube Shorts
curl -X POST http://localhost:3000/transcribe \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{"url": "https://www.youtube.com/shorts/VIDEO_ID"}'TikTok
curl -X POST http://localhost:3000/transcribe \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{"url": "https://www.tiktok.com/@username/video/VIDEO_ID"}'Instagram Reels (public)
curl -X POST http://localhost:3000/transcribe \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{"url": "https://www.instagram.com/reel/REEL_ID/"}'Note: Only public content is supported. Private or authentication-gated content requires additional configuration not covered here.
See the examples/ directory for real-world usage:
- local-files/ - Local file transcription (Go library + file upload)
- short-form/ - YouTube Shorts, TikTok, Instagram Reels
- docker/ - Docker Compose and container workflows
- production/ - Async job polling and webhooks
Full API reference: docs/api.md
These are upstream platform constraints, not OmniTranscripts-specific limitations.
| Limitation | Details |
|---|---|
| Private content | Friends-only Instagram Reels, private TikToks, unlisted YouTube videos requiring auth |
| Authentication | No built-in support for authenticated sessions (cookies, login) |
| Region locks | Some content is geographically restricted |
| Rate limits | Platforms may throttle requests; add delays for batch processing |
| Platform changes | yt-dlp extractors may break when platforms update; keep yt-dlp updated |
For most public content, OmniTranscripts works reliably. Edge cases should be tested before production use.
- Go 1.23+ - Install Go
- FFmpeg -
brew install ffmpeg(macOS) orsudo apt install ffmpeg(Linux) - OpenAI Whisper -
pip install openai-whisper
# Clone and setup
git clone https://github.com/wilmoore/omnitranscripts.git
cd omnitranscripts
cp .env.example .env # Edit with your settings
make devDetailed development guide: docs/development.md
make help # Show all available commands
make dev # Development server with hot reload
make test # Run all tests
make build # Build for current platform
make check # Run quality checks (fmt + lint + vet + test)Complete command reference: docs/development.md#using-the-makefile
Docker is the recommended way to run OmniTranscripts in production.
# Build and run
docker build -t omnitranscripts .
docker run -d \
-p 3000:3000 \
--env-file .env \
-v $(pwd)/media:/media \
omnitranscripts
# Or use Docker Compose (see examples/docker/)
docker-compose -f examples/docker/docker-compose.yml up -dencore deploy --env production- AWS ECS/Fargate - Container-based deployment
- Google Cloud Run - Serverless containers
- Azure Container Instances - Managed containers
- Kubernetes - Full orchestration
Complete deployment guides: docs/deployment.md
Three-Stage Pipeline:
- Download - Extract audio from any URL (yt-dlp, supports 1000+ platforms)
- Normalize - Convert to 16kHz mono WAV (FFmpeg)
- Transcribe - Generate timestamped transcripts (whisper.cpp)
Smart Processing:
- Media <=2min: Synchronous (immediate results)
- Media >2min: Asynchronous (job queue with status tracking)
Dual Consumption Model:
engine/package: Direct Go library usage- HTTP API: Thin adapter over the engine
Detailed architecture: docs/architecture.md
Full OpenAPI/Swagger documentation available at docs/swagger.yaml.
Common issues and solutions: docs/troubleshooting.md
We welcome contributions! Please see our contributing guide for details on:
- Setting up your development environment
- Coding standards and best practices
- Submitting issues and pull requests
- Community guidelines
This project is licensed under the MIT License - see the LICENSE file for details.
Star this repo if it's helpful!
Report Bug | Request Feature | Discussions
Built with care by wilmoore
