Skip to content

YouTube video transcription API with Go, yt-dlp, FFmpeg, and whisper.cpp. Features async job processing, live dashboard, and Encore framework integration.

Notifications You must be signed in to change notification settings

wilmoore/OmniTranscripts

Repository files navigation

logo

Go Version License GitHub stars GitHub issues Docker

OmniTranscripts is a self-hostable transcription engine that turns any local or remote audio/video into clean, timestamped transcripts via a Go library or HTTP API. OmniTranscripts exists because most transcription tools are either SaaS-only, YouTube-only, or not designed to fit real automation workflows.

Supported inputs

  • Local audio/video files (.mp4, .mp3, .wav)
  • Multipart file uploads
  • Direct media URLs
  • YouTube (including Shorts)
  • TikTok videos
  • Instagram Reels (public)
  • 1000+ additional platforms via yt-dlp

Features

Built in Go. Powered by FFmpeg and Whisper, with a single, deterministic pipeline. Designed for production pipelines.

  • Multi-Source Ingestion: Local files, file uploads, direct URLs, and 1000+ platforms
  • Single Pipeline: Same FFmpeg → Whisper flow regardless of source
  • Go Library + HTTP API: Embed or deploy
  • Sync + Async Processing: Short jobs return immediately, long jobs queue
  • Multiple Outputs: TXT, SRT, VTT, JSON, TSV
  • Production-Oriented: Size limits, validation, structured errors, webhooks
  • Self-Hostable: Docker, Encore.dev, any cloud

Documentation

Quick Start

Get up and running in minutes.

Read guide

API Docs

Complete reference with request & response examples.

Explore

Architecture

Deep dive into the system design and patterns.

Understand

Deployment

Production-ready deployment playbooks.

Deploy

Development

Local setup, workflows, and contributor tooling.

Build

Troubleshooting

Quick fixes for common pitfalls and errors.

Fix

Contributing

Guidelines for issues, pull requests, and reviews.

Join

Changelog

Track version history and notable updates.

Review

Quick Start

Docker (Recommended)

# Run with Docker (with local media mount)
docker run -d \
  --name omnitranscripts \
  -p 3000:3000 \
  -e API_KEY=your-api-key-here \
  -v $(pwd)/media:/media \
  wilmoore/omnitranscripts:latest

# Transcribe a URL
curl -X POST http://localhost:3000/transcribe \
  -H "Authorization: Bearer your-api-key-here" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://www.youtube.com/shorts/VIDEO_ID"}'

# Upload a local file
curl -X POST http://localhost:3000/transcribe \
  -H "Authorization: Bearer your-api-key-here" \
  -F "file=@./media/video.mp4"

Local Development

# 1. Install dependencies
# macOS
brew install go ffmpeg
pip install openai-whisper

# Ubuntu/Debian
sudo apt install golang-go ffmpeg python3-pip
pip install openai-whisper

# 2. Clone and run
git clone https://github.com/wilmoore/omnitranscripts.git
cd omnitranscripts
cp .env.example .env  # Edit with your settings
make dev

Encore.dev (Production)

# Deploy to production in one command
curl -L https://encore.dev/install.sh | bash
encore deploy --env production

Usage

As a Go Library

import "omnitranscripts/engine"

// Local file
result, err := engine.Transcribe(
    "/path/to/recording.mp4",
    "job-local-001",
    engine.DefaultOptions(),
)

// URL (YouTube, Vimeo, SoundCloud, direct media URLs, etc.)
result, err := engine.Transcribe(
    "https://example.com/audio.mp3",
    "job-url-002",
    engine.DefaultOptions(),
)

// Handle errors
if err != nil {
    var tErr *engine.TranscriptionError
    if errors.As(err, &tErr) {
        fmt.Printf("Failed at stage %s: %s\n", tErr.Stage, tErr.Message)
    }
    return err
}

fmt.Println(result.Transcript)
for _, seg := range result.Segments {
    fmt.Printf("[%0.1fs - %0.1fs] %s\n", seg.Start, seg.End, seg.Text)
}

Local files bypass the download stage and go directly through FFmpeg → Whisper.

As an HTTP API

POST /transcribe

Transcribe media from any supported URL.

Request:

{
  "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
}

Response (Short Media):

{
  "transcript": "Never gonna give you up, never gonna let you down...",
  "segments": [
    {
      "start": 0.0,
      "end": 3.5,
      "text": "Never gonna give you up"
    }
  ]
}

Response (Long Media):

{
  "job_id": "123e4567-e89b-12d3-a456-426614174000"
}

GET /transcribe/{job_id}

Get the status and result of a transcription job.

Response:

{
  "id": "123e4567-e89b-12d3-a456-426614174000",
  "status": "complete",
  "transcript": "Never gonna give you up...",
  "segments": [...],
  "created_at": "2024-01-01T12:00:00Z",
  "completed_at": "2024-01-01T12:02:30Z"
}

Possible statuses: pending, running, complete, error

GET /health

Health check endpoint (no authentication required).

Complete API documentation: docs/api.md

Usage Examples

For real-world usage patterns (short-form video, Docker, async jobs), see the examples/ directory.

cURL

# Transcribe from URL
curl -X POST http://localhost:3000/transcribe \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"}'

# Transcribe a local file (file upload)
curl -X POST http://localhost:3000/transcribe \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@/path/to/recording.mp4"

# Transcribe a podcast URL
curl -X POST http://localhost:3000/transcribe \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/podcast-episode.mp3"}'

# Check job status
curl -X GET http://localhost:3000/transcribe/YOUR_JOB_ID \
  -H "Authorization: Bearer YOUR_API_KEY"

JavaScript

const response = await fetch('http://localhost:3000/transcribe', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    url: 'https://www.youtube.com/watch?v=dQw4w9WgXcQ'
  })
});

const result = await response.json();

Short-Form Video

OmniTranscripts works with short-form video platforms out of the box.

YouTube Shorts

curl -X POST http://localhost:3000/transcribe \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://www.youtube.com/shorts/VIDEO_ID"}'

TikTok

curl -X POST http://localhost:3000/transcribe \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://www.tiktok.com/@username/video/VIDEO_ID"}'

Instagram Reels (public)

curl -X POST http://localhost:3000/transcribe \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://www.instagram.com/reel/REEL_ID/"}'

Note: Only public content is supported. Private or authentication-gated content requires additional configuration not covered here.

More Examples

See the examples/ directory for real-world usage:

  • local-files/ - Local file transcription (Go library + file upload)
  • short-form/ - YouTube Shorts, TikTok, Instagram Reels
  • docker/ - Docker Compose and container workflows
  • production/ - Async job polling and webhooks

Full API reference: docs/api.md

Platform Limitations

These are upstream platform constraints, not OmniTranscripts-specific limitations.

Limitation Details
Private content Friends-only Instagram Reels, private TikToks, unlisted YouTube videos requiring auth
Authentication No built-in support for authenticated sessions (cookies, login)
Region locks Some content is geographically restricted
Rate limits Platforms may throttle requests; add delays for batch processing
Platform changes yt-dlp extractors may break when platforms update; keep yt-dlp updated

For most public content, OmniTranscripts works reliably. Edge cases should be tested before production use.

Development Setup

Prerequisites

  • Go 1.23+ - Install Go
  • FFmpeg - brew install ffmpeg (macOS) or sudo apt install ffmpeg (Linux)
  • OpenAI Whisper - pip install openai-whisper

Quick Setup

# Clone and setup
git clone https://github.com/wilmoore/omnitranscripts.git
cd omnitranscripts
cp .env.example .env  # Edit with your settings
make dev

Detailed development guide: docs/development.md

Make Commands

make help          # Show all available commands
make dev           # Development server with hot reload
make test          # Run all tests
make build         # Build for current platform
make check         # Run quality checks (fmt + lint + vet + test)

Complete command reference: docs/development.md#using-the-makefile

Production Deployment

Docker (Recommended)

Docker is the recommended way to run OmniTranscripts in production.

# Build and run
docker build -t omnitranscripts .
docker run -d \
  -p 3000:3000 \
  --env-file .env \
  -v $(pwd)/media:/media \
  omnitranscripts

# Or use Docker Compose (see examples/docker/)
docker-compose -f examples/docker/docker-compose.yml up -d

Encore.dev (Zero-Config)

encore deploy --env production

Cloud Platforms

  • AWS ECS/Fargate - Container-based deployment
  • Google Cloud Run - Serverless containers
  • Azure Container Instances - Managed containers
  • Kubernetes - Full orchestration

Complete deployment guides: docs/deployment.md

Architecture

Three-Stage Pipeline:

  1. Download - Extract audio from any URL (yt-dlp, supports 1000+ platforms)
  2. Normalize - Convert to 16kHz mono WAV (FFmpeg)
  3. Transcribe - Generate timestamped transcripts (whisper.cpp)

Smart Processing:

  • Media <=2min: Synchronous (immediate results)
  • Media >2min: Asynchronous (job queue with status tracking)

Dual Consumption Model:

  • engine/ package: Direct Go library usage
  • HTTP API: Thin adapter over the engine

Detailed architecture: docs/architecture.md

API Documentation

Full OpenAPI/Swagger documentation available at docs/swagger.yaml.

Troubleshooting

Common issues and solutions: docs/troubleshooting.md

Contributing

We welcome contributions! Please see our contributing guide for details on:

  • Setting up your development environment
  • Coding standards and best practices
  • Submitting issues and pull requests
  • Community guidelines

License

This project is licensed under the MIT License - see the LICENSE file for details.


Star this repo if it's helpful!

Report Bug | Request Feature | Discussions

Built with care by wilmoore

About

YouTube video transcription API with Go, yt-dlp, FFmpeg, and whisper.cpp. Features async job processing, live dashboard, and Encore framework integration.

Resources

Contributing

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •