A curated list of AI video generation APIs, SDKs, and production-ready tools. Focused on services developers can integrate today.
Last verified: March 2026
## Contents

- Text-to-Video APIs
- Real-Time and Interactive Video
- Video Style Transfer and Motion
- Avatar and Talking Head APIs
- Video Enhancement APIs
- Video Understanding APIs
- Open Source Models
- SDKs and Developer Tooling
- Infrastructure and Deployment
- Evaluation and Observability
- Templates and Example Projects
- Learning Resources
## Text-to-Video APIs

- OpenAI Sora – Text-to-video and image-to-video via the `v1/videos` endpoint. Sora 2 supports up to 90s at 4K with spatial audio. API Docs | SDK: Python, Node
- Runway (Gen-4) – Text-to-video and image-to-video with Gen-4 Turbo. Async task-based REST API with polling helpers (see the polling sketch after this list). Docs | SDK: Python, Node
- Google Veo 2 / Veo 3 – Google's video generation models via Vertex AI and Gemini API. Veo 2 is GA; Veo 3 in paid preview. Docs | SDK: Google Cloud Python
- Pika (v2.2) – Text-to-video and image-to-video with Pikaframes multi-keyframe interpolation. API powered by fal.ai. Docs | SDK: fal Python, fal JS
- Luma Dream Machine – High-quality text-to-video with character reference and style reference inputs. Ray 3 is the latest model. Docs | SDK: Python, JS
- Kling AI – Text-to-video and image-to-video from Kuaishou. Up to 30s clips at 1080p/30fps. Async task-based API. Docs | Also on fal.ai
- MiniMax / Hailuo – Hailuo 2.3 model. Text-to-video and image-to-video up to 1080p, 10s clips. Docs | SDK: Python, Node
- Vidu (Shengshu Technology) – Vidu Q3, billed as the first long-form AI video model with native audio-video generation in a single output. Ranked #2 globally on Artificial Analysis benchmarks.
- xAI Aurora / Grok Imagine – Text-to-video and image-to-video using xAI's Aurora autoregressive MoE model. 6–15s clips at 720p with synchronized audio. API
- Seedance 2.0 (ByteDance) – Dual-Branch Diffusion Transformer for simultaneous video + audio generation. Up to 15s at 2K resolution. Available via Dreamina.
- Stability AI (SVD) – Image-to-video via Stable Video Diffusion. Hosted API deprecated July 2025; open weights available for self-hosting. GitHub
- Higgsfield – Cinematic video platform aggregating 15+ premium models (Sora 2, Kling 2.6, Veo 3.1, etc.) with camera simulation, character consistency, and lip-sync. 15M+ users.
- Magic Hour – Multi-modal AI video generation API. Text-to-video, image-to-video, style transfer, 4K upscaling. Scales to zero when idle. Docs | API
- InVideo AI – Turns text prompts into full videos using Sora 2 and Veo 3.1 as underlying models. OpenAI's first official Sora 2 integration partner. 50M+ users.
- Fliki – Text-to-video and text-to-speech platform. Enterprise API with 2,500+ voices in 80+ languages. Docs
- Morph Studio – No-code AI video studio aggregating Wan 2.6, Kling 2.6 Pro, Seedance, Sora 2, Veo 3 into a single canvas with storyboarding and style transfer.
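
Most of the hosted services above expose the same asynchronous pattern: submit a generation task, poll its status, then download the result. Below is a minimal Python sketch of that flow against the `v1/videos` endpoint mentioned for Sora; the response fields (`id`, `status`) and the `/content` download path are assumptions to verify against the current API reference.

```python
import os
import time

import requests

BASE = "https://api.openai.com/v1"
HEADERS = {"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"}

# Submit a generation task. Payload fields are assumptions drawn from the
# docs above -- check the API reference for the exact schema.
resp = requests.post(
    f"{BASE}/videos",
    headers=HEADERS,
    json={"model": "sora-2", "prompt": "A drone shot over a glacier at dawn"},
)
resp.raise_for_status()
video = resp.json()

# Poll until the task finishes. Runway, Kling, and MiniMax follow this
# same submit-then-poll pattern with their own endpoints.
while video.get("status") not in ("completed", "failed"):
    time.sleep(5)
    video = requests.get(f"{BASE}/videos/{video['id']}", headers=HEADERS).json()

if video.get("status") == "completed":
    # Download the rendered MP4 (the /content path is an assumption).
    content = requests.get(f"{BASE}/videos/{video['id']}/content", headers=HEADERS)
    with open("output.mp4", "wb") as f:
        f.write(content.content)
```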
## Real-Time and Interactive Video

- Decart (Lucy 2) – Real-time video transformation at 30fps 1080p with near-zero latency. Live-stream style transfer, character swaps, environment transformation, product placement. ~$3/hour. Docs | Platform
- PixVerse – Text-to-video and image-to-video platform. PixVerse-R1 adds real-time interactive video at 720p HD with native audio. Platform | Docs
## Video Style Transfer and Motion

- DomoAI – AI video-to-video style remixer. 50+ styles (anime, Ghibli, cinematic). v2.4.1 supports text-to-video, image-to-video, talking avatars, and animation.
- Viggle AI – Motion-transfer video tool that animates static characters to match a motion video or live webcam input. Mix Mode, Live Mode, and VTubing support. 40M+ users.
## Avatar and Talking Head APIs

- Synthesia – Avatar-based video creation from scripts. 140+ languages, custom avatars, template workflows. API in beta. Docs
- D-ID – Talking head video generation from text or audio. Express and Premium+ avatars, real-time WebRTC streaming. See the talking-head sketch after this list. Docs | SDK: Python
- HeyGen – AI avatar video generation and real-time streaming avatars via WebRTC. Template-based workflows. Docs | SDK: JS/TS
- Tavus – Conversational video AI. The Phoenix-4 model performs real-time Gaussian-diffusion facial synthesis at ~600ms latency. The Replica API clones face + voice. Integrates with Pipecat and LiveKit. Avatar API | Video API
- Colossyan – Enterprise avatar video platform. 130+ avatars, 600+ voices, 100+ languages, instant avatar creation from phone recording. API
- DeepBrain AI (AI Studios) – AI avatar video platform with REST API. Integrates with AWS, Azure, ElevenLabs, IBM Watson, and NVIDIA Riva. Docs
- Hour One – AI avatar video generator with 100+ presenters, voice cloning, 100+ languages. API + Zapier integration.
- Elai.io – AI video platform with streaming avatar API for interactive e-learning. Turns documents and scripts into avatar-presented videos. Docs
- Captions / Mirage – Mirage API generates hyperrealistic talking-head videos from script + image + actor ID. Natural gestures, eye contact, synchronized audio. API | Docs
- SynthLife – Virtual AI influencer creation. Builds AI personas for TikTok, YouTube, and Instagram with auto-scheduling and unlimited content generation.
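
The talking-head services above follow the same submit-then-poll shape. Here is a minimal sketch in the style of D-ID's talks API; the endpoint path, auth scheme, and payload/response field names are assumptions drawn from memory, so check the official docs before use.

```python
import os
import time

import requests

BASE = "https://api.d-id.com"
HEADERS = {
    # Auth scheme is an assumption -- confirm in the D-ID docs.
    "Authorization": f"Basic {os.environ['DID_API_KEY']}",
    "Content-Type": "application/json",
}

# Create a talk: a portrait image plus a text script to be spoken.
talk = requests.post(
    f"{BASE}/talks",
    headers=HEADERS,
    json={
        "source_url": "https://example.com/portrait.jpg",  # hypothetical image URL
        "script": {"type": "text", "input": "Welcome to our product tour!"},
    },
).json()

# Poll until the render finishes, then grab the result URL.
while True:
    status = requests.get(f"{BASE}/talks/{talk['id']}", headers=HEADERS).json()
    if status.get("status") in ("done", "error"):
        break
    time.sleep(3)

print(status.get("result_url"))
```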
## Video Enhancement APIs

- Topaz Video AI – AI video upscaling, denoising, frame interpolation, and artifact correction. 3M+ users; used by Google, Tesla, NASA. API | Docs
## Video Understanding APIs

- Twelve Labs – Video understanding and intelligence API. Multimodal semantic search, analysis, and text generation from video content. Docs
## Open Source Models

Models with hosted inference, Docker support, or diffusers integration.
- Wan 2.1 (Alibaba) – SOTA open T2V-14B model. Also supports I2V, editing, T2I, and V2A. T2V-1.3B runs on consumer GPUs. Apache 2.0. HuggingFace | On Replicate, fal.ai
- Wan 2.2 (Alibaba) – World's first open-source MoE video diffusion model. Trained on 65.6% more image data and 83.2% more video data than 2.1. 5B and 14B variants. Apache 2.0. HuggingFace
- MAGI-1 (Sand AI) – 24B param autoregressive denoising model. Generates video chunk-by-chunk (24 frames/chunk). Supports T2V, I2V, V2V with streaming generation. Outperforms Wan 2.1 and HunyuanVideo on benchmarks. Apache 2.0. HuggingFace | Paper
- Step-Video-T2V (StepFun) – 300B parameter text-to-video model, up to 204 frames, bilingual (EN/ZH). MIT license. TI2V | HuggingFace
- SkyReels (SkyworkAI) – V1: human-centric video fine-tuned on HunyuanVideo. V2: infinite-length video via Autoregressive Diffusion-Forcing. V3: multimodal reaching closed-source SOTA levels. V1 | V3
- HunyuanVideo (Tencent) – 13B+ param model; v1.5 is 8.3B and runs on consumer GPUs. I2V, Avatar, and Foley variants available. v1.5 | On Replicate, fal.ai
- CogVideoX (Zhipu AI / Z.AI) – CogVideoX-5B flagship; supports 10s videos. Commercial product "Ying" available via API. Apache 2.0 (2B variant). See the diffusers sketch after this list. HuggingFace | API
- NVIDIA Cosmos – World foundation model for physical AI (robotics, autonomous vehicles). Cosmos-Predict2.5 generates physics-based video simulations from text/image/video/sensor inputs. Website | Docs
- Meta Movie Gen – 30B param T2V + 13B audio model. Personalized video from single reference photo, local/global editing, synchronized audio. Research paper public; weights not yet released. Rolling out inside Instagram Reels.
- LTX-Video / LTX-2 (Lightricks) – First DiT-based real-time video generation model. LTX-2 adds native 4K at 50fps with synchronized audio. LTX-2 | ComfyUI Nodes
- Pyramid Flow – Efficient autoregressive video generation using pyramidal flow matching. Up to 10s at 768p, 24fps. ICLR 2025. HuggingFace | Paper
- Open-Sora – Open reproduction of Sora-like generation. 2s–15s at 144p–720p. T2V, I2V, V2V. Apache 2.0.
- AnimateDiff – Plug-and-play animation module for Stable Diffusion models. Merged into HuggingFace diffusers. Diffusers Docs | On Replicate
- Mochi 1 (Genmo) – 10B param T2V model with AsymmDiT architecture. 5.4s at 30fps. Apache 2.0. On fal.ai, Replicate
- Allegro (Rhymes AI) – 2.8B param VideoDiT. 6s clips at 720p/15fps. Merged into diffusers. Apache 2.0. HuggingFace
- OmniHuman-1 (ByteDance) – Multimodal human video generation from single image + motion signal (audio, video, or text). Full-body, any aspect ratio. On Replicate
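
Several of the models above (CogVideoX, AnimateDiff, Allegro) ship as HuggingFace diffusers pipelines, so local generation takes only a few lines. A minimal sketch with the CogVideoX-5B checkpoint, assuming a CUDA GPU with enough VRAM:

```python
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

# Load the 5B text-to-video pipeline in bfloat16 to reduce VRAM use.
pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # moves submodules to GPU on demand

frames = pipe(
    prompt="A panda playing guitar in a bamboo forest, cinematic lighting",
    num_frames=49,              # roughly 6 seconds at 8 fps
    guidance_scale=6.0,
    num_inference_steps=50,
).frames[0]

export_to_video(frames, "panda.mp4", fps=8)
```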
## SDKs and Developer Tooling

- Replicate SDK – Python/JS client for 100+ hosted video models. Async, streaming, webhooks, fine-tuning. See the sketch after this list. Docs | `pip install replicate`
- fal.ai SDK – Serverless AI inference with Python, JS, and Swift SDKs. Hosts Kling, Veo, Pika, Wan, LTX, Luma, and more. Docs | `pip install fal-client` / `npm install @fal-ai/client`
- Runway SDK – Official Python and Node.js SDKs with type annotations, async support, and built-in polling. `pip install runwayml` / `npm install @runwayml/sdk`
- Luma AI SDK – Sync and async clients for all Dream Machine generation modes. JS Docs | `pip install lumaai`
- HeyGen Streaming Avatar SDK – TypeScript SDK for real-time WebRTC interactive avatar sessions. `npm install @heygen/streaming-avatar`
- MiniMax MCP Server – Model Context Protocol servers for video gen, TTS, and voice cloning. JS
- HuggingFace Diffusers – The canonical PyTorch library for diffusion models, including video pipelines. Docs | `pip install diffusers`
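
As a taste of how thin these SDKs are, here is a minimal Replicate sketch; the model slug and input keys are illustrative placeholders, not a real model:

```python
import replicate

# replicate.run() blocks until the prediction completes and returns the
# output URL(s). Requires REPLICATE_API_TOKEN in the environment.
output = replicate.run(
    "owner/some-video-model",  # hypothetical slug -- pick a real one from the catalog
    input={"prompt": "A timelapse of clouds rolling over mountains"},
)
print(output)
```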
## Infrastructure and Deployment

- Lambda Labs – On-demand H100/B200 GPUs. SSH and JupyterLab access with REST API for instance management.
- CoreWeave – Kubernetes-native AI cloud with enterprise-scale GPU infrastructure.
- RunPod – GPU pods (persistent) and serverless endpoints. REST, GraphQL, and CLI. Docs
- Modal – Serverless Python-first GPU platform. Container spin-up in ~1 second. Docs
- Together AI – Inference API for 200+ open models plus Instant Clusters for self-service GPU clusters.
- Replicate – Serverless model hosting. Run open-source video models via REST API. Docs
- fal.ai – Serverless inference for generative media. 600+ models. Python, JS, Swift SDKs. Docs
- WaveSpeedAI – Fast AI inference with no cold starts. 600+ models. 30–50% cheaper than HuggingFace Inference. 99.9% uptime SLA. GitHub
- Pollo AI – Video API aggregator providing access to Kling, Veo 3.1, Runway, Hailuo, Wan 2.6, and Pollo 2.0. Docs | API
- FFmpeg – Industry-standard multimedia processing: encode, decode, transcode, stream, filter. See the HLS packaging sketch after this list. GitHub
- HandBrake – Open-source video transcoder wrapping FFmpeg. GUI and CLI. GitHub
- Mux – API-first video infrastructure. Upload, encode, stream (VOD + live), analytics. SDKs for Node, Python, Ruby, Go, and more. Docs
- Cloudflare Stream – Video upload, encoding, and CDN delivery billed per minute watched. Live streaming via RTMP/SRT. Docs
- Backblaze B2 – S3-compatible object storage at ~$0.006/GB/month. Free egress via Cloudflare. Docs
- hls.js – JavaScript HLS playback via MSE. Used by major streaming platforms.
- Shaka Player – Google's open-source DASH + HLS player.
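
A common last mile is packaging a generated MP4 for browser playback with hls.js or Shaka Player. Below is a minimal sketch that shells out to FFmpeg to produce a single-rendition HLS VOD playlist; bitrates and segment length are illustrative defaults.

```python
import subprocess

def package_hls(src: str, out_dir: str) -> None:
    """Transcode an MP4 into a single-rendition HLS VOD playlist."""
    subprocess.run(
        [
            "ffmpeg", "-i", src,
            "-c:v", "libx264", "-b:v", "3000k",   # H.264 video at ~3 Mbps
            "-c:a", "aac", "-b:a", "128k",        # AAC audio
            "-hls_time", "4",                     # 4-second segments
            "-hls_playlist_type", "vod",
            f"{out_dir}/index.m3u8",
        ],
        check=True,
    )

package_hls("output.mp4", "stream")
```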
## Evaluation and Observability

- VBench / VBench-2.0 – Comprehensive benchmark for video generative models. 16 fine-grained dimensions including subject consistency, motion smoothness, and temporal flickering. VBench-2.0 adds Physics and Commonsense evaluation. Leaderboard | Paper (CVPR 2024)
## Templates and Example Projects

- fal.ai Next.js Video Generator – Official Next.js template with queue management and TypeScript. One-click Vercel deploy. Vercel Template
- HeyGen Streaming Avatar Demo – Next.js/TypeScript starter for real-time WebRTC avatar sessions.
- Stability AI SVD Streamlit Demo – Streamlit demo scripts for Stable Video Diffusion (see `scripts/demo/`).
- B2 Video Object Detection with Transformers – Video object detection pipeline using HuggingFace Transformers with Backblaze B2 cloud storage integration.
- Google Gemini Streamlit + Cloud Run – Sample app using Gemini multimodal with Streamlit, deployable to Cloud Run.
## Learning Resources

- OpenAI Videos API Reference
- Runway API Quickstart
- Google Veo Developer Guide
- Luma AI API Docs
- HuggingFace Diffusers Video Pipeline Guide
- fal.ai Documentation
- NVIDIA Cosmos Documentation
Contributions welcome! Please read the contribution guidelines first. PRs for new tools, corrections, and updates are appreciated.
