backblaze-b2-samples/awesome-video-generation

Awesome Video Generation

A curated list of AI video generation APIs, SDKs, and production-ready tools. Focused on services developers can integrate today.

Last verified: March 2026

Contents

  • Text-to-Video APIs
  • Real-Time and Interactive Video
  • Video Style Transfer and Motion
  • Avatar and Talking Head APIs
  • Video Enhancement APIs
  • Video Understanding APIs
  • Open Source Models
  • SDKs and Developer Tooling
  • Infrastructure and Deployment
  • Evaluation and Observability
  • Contributing
  • License

Text-to-Video APIs

  • OpenAI Sora – Text-to-video and image-to-video via the v1/videos endpoint. Sora 2 supports up to 90s at 4K with spatial audio. API Docs | SDK: Python, Node
  • Runway (Gen-4) – Text-to-video and image-to-video with Gen-4 Turbo. Async task-based REST API with polling helpers. Docs | SDK: Python, Node
  • Google Veo 2 / Veo 3 – Google's video generation models via Vertex AI and Gemini API. Veo 2 is GA; Veo 3 in paid preview. Docs | SDK: Google Cloud Python
  • Pika (v2.2) – Text-to-video and image-to-video with Pikaframes multi-keyframe interpolation. API powered by fal.ai. Docs | SDK: fal Python, fal JS
  • Luma Dream Machine – High-quality text-to-video with character reference and style reference inputs. Ray 3 is the latest model. Docs | SDK: Python, JS
  • Kling AI – Text-to-video and image-to-video from Kuaishou. Up to 30s clips at 1080p/30fps. Async task-based API. Docs | Also on fal.ai
  • MiniMax / Hailuo – Hailuo 2.3 model. Text-to-video and image-to-video up to 1080p, 10s clips. Docs | SDK: Python, Node
  • Vidu (Shengshu Technology) – Now on Vidu Q3, the first long-form AI video model with native audio-video generation in a single output. Ranked #2 globally on Artificial Analysis benchmarks.
  • xAI Aurora / Grok Imagine – Text-to-video and image-to-video using xAI's Aurora autoregressive MoE model. 6–15s clips at 720p with synchronized audio. API
  • Seedance 2.0 (ByteDance) – Dual-Branch Diffusion Transformer for simultaneous video + audio generation. Up to 15s at 2K resolution. Available via Dreamina.
  • Stability AI (SVD) – Image-to-video via Stable Video Diffusion. Hosted API deprecated July 2025; open weights available for self-hosting. GitHub
  • Higgsfield – Cinematic video platform aggregating 15+ premium models (Sora 2, Kling 2.6, Veo 3.1, etc.) with camera simulation, character consistency, and lip-sync. 15M+ users.
  • Magic Hour – Multi-modal AI video generation API. Text-to-video, image-to-video, style transfer, 4K upscaling. Scales to zero when idle. Docs | API
  • InVideo AI – Turns text prompts into full videos using Sora 2 and Veo 3.1 as underlying models. OpenAI's first official Sora 2 integration partner. 50M+ users.
  • Fliki – Text-to-video and text-to-speech platform. Enterprise API with 2,500+ voices in 80+ languages. Docs
  • Morph Studio – No-code AI video studio aggregating Wan 2.6, Kling 2.6 Pro, Seedance, Sora 2, Veo 3 into a single canvas with storyboarding and style transfer.
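
Nearly all of the APIs above are asynchronous: you submit a generation job, get back a task ID, and poll until the output URL is ready. A minimal, provider-agnostic sketch of that pattern (the `client` object and its `submit`/`status` methods are hypothetical stand-ins, not any specific vendor's SDK):

```python
import time

def generate_video(client, prompt, poll_interval=5.0, timeout=600.0):
    """Submit a text-to-video job and poll until it succeeds or fails.

    `client` is any object exposing `submit(prompt) -> task_id` and
    `status(task_id) -> dict` with keys "state" and "video_url" --
    an assumed shape; adapt it to your vendor's actual SDK.
    """
    task_id = client.submit(prompt)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = client.status(task_id)
        if result["state"] == "succeeded":
            return result["video_url"]
        if result["state"] == "failed":
            raise RuntimeError(f"generation failed: {result.get('error')}")
        time.sleep(poll_interval)
    raise TimeoutError(f"task {task_id} did not finish within {timeout}s")
```

Several of the official SDKs (e.g. Runway's) ship polling helpers that wrap exactly this loop for you.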

Real-Time and Interactive Video

  • Decart (Lucy 2) – Real-time video transformation at 30fps 1080p with near-zero latency. Live-stream style transfer, character swaps, environment transformation, product placement. ~$3/hour. Docs | Platform
  • PixVerse – Text-to-video and image-to-video platform. PixVerse-R1 adds real-time interactive video at 720p HD with native audio. Platform | Docs

Video Style Transfer and Motion

  • DomoAI – AI video-to-video style remixer. 50+ styles (anime, Ghibli, cinematic). v2.4.1 supports text-to-video, image-to-video, talking avatars, and animation.
  • Viggle AI – Motion-transfer video tool that animates static characters to match a motion video or live webcam input. Mix Mode, Live Mode, and VTubing support. 40M+ users.

Avatar and Talking Head APIs

  • Synthesia – Avatar-based video creation from scripts. 140+ languages, custom avatars, template workflows. API in beta. Docs
  • D-ID – Talking head video generation from text or audio. Express and Premium+ avatars, real-time WebRTC streaming. Docs | SDK: Python
  • HeyGen – AI avatar video generation and real-time streaming avatars via WebRTC. Template-based workflows. Docs | SDK: JS/TS
  • Tavus – Conversational video AI. Phoenix-4 model does real-time gaussian-diffusion facial synthesis at ~600ms latency. Replica API clones face + voice. Integrates with Pipecat and LiveKit. Avatar API | Video API
  • Colossyan – Enterprise avatar video platform. 130+ avatars, 600+ voices, 100+ languages, instant avatar creation from phone recording. API
  • DeepBrain AI (AI Studios) – AI avatar video platform with REST API. Integrates AWS, Azure, ElevenLabs, IBM Watson, NVIDIA Riva. Docs
  • Hour One – AI avatar video generator with 100+ presenters, voice cloning, 100+ languages. API + Zapier integration.
  • Elai.io – AI video platform with streaming avatar API for interactive e-learning. Turns documents and scripts into avatar-presented videos. Docs
  • Captions / Mirage – Mirage API generates hyperrealistic talking-head videos from script + image + actor ID. Natural gestures, eye contact, synchronized audio. API | Docs
  • SynthLife – Virtual AI influencer creation. Creates AI personas for TikTok, YouTube, Instagram with auto-scheduling and unlimited content generation.
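
Most avatar APIs follow the same shape: a REST call carrying a script (or audio), an avatar ID, and a voice ID. A sketch of building such a request body; the field names here are illustrative only, not any vendor's actual schema (check the Synthesia, HeyGen, or D-ID docs for the real shape):

```python
import json

def build_avatar_request(script, avatar_id, voice_id, language="en"):
    """Build a JSON request body for a script-driven avatar video.

    Field names are hypothetical placeholders; every provider uses its
    own schema, so map these onto your vendor's documented fields.
    """
    if not script.strip():
        raise ValueError("script must be non-empty")
    payload = {
        "script": {"type": "text", "input": script, "language": language},
        "avatar_id": avatar_id,
        "voice_id": voice_id,
        "output": {"format": "mp4"},
    }
    return json.dumps(payload)
```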

Video Enhancement APIs

  • Topaz Video AI – AI video upscaling, denoising, frame interpolation, and artifact correction. 3M+ users; used by Google, Tesla, NASA. API | Docs

Video Understanding APIs

  • Twelve Labs – Video understanding and intelligence API. Multimodal semantic search, analysis, and text generation from video content. Docs

Open Source Models

Models with hosted inference, Docker support, or diffusers integration.

  • Wan 2.1 (Alibaba) – SOTA open T2V-14B model. Also supports I2V, editing, T2I, and V2A. T2V-1.3B runs on consumer GPUs. Apache 2.0. HuggingFace | On Replicate, fal.ai
  • Wan 2.2 (Alibaba) – World's first open-source MoE video diffusion model. Trained on 65.6% more image data and 83.2% more video data than 2.1. 5B and 14B variants. Apache 2.0. HuggingFace
  • MAGI-1 (Sand AI) – 24B param autoregressive denoising model. Generates video chunk-by-chunk (24 frames/chunk). Supports T2V, I2V, V2V with streaming generation. Outperforms Wan 2.1 and HunyuanVideo on benchmarks. Apache 2.0. HuggingFace | Paper
  • Step-Video-T2V (StepFun) – 300B parameter text-to-video model, up to 204 frames, bilingual (EN/ZH). MIT license. TI2V | HuggingFace
  • SkyReels (SkyworkAI) – V1: human-centric video fine-tuned on HunyuanVideo. V2: infinite-length video via Autoregressive Diffusion-Forcing. V3: multimodal reaching closed-source SOTA levels. V1 | V3
  • HunyuanVideo (Tencent) – 13B+ param model; v1.5 is 8.3B and runs on consumer GPUs. I2V, Avatar, and Foley variants available. v1.5 | On Replicate, fal.ai
  • CogVideoX (Zhipu AI / Z.AI) – CogVideoX-5B flagship; supports 10s videos. Commercial product "Ying" available via API. Apache 2.0 (2B). HuggingFace | API
  • NVIDIA Cosmos – World foundation model for physical AI (robotics, autonomous vehicles). Cosmos-Predict2.5 generates physics-based video simulations from text/image/video/sensor inputs. Website | Docs
  • Meta Movie Gen – 30B param T2V + 13B audio model. Personalized video from single reference photo, local/global editing, synchronized audio. Research paper public; weights not yet released. Rolling out inside Instagram Reels.
  • LTX-Video / LTX-2 (Lightricks) – First DiT-based real-time video gen model. LTX-2 adds native 4K at 50fps with synchronized audio. LTX-2 | ComfyUI Nodes
  • Pyramid Flow – Efficient autoregressive video generation using pyramidal flow matching. Up to 10s at 768p, 24fps. ICLR 2025. HuggingFace | Paper
  • Open-Sora – Open reproduction of Sora-like generation. 2s–15s at 144p–720p. T2V, I2V, V2V. Apache 2.0.
  • AnimateDiff – Plug-and-play animation module for Stable Diffusion models. Merged into HuggingFace diffusers. Diffusers Docs | On Replicate
  • Mochi 1 (Genmo) – 10B param T2V model with AsymmDiT architecture. 5.4s at 30fps. Apache 2.0. On fal.ai, Replicate
  • Allegro (Rhymes AI) – 2.8B param VideoDiT. 6s clips at 720p/15fps. Merged into diffusers. Apache 2.0. HuggingFace
  • OmniHuman-1 (ByteDance) – Multimodal human video generation from single image + motion signal (audio, video, or text). Full-body, any aspect ratio. On Replicate
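
Chunk-based autoregressive models like MAGI-1 generate a fixed number of frames per step (24 frames per chunk, per the entry above), so clip length translates directly into chunk count. A quick helper; the default frame rate of 24 fps is an assumption for illustration, not a stated model constraint:

```python
import math

CHUNK_FRAMES = 24  # MAGI-1 denoises video in 24-frame chunks

def chunks_needed(seconds, fps=24):
    """Number of generation chunks a chunk-based autoregressive model
    needs for a clip of the given duration.

    fps=24 is an assumed output frame rate for this sketch.
    """
    total_frames = math.ceil(seconds * fps)
    return math.ceil(total_frames / CHUNK_FRAMES)
```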

SDKs and Developer Tooling

  • Replicate SDK – Python/JS client for 100+ hosted video models. Async, streaming, webhooks, fine-tuning. Docs | pip install replicate
  • fal.ai SDK – Serverless AI inference with Python, JS, and Swift SDKs. Hosts Kling, Veo, Pika, Wan, LTX, Luma, and more. Docs | pip install fal-client / npm install @fal-ai/client
  • Runway SDK – Official Python and Node.js SDKs with type annotations, async support, and built-in polling. pip install runwayml / npm install @runwayml/sdk
  • Luma AI SDK – Sync and async clients for all Dream Machine generation modes. JS Docs | pip install lumaai
  • HeyGen Streaming Avatar SDK – TypeScript SDK for real-time WebRTC interactive avatar sessions. npm install @heygen/streaming-avatar
  • MiniMax MCP Server – Model Context Protocol servers for video gen, TTS, and voice cloning. JS
  • HuggingFace Diffusers – The canonical PyTorch library for diffusion models including video pipelines. Docs | pip install diffusers
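
Hosted inference endpoints rate-limit aggressively under load, so production callers usually wrap generation requests in retry logic. A generic exponential-backoff-with-jitter sketch, independent of any particular SDK (some of the SDKs above bundle their own retry and polling helpers):

```python
import random
import time

def with_backoff(fn, retries=5, base_delay=1.0, max_delay=30.0):
    """Call `fn`, retrying on exception with exponential backoff + jitter.

    A generic pattern for rate-limited inference APIs; on the final
    attempt the original exception is re-raised unchanged.
    """
    for attempt in range(retries):
        try:
            return fn()
        except Exception:
            if attempt == retries - 1:
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            time.sleep(delay * random.uniform(0.5, 1.0))  # jitter
```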

Infrastructure and Deployment

GPU Cloud and Inference Platforms

  • Lambda Labs – On-demand H100/B200 GPUs. SSH and JupyterLab access with REST API for instance management.
  • CoreWeave – Kubernetes-native AI cloud with enterprise-scale GPU infrastructure.
  • RunPod – GPU pods (persistent) and serverless endpoints. REST, GraphQL, and CLI. Docs
  • Modal – Serverless Python-first GPU platform. Container spin-up in ~1 second. Docs
  • Together AI – Inference API for 200+ open models plus Instant Clusters for self-service GPU clusters.
  • Replicate – Serverless model hosting. Run open-source video models via REST API. Docs
  • fal.ai – Serverless inference for generative media. 600+ models. Python, JS, Swift SDKs. Docs
  • WaveSpeedAI – Fast AI inference with no cold starts. 600+ models. 30–50% cheaper than HuggingFace Inference. 99.9% uptime SLA. GitHub
  • Pollo AI – Video API aggregator providing access to Kling, Veo 3.1, Runway, Hailuo, Wan 2.6, and Pollo 2.0. Docs | API

Video Processing

  • FFmpeg – Industry-standard multimedia processing. Encode, decode, transcode, stream, filter. GitHub
  • HandBrake – Open-source video transcoder wrapping FFmpeg. GUI and CLI. GitHub
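
Generated clips typically come back as whatever container the model emits and need transcoding for web delivery. A sketch of building an FFmpeg command for a standard H.264/AAC MP4 (these are stock FFmpeg flags; run the result with `subprocess.run(cmd, check=True)`):

```python
def transcode_cmd(src, dst, crf=23, preset="medium"):
    """Build an ffmpeg argument list to transcode a generated clip
    into a web-friendly H.264 MP4."""
    return [
        "ffmpeg", "-y",             # overwrite output if it exists
        "-i", src,                  # input file
        "-c:v", "libx264",          # H.264 video codec
        "-crf", str(crf),           # constant-rate-factor quality (lower = better)
        "-preset", preset,          # encode speed vs. compression trade-off
        "-c:a", "aac",              # AAC audio codec
        "-movflags", "+faststart",  # moov atom up front for progressive playback
        dst,
    ]
```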

Media Storage and Delivery

  • Mux – API-first video infrastructure. Upload, encode, stream (VOD + live), analytics. SDKs for Node, Python, Ruby, Go, and more. Docs
  • Cloudflare Stream – Video upload, encoding, and CDN delivery billed per minute watched. Live streaming via RTMP/SRT. Docs
  • Backblaze B2 – S3-compatible object storage at ~$0.006/GB/month. Free egress via Cloudflare. Docs

Video Playback

  • hls.js – JavaScript HLS playback via MSE. Used by major streaming platforms.
  • Shaka Player – Google's open-source DASH + HLS player.

Evaluation and Observability

  • VBench / VBench-2.0 – Comprehensive benchmark for video generative models. 16 fine-grained dimensions including subject consistency, motion smoothness, temporal flickering. VBench-2.0 adds Physics and Commonsense evaluation. Leaderboard | Paper (CVPR 2024)
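
When comparing models across VBench's 16 per-dimension scores, a quick way to get a single number is an unweighted mean. This is an illustrative aggregation only; the official VBench leaderboard applies its own weighting scheme:

```python
def mean_score(dim_scores):
    """Average per-dimension scores (each expected in [0, 1]) into a
    single number for rough side-by-side comparison.

    Unweighted mean for illustration; not VBench's official weighting.
    """
    if not dim_scores:
        raise ValueError("no scores given")
    bad = [k for k, v in dim_scores.items() if not 0.0 <= v <= 1.0]
    if bad:
        raise ValueError(f"scores out of range: {bad}")
    return sum(dim_scores.values()) / len(dim_scores)
```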


Contributing

Contributions welcome! Please read the contribution guidelines first. PRs for new tools, corrections, and updates are appreciated.

License

CC0 1.0 Universal — public domain dedication.
