Powered by state-of-the-art models such as SparkTTS, OrpheusTTS, and MegaTTS 3, FlashTTS delivers high-quality Mandarin speech synthesis and zero-shot voice cloning. With a clean and intuitive web interface, you can quickly generate natural, lifelike voices for dubbing, narration, accessibility, virtual characters, and more.
If you find FlashTTS helpful, please leave us a ⭐ Star!
| | Feature | Description |
|---|---|---|
| 🚀 | Multi-backend Acceleration | Supports high-performance inference engines like vllm, sglang, llama-cpp, mlx-lm, tensorrt-llm, etc. |
| 🎯 | High Concurrency | Dynamic batching and asynchronous queues to handle heavy traffic with ease |
| 🎛️ | Full Parameter Control | Adjust pitch, speaking rate, temperature, emotion tags, and more |
| 📱 | Lightweight Deployment | Built on FastAPI—start with a single command; minimal dependencies |
| 🔊 | Long-form Synthesis | Supports very long texts while maintaining consistent voice quality |
| 🔄 | Streaming TTS | Generate and play audio in real time; reduces wait time, enhances interactivity |
| 🎭 | Multi-character Dialog | Synthesize multiple roles within the same text—ideal for script dubbing |
| 🎨 | Modern Frontend | Web-ready, responsive interface |
Below are demos showcasing FlashTTS’s cloning capabilities across different models and characters.
| | |
|---|---|
| Donald Trump (EN) | Donald Trump (ZH) |
| Nezha | Li Jing |
| Yu Chengdong | Xu Zhisheng |
| Cai Xukun | Taiyi Zhenren |
| Changle | Baizhi |
It is recommended to install flashtts in a Python 3.8–3.12 environment via pip:
pip install flashtts

For detailed installation steps, please refer to: installation guide
Local inference command:
flashtts infer \
-i "hello world." \
-o output.wav \
-m ./models/your_model \
-b vllm \
[other optional parameters]

For detailed usage, please refer to: quick_start.md
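
If you want to drive the local inference command from a script, the following is a minimal sketch that wraps it with Python's `subprocess` module. It uses only the `-i`, `-o`, `-m`, and `-b` flags shown above; the helper names `build_infer_command` and `run_infer` are hypothetical and not part of FlashTTS, and the sketch assumes the `flashtts` executable is on your `PATH`.

```python
import subprocess
from typing import List, Optional, Sequence


def build_infer_command(
    text: str,
    output_path: str,
    model_dir: str,
    backend: str = "vllm",
    extra_args: Optional[Sequence[str]] = None,
) -> List[str]:
    """Build the argument list for `flashtts infer` (hypothetical helper).

    Only the flags that appear in the command above are used.
    """
    cmd = [
        "flashtts", "infer",
        "-i", text,         # input text
        "-o", output_path,  # output audio file
        "-m", model_dir,    # model directory
        "-b", backend,      # inference backend
    ]
    if extra_args:
        # Any further options are passed through unchanged.
        cmd.extend(extra_args)
    return cmd


def run_infer(text: str, output_path: str, model_dir: str, backend: str = "vllm") -> None:
    """Run the CLI; raises CalledProcessError if it exits with a non-zero status."""
    subprocess.run(build_infer_command(text, output_path, model_dir, backend), check=True)
```

Passing the arguments as a list avoids shell quoting problems when the input text contains spaces or quotes.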
Server deployment:
flashtts serve \
--model_path Spark-TTS-0.5B \
--backend vllm \
--role_dir data/roles \
--llm_device cuda \
--tokenizer_device cuda \
--detokenizer_device cuda \
--wav2vec_attn_implementation sdpa \
--llm_attn_implementation sdpa \
--torch_dtype "bfloat16" \
--max_length 32768 \
--llm_gpu_memory_utilization 0.6 \
--fix_voice \
--host 0.0.0.0 \
--port 8000

`--fix_voice` controls whether the Spark-TTS timbre (female and male) is fixed.

Web address: http://localhost:8000
API documentation: http://localhost:8000/docs
For detailed deployment, please refer to: server.md
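
Once the server is running, you can discover its routes without guessing endpoint names. The sketch below assumes the server keeps FastAPI's default schema location, `/openapi.json` (the same schema that backs the `/docs` page); if the project overrides that default, the URL will differ. The function names are hypothetical, and no FlashTTS-specific route is assumed.

```python
import json
from typing import Any, Dict, List, Tuple
from urllib.request import urlopen

# Keys of an OpenAPI "path item" that denote HTTP operations.
HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options", "trace"}


def fetch_openapi_schema(base_url: str = "http://localhost:8000",
                         timeout: float = 5.0) -> Dict[str, Any]:
    """Download the OpenAPI schema (assumes FastAPI's default /openapi.json)."""
    with urlopen(f"{base_url.rstrip('/')}/openapi.json", timeout=timeout) as resp:
        return json.load(resp)


def list_routes(schema: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return sorted (METHOD, path) pairs found in an OpenAPI schema."""
    routes = []
    for path, item in schema.get("paths", {}).items():
        for key in item:
            # Skip non-operation keys such as "parameters" or "summary".
            if key.lower() in HTTP_METHODS:
                routes.append((key.upper(), path))
    return sorted(routes)


# Usage, with the server from the command above running:
#   for method, path in list_routes(fetch_openapi_schema()):
#       print(method, path)
```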
Test environment: A800 GPU · Model: Spark-TTS-0.5B · Test script: speed_test.py
| Scenario | Engine | Device | Audio Length (s) | Inference Time (s) | RTF |
|---|---|---|---|---|---|
| Short | llama-cpp | CPU | 7.48 | 6.81 | 0.91 |
| Short | torch | GPU | 7.18 | 7.68 | 1.07 |
| Short | vllm | GPU | 7.24 | 1.66 | 0.23 |
| Short | sglang | GPU | 7.58 | 1.07 | 0.14 |
| Long | llama-cpp | CPU | 121.98 | 117.83 | 0.97 |
| Long | torch | GPU | 113.70 | 107.17 | 0.94 |
| Long | vllm | GPU | 111.82 | 7.28 | 0.07 |
| Long | sglang | GPU | 117.02 | 4.20 | 0.04 |
RTF (real-time factor) is inference time divided by audio length; RTF < 1 means synthesis runs faster than real time.
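
The RTF column can be reproduced from the other two numeric columns as inference time divided by audio length. The short sketch below recomputes it for three rows copied from the table; the function name is ours, not part of `speed_test.py`.

```python
def real_time_factor(inference_s: float, audio_s: float) -> float:
    """RTF = inference time / audio length; values below 1 are faster than real time."""
    if audio_s <= 0:
        raise ValueError("audio length must be positive")
    return inference_s / audio_s


# (scenario, engine, audio length in s, inference time in s), copied from the table above
rows = [
    ("Short", "vllm", 7.24, 1.66),
    ("Long", "sglang", 117.02, 4.20),
    ("Short", "torch", 7.18, 7.68),
]

for scenario, engine, audio_s, infer_s in rows:
    rtf = real_time_factor(infer_s, audio_s)
    status = "faster than real time" if rtf < 1 else "slower than real time"
    print(f"{scenario:5s} {engine:7s} RTF={rtf:.2f} ({status})")
```

Rounded to two decimals, these rows give 0.23, 0.04, and 1.07, matching the table.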
- SparkTTS weights must be `bfloat16` or `float32`; using `float16` will cause errors.
- If you experience long silent gaps, try increasing `repetition_penalty` (> 1.0).
- OrpheusTTS supports inserting `<tag>` in text to control emotion. See `LANG_MAP` in `orpheus_engine.py`.
- For safety reasons, MegaTTS 3 does not publish the WaveVAE encoder. Please follow the official instructions to download it: reference audio.
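
The SparkTTS dtype note can be enforced in a launcher script before the server starts. This is a small sketch under the assumption that the dtype is passed as a string, as with `--torch_dtype "bfloat16"` in the server command; the helper name is hypothetical and not part of FlashTTS.

```python
# Dtypes that the note above lists as working for SparkTTS weights.
SPARK_TTS_DTYPES = ("bfloat16", "float32")


def check_spark_tts_dtype(torch_dtype: str) -> str:
    """Return the dtype unchanged if it is allowed for SparkTTS, else raise ValueError."""
    if torch_dtype not in SPARK_TTS_DTYPES:
        allowed = " or ".join(SPARK_TTS_DTYPES)
        raise ValueError(
            f"SparkTTS weights must be {allowed}; got {torch_dtype!r}"
        )
    return torch_dtype
```

Failing early with a clear message is usually easier to debug than an error raised deep inside the inference backend.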
FlashTTS is provided for academic research, education, and lawful purposes only, such as accessibility assistance and personalized speech synthesis. Do not use it for fraud, impersonation, deepfakes, or other illegal activities. Users are responsible for any misuse.
This project follows the same license as Spark-TTS. See LICENSE for details.