A high-performance, production-ready text-to-speech server built with Piper-TTS, providing OpenAI-compatible API endpoints for GraphDone applications.
- OpenAI-Compatible API: Drop-in replacement for OpenAI's TTS endpoints
- High Performance: Built with FastAPI and optimized for speed
- Multiple Voices: Support for 6+ voices with multiple quality levels
- Web Interface: Interactive UI for testing and configuration
- Smart Caching: Hybrid LRU+LFU cache with a 10 GB size limit (MAX_CACHE_SIZE_GB)
- Docker Ready: Single or multi-container deployment options
- Format Support: MP3, WAV, OPUS, AAC, FLAC, PCM output formats
- Rate Limiting: Built-in protection against abuse
- Batch Processing: Generate multiple voices in parallel
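The caching bullet combines LRU and LFU signals, and the cache internals are not documented here. As a simplified sketch of the size cap only, the shell functions below evict the least recently modified files from a cache directory until the total fits a byte limit. This is plain LRU by modification time, not the server's LRU+LFU policy. The helper names are hypothetical, and reading MAX_CACHE_SIZE_GB (listed with the cache settings) as GiB is an assumption.

```bash
#!/usr/bin/env bash
# Simplified, hypothetical eviction sketch: least recently modified files are
# deleted first (LRU by mtime). This is NOT the server's LRU+LFU policy.
# Assumes cache file names contain no newlines.

# Prints the byte limit for MAX_CACHE_SIZE_GB (assumed to mean GiB, default 10).
cache_limit_bytes() {
  echo $(( ${MAX_CACHE_SIZE_GB:-10} * 1024 * 1024 * 1024 ))
}

# Deletes regular files in directory $1, oldest first, until their combined
# size is at most $2 bytes; prints the remaining total size in bytes.
evict_until_under() {
  local dir="$1" limit="$2" total=0 size file
  for file in "$dir"/*; do
    [ -f "$file" ] || continue
    size=$(wc -c < "$file")
    total=$(( total + size ))
  done
  # ls -tr sorts by modification time, oldest first.
  while [ "$total" -gt "$limit" ] && IFS= read -r file; do
    [ -f "$dir/$file" ] || continue
    size=$(wc -c < "$dir/$file")
    rm -f -- "$dir/$file"
    total=$(( total - size ))
  done < <(ls -tr -- "$dir")
  echo "$total"
}
```

For example, evict_until_under /app/output/cache "$(cache_limit_bytes)" trims the cache directory and prints how many bytes remain.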
GraphDone-TTS/
├── src/                     # Source code
│   ├── piper-server/        # FastAPI TTS server
│   └── webui/               # Flask web interface
├── docker/                  # Docker configurations
│   ├── Dockerfile.single    # Single container build
│   └── docker-compose.*.yml # Compose configurations
├── scripts/                 # Automation scripts
│   ├── setup_tts.sh         # Basic setup
│   ├── setup_tts_with_ui.sh # Full setup with UI
│   └── download_voices.sh   # Voice model downloader
├── config/                  # Configuration files
│   ├── voice_to_speaker.yaml
│   └── pre_process_map.yaml
├── voices/                  # ONNX voice models
├── tests/                   # Test files and scripts
├── docs/                    # Documentation
└── examples/                # Usage examples
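The layout keeps voice mappings in config/ separate from the ONNX models in voices/. As a hedged sketch of how the two could be joined, the hypothetical resolve_model function below reads the two-level mapping format used by config/voice_to_speaker.yaml (voice, then quality, then Piper model name) and prints the model name. Treating voices/<model>.onnx as the model's file path is an assumption, and the awk parser only handles that simple layout, not general YAML.

```bash
#!/usr/bin/env bash
# Hypothetical lookup: prints the Piper model mapped to <voice> and <quality>
# in a two-level file laid out like config/voice_to_speaker.yaml:
#   nova:
#     low: en_US-amy-low
# This is NOT a general YAML parser; it only understands that layout.
# Exit status is 0 if a mapping was found, 1 otherwise.
resolve_model() {
  local file="$1" voice="$2" quality="$3"
  awk -v voice="$voice" -v quality="$quality" '
    # Unindented "name:" line starts a new voice block.
    /^[^ \t#][^:]*:[ \t]*$/ {
      current = $0
      sub(/:[ \t]*$/, "", current)
      next
    }
    # Indented "quality: model" line inside the requested voice block.
    current == voice && $1 == quality ":" {
      print $2
      found = 1
      exit
    }
    END { exit !found }
  ' "$file"
}
```

For example, model=$(resolve_model config/voice_to_speaker.yaml nova medium) && ls "voices/$model.onnx" checks that the mapped model has actually been downloaded.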
# Clone the repository
git clone https://github.com/graphdone/GraphDone-TTS.git
cd GraphDone-TTS
# Just run start - it handles everything automatically!
./start
That's it! The ./start script will:
- Install Docker if needed
- Download voice models
- Build containers
- Start all services
- Show you the URLs
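The internals of ./start are not shown here. Assuming it decides what to do by checking for the Docker CLI and for .onnx files in voices/, those two checks could be sketched as follows (needs_docker and needs_voices are hypothetical names, not functions from the script):

```bash
#!/usr/bin/env bash
# Hypothetical helpers illustrating the kind of checks ./start might perform.

# Succeeds (exit 0) when the docker CLI is NOT on PATH, i.e. it needs installing.
needs_docker() {
  ! command -v docker >/dev/null 2>&1
}

# Succeeds when the given directory (default: voices) has no *.onnx models yet.
# Assumes models live directly in that directory, not in subfolders.
needs_voices() {
  local dir="${1:-voices}" model
  for model in "$dir"/*.onnx; do
    # Without nullglob an unmatched glob stays literal, so test existence.
    [ -e "$model" ] && return 1
  done
  return 0
}
```

For example, needs_voices voices && ./scripts/download_voices.sh would fetch models only when none are present.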
./start # Start everything (auto-setup)
./start stop # Stop all services
./start restart # Restart services
./start logs # View logs
./start status # Check status
./start clean # Clean up everything
Generate speech with the OpenAI-compatible endpoint:
curl -X POST "http://localhost:8000/v1/audio/speech" \
-H "Content-Type: application/json" \
-d '{
"model": "tts-1",
"input": "Hello, this is GraphDone TTS!",
"voice": "nova",
"response_format": "mp3"
}' \
--output speech.mp3
Available voices:
- alloy: Neutral, professional
- echo: Warm, conversational
- fable: Expressive, narrative
- onyx: Deep, authoritative
- nova: Energetic, youthful
- shimmer: Gentle, soothing
Output formats:
- mp3: Default, compressed audio
- wav: Uncompressed, high quality
- opus: Efficient compression
- aac: Apple-compatible
- flac: Lossless compression
- pcm: Raw audio data
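Scripts that call the endpoint can reject unsupported values before sending a request. The sketch below checks the voice and format against the two lists above and builds the same JSON body as the curl example. The helper names are hypothetical, TTS_API_URL (which also appears in the configuration section) defaults to the local server, and only backslashes and double quotes in the input text are escaped, so text with newlines or other control characters would need a real JSON encoder such as jq.

```bash
#!/usr/bin/env bash
# Hypothetical client helpers for POST /v1/audio/speech.
TTS_VOICES="alloy echo fable onyx nova shimmer"
TTS_FORMATS="mp3 wav opus aac flac pcm"

# Succeeds if word $1 appears in the space-separated list $2.
tts_in_list() {
  case " $2 " in *" $1 "*) return 0 ;; *) return 1 ;; esac
}

# Succeeds if both the voice ($1) and the response format ($2) are supported.
tts_valid() {
  tts_in_list "$1" "$TTS_VOICES" && tts_in_list "$2" "$TTS_FORMATS"
}

# Prints the JSON request body for text $1, voice $2, format $3.
# Escapes only backslashes and double quotes (no control characters).
tts_payload() {
  local text
  text=$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"model": "tts-1", "input": "%s", "voice": "%s", "response_format": "%s"}' \
    "$text" "$2" "$3"
}

# Sends the request and writes the audio to speech.<format>.
tts_speak() {
  local text="$1" voice="${2:-nova}" format="${3:-mp3}"
  tts_valid "$voice" "$format" || { echo "unsupported voice or format" >&2; return 1; }
  curl -sf -X POST "${TTS_API_URL:-http://localhost:8000}/v1/audio/speech" \
    -H "Content-Type: application/json" \
    -d "$(tts_payload "$text" "$voice" "$format")" \
    --output "speech.$format"
}
```

For example, tts_speak "Hello from GraphDone" onyx wav writes speech.wav, while tts_speak "Hi" robot fails before any request is sent.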
Access the web UI at http://localhost:3000
Features:
- Test different voices and settings
- Batch generation for multiple voices
- Real-time audio playback
- Cache management dashboard
- API endpoint testing
# Build complete image with all voices
./scripts/package_single.sh
# Run the container
docker run -d -p 8000:8000 -p 3000:3000 \
--name graphdone-tts \
tts-server-complete:latest
# Development (multi-container)
docker-compose -f docker/docker-compose.yml up
# Production (single container)
docker-compose -f docker/docker-compose.single.yml up
Edit config/voice_to_speaker.yaml to customize voice mappings:
nova:
  low: en_US-amy-low
  medium: en_US-amy-medium
  high: en_US-amy-medium
  x_low: en_US-amy-low
Customize text processing in config/pre_process_map.yaml:
abbreviations:
  "Mr.": "Mister"
  "Dr.": "Doctor"
# Cache settings
CACHE_DIR=/app/output/cache
MAX_CACHE_SIZE_GB=10
# API configuration
TTS_API_URL=http://localhost:8000
SECRET_KEY=your-secret-key
# Performance
MAX_WORKERS=8
Run the comprehensive test suite:
# Run all tests
./start test
# Test specific component
./tests/test_tts.sh
# Manual API test
curl http://localhost:8000/health
Performance:
- Response Time: < 500 ms for cached content
- Generation Speed: 2-5 seconds for new content
- Concurrent Requests: Handles 100+ simultaneous requests
- Cache Hit Rate: 70%+ in production
- Memory Usage: < 2GB under normal load
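Figures like these assume the server is already running. Because the containers start in the background, a script that runs the manual API test or a benchmark may need to wait until GET /health succeeds. A generic retry helper is enough; the names, the 30-attempt default and the 2-second delay below are illustrative assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command until it succeeds or attempts run out.
# Usage: retry <attempts> <delay-seconds> <command> [args...]
retry() {
  local attempts="$1" delay="$2" i
  shift 2
  for (( i = 1; i <= attempts; i++ )); do
    "$@" && return 0
    # Skip the sleep after the final failed attempt.
    (( i < attempts )) && sleep "$delay"
  done
  return 1
}

# Polls GET /health; curl -f makes HTTP error statuses count as failures.
# Assumption: the endpoint answers with a 2xx status once the server is ready.
wait_for_tts() {
  retry "${1:-30}" 2 curl -sf -o /dev/null "${TTS_API_URL:-http://localhost:8000}/health"
}
```

For example, wait_for_tts && ./tests/test_tts.sh runs the component tests only after the API responds.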
Security:
- Input validation and sanitization
- Rate limiting on all endpoints
- Path traversal protection
- Non-root container execution
- Secure file handling
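Path traversal protection generally means refusing any user-supplied name that could resolve outside its intended directory. The server's actual validation rules are not shown in this document; a minimal allow-list check in that spirit, using a hypothetical is_safe_name helper, looks like this:

```bash
#!/usr/bin/env bash
# Hypothetical allow-list check for user-supplied names (voice IDs, file keys).
# Accepts only letters, digits, '.', '_' and '-', and rejects "." and "..".
# Since '/' is never allowed, an accepted name cannot leave its directory.
is_safe_name() {
  local name="$1"
  [ -n "$name" ] || return 1
  case "$name" in
    . | ..) return 1 ;;
    *[!A-Za-z0-9._-]*) return 1 ;;
  esac
  return 0
}
```

For example, is_safe_name "$voice" || exit 1 before building a path under voices/ stops inputs such as ../etc/passwd.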
Contributions are welcome! Please read our contributing guidelines before submitting PRs.
This project is licensed under the MIT License - see LICENSE file for details.
Third-party licenses:
- Piper-TTS: MIT License (github.com/rhasspy/piper)
- OpenAI API Specification: MIT License
Acknowledgments:
- Built with Piper-TTS, a fast, local neural text-to-speech system (MIT License)
- Piper-TTS models and voice synthesis technology by Michael Hansen
- OpenAI API compatibility for seamless integration
- GraphDone team for project support
For issues and questions:
- GitHub Issues: GraphDone-TTS/issues
- Documentation: docs/
Made with ❤️ by the GraphDone Team