Kokoro TTS Local Web UI

A local, feature-rich Gradio interface for the Kokoro open-weight Text-to-Speech model. This application provides a user-friendly web UI to generate high-quality speech, featuring parallel processing for long texts, advanced text cleaning, and automatic hardware acceleration.

✨ Features

High-Quality TTS: Access to all Kokoro voices (US & UK accents).
Parallel Processing: Splits long text into chunks and processes them in parallel threads for significantly faster generation.
Hardware Acceleration: Automatically detects and uses NVIDIA GPU (CUDA) if available, falling back to CPU seamlessly.
Text Preprocessing:
- Reference number removal (e.g., [1]).
- Whitespace normalization.
- Initial formatting (e.g., converting "J.R.R." to "J R R").
Tokenization Preview: View the phonemes/tokens generated by the model before audio synthesis.
Sample Library: Quick access to sample texts (Great Gatsby, Frankenstein) or random quotes.
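
The three text preprocessing steps listed above can be approximated with a few regular expressions. This is a minimal sketch, not the code in app.py: the function name `clean_text`, its keyword flags, and the exact patterns are assumptions made for illustration.

```python
import re

def clean_text(text: str, remove_refs: bool = True, normalize_ws: bool = True,
               expand_initials: bool = True) -> str:
    """Apply the three cleaning steps; each one can be toggled independently."""
    if remove_refs:
        # Drop bracketed numeric references such as [1] or [12, 13],
        # together with any whitespace directly in front of them.
        text = re.sub(r"\s*\[\d+(?:\s*[,\-]\s*\d+)*\]", "", text)
    if expand_initials:
        # Turn runs of dotted capitals ("J.R.R.") into spaced letters ("J R R").
        text = re.sub(
            r"\b(?:[A-Z]\.){2,}",
            lambda m: " ".join(m.group(0).replace(".", " ").split()),
            text,
        )
    if normalize_ws:
        # Collapse any run of whitespace to a single space and trim the ends.
        text = re.sub(r"\s+", " ", text).strip()
    return text

if __name__ == "__main__":
    print(clean_text("J.R.R. Tolkien wrote  The Hobbit [1]."))
```

Keeping each step behind its own flag mirrors the idea of toggling cleaning options individually in the UI.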

🛠️ Prerequisites

Before running the application, ensure you have the following installed:

Python 3.8+

eSpeak-ng: Required for phonemization.

Windows:

Download and install from eSpeak-ng releases.

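🧩 PowerShell Commands (Run as Administrator): After installing eSpeak NG, run these commands one at a time in PowerShell. The $env: lines set the variables for the current session; the setx lines persist them for future sessions.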

$env:PHONEMIZER_ESPEAK_LIBRARY = "c:\Program Files\eSpeak NG\libespeak-ng.dll"
$env:PHONEMIZER_ESPEAK_PATH = "c:\Program Files\eSpeak NG"
setx PHONEMIZER_ESPEAK_LIBRARY "c:\Program Files\eSpeak NG\libespeak-ng.dll"
setx PHONEMIZER_ESPEAK_PATH "c:\Program Files\eSpeak NG"

Linux: sudo apt-get install espeak-ng
Mac: brew install espeak-ng
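
To confirm that the eSpeak NG setup above is visible to Python, a small check like the following may help. It is a sketch under stated assumptions: the helper name `find_espeak` is hypothetical, and it only inspects the PHONEMIZER_ESPEAK_LIBRARY variable and the PATH; it does not load the library itself.

```python
import os
import shutil

def find_espeak(env=None):
    """Return a dict describing where eSpeak NG appears to be installed."""
    env = os.environ if env is None else env
    library = env.get("PHONEMIZER_ESPEAK_LIBRARY")
    return {
        # Path taken from the environment variable, or None if it is unset.
        "library": library,
        # True only when the variable points at an existing file.
        "library_exists": bool(library) and os.path.isfile(library),
        # Executable found on the current PATH (espeak-ng first, then espeak).
        "executable": shutil.which("espeak-ng") or shutil.which("espeak"),
    }

if __name__ == "__main__":
    for key, value in find_espeak().items():
        print(f"{key}: {value}")
```

On Windows, open a new terminal before running it, because variables written with setx are only picked up by sessions started afterwards.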

📦 Installation

Note: You do not need to clone the GitHub repository. You only need the app.py script.

Create a Folder: Manually create a new folder (e.g., named kokoro) on your computer.
Download Script: Download app.py and place it inside this folder.
Set up a Virtual Environment: Open your terminal or command prompt inside this folder and run:
- Windows:
```
python -m venv venv
venv\Scripts\activate
```
- Linux/Mac:
```
python3 -m venv venv
source venv/bin/activate
```
Install Dependencies: Install the required packages, including the Kokoro library:
```
pip install gradio torch nltk phonemizer scipy soundfile "kokoro>=0.9.4"
```
(Note: If you have a specific CUDA version, install the appropriate version of PyTorch from pytorch.org.)
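
After installing PyTorch, you can check which device it will use. The sketch below uses the standard `torch.cuda.is_available()` call; the helper name `pick_device` is hypothetical, and the device selection logic in app.py may differ.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch can see an NVIDIA GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"

if __name__ == "__main__":
    print(f"Selected device: {pick_device()}")
```

If this prints "cpu" on a machine with an NVIDIA GPU, the installed PyTorch build is probably CPU-only or does not match the installed CUDA driver.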

🚀 Usage

Ensure your virtual environment is activated.
Run the application:
```
python app.py
```
The application will automatically launch in your default web browser.

⚙️ Configuration & Controls

Main Interface

Voice: Select from a variety of US (Heart, Bella, Michael, etc.) and UK (Emma, George, Lewis) voices.
Speed: Adjust the speaking rate (0.5x to 2.0x).
Text Cleaning: Toggle specific preprocessing steps in the "Text Cleaning Options" accordion.
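
For orientation, here is one way the Voice and Speed controls could map onto the kokoro library. This is an illustrative sketch, not the code in app.py: the `VOICES` table, `clamp_speed`, and `synthesize` are hypothetical names, the voice IDs follow Kokoro's published naming scheme, and the `KPipeline` call follows the usage shown in the kokoro package's own README.

```python
# Display name -> Kokoro voice ID. The first letter of the ID encodes the
# accent ("a" = American English, "b" = British English).
VOICES = {
    "Heart": "af_heart",
    "Bella": "af_bella",
    "Michael": "am_michael",
    "Emma": "bf_emma",
    "George": "bm_george",
    "Lewis": "bm_lewis",
}

def clamp_speed(speed: float, low: float = 0.5, high: float = 2.0) -> float:
    """Keep the speed inside the range exposed by the slider."""
    return max(low, min(high, speed))

def synthesize(text: str, voice_name: str = "Heart", speed: float = 1.0):
    """Yield the segments produced by KPipeline for the given text."""
    from kokoro import KPipeline  # imported lazily; requires kokoro to be installed

    voice_id = VOICES[voice_name]
    # The pipeline language code is assumed to match the voice's accent letter.
    pipeline = KPipeline(lang_code=voice_id[0])
    yield from pipeline(text, voice=voice_id, speed=clamp_speed(speed))
```

Each yielded segment is expected to unpack as graphemes, phonemes, and audio, so the phoneme string is what a tokenization preview would display.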

Performance Tuning

Parallel Processing: Located in the accordion settings.
- Slider (1-10): Controls how many text chunks are processed simultaneously.
- Tip: Higher values use more RAM/VRAM. If you encounter "Out of Memory" errors, reduce this slider.
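
The chunk-and-parallelize idea behind this slider can be sketched with the standard library. The names `split_into_chunks` and `process_parallel`, the 300-character limit, and the regex-based sentence splitter are assumptions for illustration; the application itself may split text differently (for example with NLTK).

```python
import re
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

def split_into_chunks(text: str, max_chars: int = 300) -> List[str]:
    """Group sentences into chunks of at most max_chars where possible."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Start a new chunk when adding this sentence would exceed the limit.
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks

def process_parallel(chunks: List[str], worker: Callable[[str], object],
                     max_workers: int = 4) -> list:
    """Run worker on every chunk; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, chunks))

if __name__ == "__main__":
    parts = split_into_chunks("One. Two. Three.", max_chars=9)
    print(process_parallel(parts, str.upper, max_workers=2))
```

Here `max_workers` plays the role of the slider value: each worker holds its own intermediate data while it runs, which is why higher values raise memory use.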

Troubleshooting

NLTK Errors: The app attempts to download necessary NLTK data (punkt) automatically. If this fails, run import nltk; nltk.download('punkt') in a Python shell.
eSpeak Errors: If you see errors related to EspeakWrapper, ensure espeak-ng is installed and added to your system's PATH. The app includes a monkey-patch to help locate it in standard environments.
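
For the NLTK case, the manual download can be wrapped in a small check. This is a sketch with a hypothetical helper name, `ensure_punkt`. It also looks for punkt_tab, because recent NLTK releases load that resource instead of punkt; whether app.py needs it depends on the installed NLTK version.

```python
def ensure_punkt(download: bool = True) -> bool:
    """Return True once a Punkt tokenizer resource is available locally."""
    try:
        import nltk
    except ImportError:
        # NLTK is not installed in this environment.
        return False
    resources = ("punkt_tab", "punkt")
    for resource in resources:
        try:
            nltk.data.find(f"tokenizers/{resource}")
            return True
        except LookupError:
            continue
    if not download:
        return False
    # Nothing found locally: try each resource until one download succeeds.
    return any(nltk.download(resource, quiet=True) for resource in resources)

if __name__ == "__main__":
    print("Punkt available:", ensure_punkt())
```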

📂 Project Structure

├── app.py                 # Main application file
├── kokoro-v0_19.pth       # Model weights (Downloaded automatically on first run)
├── venv/                  # Virtual environment folder (created during installation)
├── en.txt                 # (Optional) Source for random quotes
├── gatsby5k.md            # (Optional) Sample text
└── frankenstein5k.md      # (Optional) Sample text

📜 License

This project relies on the Kokoro TTS model. Please refer to the original model's license for usage restrictions.

Acknowledgements

🛠️ @yl4579 for architecting StyleTTS 2.
🏆 @Pendrokar for adding Kokoro as a contender in the TTS Spaces Arena.
📊 Thank you to everyone who contributed synthetic training data.
❤️ Special thanks to all compute sponsors.
👾 Discord server: https://discord.gg/QuGxSWBfQy
🪽 Kokoro is a Japanese word that translates to "heart" or "spirit". Kokoro is also a character in the Terminator franchise along with Misaki.

enoky/KokoroTTS-GUI-Extended
