Kokoro TTS Gradio GUI for Long Text generation (runs locally)

enoky/KokoroTTS-GUI-Extended

 
 

Kokoro TTS Local Web UI

A local, feature-rich Gradio interface for the Kokoro open-weight Text-to-Speech model. This application provides a user-friendly web UI to generate high-quality speech, featuring parallel processing for long texts, advanced text cleaning, and automatic hardware acceleration.

✨ Features

  • High-Quality TTS: Access to all Kokoro voices (US & UK accents).
  • Parallel Processing: Splits long text into chunks and processes them in parallel threads for significantly faster generation.
  • Hardware Acceleration: Automatically detects and uses NVIDIA GPU (CUDA) if available, falling back to CPU seamlessly.
  • Text Preprocessing:
    • Reference number removal (e.g., [1]).
    • Whitespace normalization.
    • Initials formatting (e.g., converting "J.R.R." to "J R R").
  • Tokenization Preview: View the phonemes/tokens generated by the model before audio synthesis.
  • Sample Library: Quick access to sample texts (Great Gatsby, Frankenstein) or random quotes.
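
The text preprocessing steps listed above can be approximated with a few regular expressions. This is a minimal sketch, not the code from app.py: the function names and patterns are assumptions chosen for illustration, and the app's actual rules may differ.

```python
import re


def remove_reference_numbers(text: str) -> str:
    # Drop bracketed numeric citations such as [1] or [12].
    return re.sub(r"\[\d+\]", "", text)


def format_initials(text: str) -> str:
    # Turn runs of dotted capitals like "J.R.R." into "J R R".
    def spaced(match: re.Match) -> str:
        return " ".join(re.findall(r"[A-Z]", match.group(0)))

    return re.sub(r"\b(?:[A-Z]\.){2,}", spaced, text)


def normalize_whitespace(text: str) -> str:
    # Collapse any run of whitespace (including newlines) into one space.
    return re.sub(r"\s+", " ", text).strip()


def clean_text(text: str) -> str:
    # Hypothetical pipeline order: citations, then initials, then whitespace.
    text = remove_reference_numbers(text)
    text = format_initials(text)
    return normalize_whitespace(text)


if __name__ == "__main__":
    print(clean_text("J.R.R. Tolkien wrote  it.[1]\n"))
    # prints: J R R Tolkien wrote it.
```

Whitespace normalization runs last so that any gaps left behind by the earlier steps are collapsed as well.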

🛠️ Prerequisites

Before running the application, ensure you have the following installed:

  1. Python 3.8+
  2. eSpeak-ng: Required for phonemization.
    • Windows:
      1. Download and install from eSpeak-ng releases.
      2. 🧩 PowerShell commands (run as Administrator): after installing eSpeak NG, copy and paste these commands one at a time. The $env: lines set the variables for the current session; the setx lines persist them for new sessions.
        $env:PHONEMIZER_ESPEAK_LIBRARY = "c:\Program Files\eSpeak NG\libespeak-ng.dll"
        $env:PHONEMIZER_ESPEAK_PATH = "c:\Program Files\eSpeak NG"
        setx PHONEMIZER_ESPEAK_LIBRARY "c:\Program Files\eSpeak NG\libespeak-ng.dll"
        setx PHONEMIZER_ESPEAK_PATH "c:\Program Files\eSpeak NG"
    • Linux: sudo apt-get install espeak-ng
    • Mac: brew install espeak
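
To confirm that the prerequisites are in place before installing anything else, a short standard-library check can help. This is a sketch under stated assumptions: the helper name `espeak_report` is hypothetical, the two environment variables are the ones set in the Windows steps above, and the PATH lookup assumes the executable is named `espeak-ng`.

```python
import os
import shutil

# Variables set by the Windows PowerShell commands above.
ENV_VARS = ("PHONEMIZER_ESPEAK_LIBRARY", "PHONEMIZER_ESPEAK_PATH")


def espeak_report(environ=os.environ) -> dict:
    """Collect what can be seen about the eSpeak NG setup (hypothetical helper)."""
    report = {}
    for name in ENV_VARS:
        value = environ.get(name)
        report[name] = {
            "value": value,
            # True only when the variable is set and points at something on disk.
            "exists": bool(value) and os.path.exists(value),
        }
    # Full path of the executable if it is on PATH, otherwise None.
    report["espeak-ng on PATH"] = shutil.which("espeak-ng")
    return report


if __name__ == "__main__":
    for key, info in espeak_report().items():
        print(f"{key}: {info}")
```

On Linux and Mac the environment variables are usually unset, so the PATH entry is the one that matters there.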

📦 Installation

Note: You do not need to clone the GitHub repository. You only need the app.py script.

  1. Create a Folder: Manually create a new folder (e.g., named kokoro) on your computer.

  2. Download Script: Download app.py and place it inside this folder.

  3. Set up a Virtual Environment: Open your terminal or command prompt inside this folder and run:

    • Windows:
      python -m venv venv
      venv\Scripts\activate
    • Linux/Mac:
      python3 -m venv venv
      source venv/bin/activate
  4. Install Dependencies: Install the required packages, including the Kokoro library:

    pip install gradio torch nltk phonemizer scipy soundfile "kokoro>=0.9.4"

    (Note: If you have a specific CUDA version, install the appropriate version of PyTorch from pytorch.org.)
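
After installing PyTorch, it can be useful to check which device it will use. The sketch below shows the usual CUDA-or-CPU selection pattern; `pick_device` is a hypothetical name, and app.py may implement its detection differently.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch reports a usable NVIDIA GPU, else "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment; report CPU as the fallback.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"Selected device: {pick_device()}")
```

If this prints cpu on a machine with an NVIDIA GPU, the installed PyTorch build is probably CPU-only or does not match the local CUDA setup.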

🚀 Usage

  1. Ensure your virtual environment is activated.
  2. Run the application:
    python app.py
  3. The application will automatically launch in your default web browser.

⚙️ Configuration & Controls

Main Interface

  • Voice: Select from a variety of US (Heart, Bella, Michael, etc.) and UK (Emma, George, Lewis) voices.
  • Speed: Adjust the speaking rate (0.5x to 2.0x).
  • Text Cleaning: Toggle specific preprocessing steps in the "Text Cleaning Options" accordion.

Performance Tuning

  • Parallel Processing: Located in the accordion settings.
    • Slider (1-10): Controls how many text chunks are processed simultaneously.
    • Tip: Higher values use more RAM/VRAM. If you encounter "Out of Memory" errors, reduce this slider.
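
The idea behind the parallel processing slider can be sketched as follows: split the text into sentence-based chunks, hand them to a thread pool, and collect the results in their original order. All names here are illustrative assumptions, and `synthesize_chunk` is a placeholder that stands in for the real TTS call; the chunk size and splitting rule used by app.py are not documented here.

```python
import re
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(text: str, max_chars: int = 200) -> list:
    # Split on sentence-ending punctuation, then pack sentences into chunks
    # of at most max_chars (a single longer sentence becomes its own chunk).
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks


def synthesize_chunk(chunk: str) -> str:
    # Placeholder: a real implementation would return audio samples here.
    return f"<audio:{len(chunk)} chars>"


def synthesize_parallel(text: str, workers: int = 4, max_chars: int = 200) -> list:
    chunks = split_into_chunks(text, max_chars)
    # workers plays the role of the 1-10 slider: more workers, more memory in use.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, so the audio stays in sequence.
        return list(pool.map(synthesize_chunk, chunks))


if __name__ == "__main__":
    print(synthesize_parallel("One. Two. Three.", workers=2, max_chars=9))
    # prints: ['<audio:9 chars>', '<audio:6 chars>']
```

Because each worker holds its own chunk and intermediate results at the same time, lowering the worker count is the direct way to reduce peak memory use.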

Troubleshooting

  • NLTK Errors: The app attempts to download necessary NLTK data (punkt) automatically. If this fails, run import nltk; nltk.download('punkt') in a Python shell.
  • eSpeak Errors: If you see errors related to EspeakWrapper, ensure espeak-ng is installed and added to your system's PATH. The app includes a monkey-patch to help locate it in standard environments.

📂 Project Structure

├── app.py                 # Main application file
├── kokoro-v0_19.pth       # Model weights (Downloaded automatically on first run)
├── venv/                  # Virtual environment folder (created during installation)
├── en.txt                 # (Optional) Source for random quotes
├── gatsby5k.md            # (Optional) Sample text
└── frankenstein5k.md      # (Optional) Sample text

📜 License

This project relies on the Kokoro TTS model. Please refer to the original model's license for usage restrictions.

Acknowledgements

kokoro
