Edge TTS Module for Wippy

Text-to-Speech module using Microsoft Edge Read Aloud service. Supports 400+ voices in 100+ languages, free, no API key required.

Installation

Add to your project's dependencies in _index.yaml:

- name: dep.edge-tts
  kind: ns.dependency
  component: butschster/edge-tts

Quick Start

1. Register voices in `_index.yaml`

Voice entries must use kind: registry.entry with meta.type: edge_tts.voice:

entries:
  - name: voice.ru.male
    kind: registry.entry
    meta:
      type: edge_tts.voice
      voice_id: "ru-RU-DmitryNeural"
      language: "ru-RU"
      gender: "male"
      output_format: "audio-24khz-48kbitrate-mono-mp3"
      description: "Dmitry - Russian male voice"

  - name: voice.en.female
    kind: registry.entry
    meta:
      type: edge_tts.voice
      voice_id: "en-US-JennyNeural"
      language: "en-US"
      gender: "female"
      output_format: "audio-24khz-48kbitrate-mono-mp3"
      description: "Jenny - English female voice"

2. Import in your process/function

The edge_tts library requires several modules available in the runtime. List them in your process/function entry:

- name: my_process
  kind: process.lua
  source: file://my_process.lua
  method: main
  modules:
    - logger
    - time
    - json
    - websocket
    - uuid
    - http_client
    - hash
  imports:
    edge_tts: edge_tts:tts

3. Use in Lua code

local edge_tts = require("edge_tts")

local function main()
    -- Synthesize via registry voice reference
    local audio, err = edge_tts.synthesize("app:voice.ru.male", "Привет, мир!")
    if err then
        print("Error: " .. tostring(err))
        return
    end

    -- audio.data          — raw audio bytes
    -- audio.size          — byte count
    -- audio.format        — MIME type (e.g. "audio/mpeg")
    -- audio.output_format — Edge TTS format string
    -- audio.voice_id      — voice ID used
end

API

`edge_tts.synthesize(voice_ref, text, options?)`

Synthesize speech using a registry-configured voice.

Parameter	Type	Description
`voice_ref`	string	Registry reference (e.g. `app:voice.ru.male`)
`text`	string	Text to synthesize (1-5000 chars)
`options`	table?	Override rate, volume, pitch, output_format, timeout

Returns: AudioResult?, error?

-- Basic usage
local audio, err = edge_tts.synthesize("app:voice.ru.male", "Привет!")

-- With options override
local audio, err = edge_tts.synthesize("app:voice.en.female", "Hello!", {
    rate = "+20%",
    volume = "+10%",
})

`edge_tts.synthesize_direct(options)`

Direct synthesis without registry lookup. Useful for one-off calls or testing.

local audio, err = edge_tts.synthesize_direct({
    voice_id = "ru-RU-DmitryNeural",
    text = "Привет!",
    output_format = "audio-24khz-96kbitrate-mono-mp3",
    rate = "+20%",      -- speech rate (default: "+0%")
    volume = "+0%",     -- volume (default: "+0%")
    pitch = "+0Hz",     -- pitch (default: "+0Hz")
    timeout = "30s"     -- connection timeout (default: "30s")
})

`edge_tts.list_voices(filter?)`

Get available voices from Microsoft Edge TTS.

-- All voices
local voices, err = edge_tts.list_voices()

-- Filter by locale
local ru_voices, err = edge_tts.list_voices({ locale = "ru-RU" })

-- Filter by locale and gender
local voices, err = edge_tts.list_voices({ locale = "en-US", gender = "Female" })

Returns a list of VoiceInfo tables:

{
    short_name = "ru-RU-DmitryNeural",
    name = "Microsoft Server Speech Text to Speech Voice (ru-RU, DmitryNeural)",
    friendly_name = "Dmitry",
    locale = "ru-RU",
    gender = "Male",
    status = "GA",
    codec = "audio-24khz-48kbitrate-mono-mp3",
    categories = { ... },
    personalities = { ... }
}

AudioResult

Field	Type	Description
`data`	string	Raw audio bytes
`size`	number	Byte count
`format`	string	MIME type (auto-detected from format)
`output_format`	string	Edge TTS format string used
`voice_id`	string	Voice ID used

Voice Configuration

Voice meta fields

Field	Required	Default	Description
`voice_id`	yes	—	Microsoft voice ID
`language`	no	—	Language tag (e.g. ru-RU)
`gender`	no	—	male / female
`output_format`	no	`audio-24khz-48kbitrate-mono-mp3`	Audio format string
`rate`	no	`+0%`	Speech rate
`volume`	no	`+0%`	Volume
`pitch`	no	`+0Hz`	Pitch
`description`	no	—	Human-readable label

Voice with prosody presets

- name: voice.ru.fast
  kind: registry.entry
  meta:
    type: edge_tts.voice
    voice_id: "ru-RU-DmitryNeural"
    language: "ru-RU"
    rate: "+30%"
    volume: "+10%"
    pitch: "+5Hz"
    output_format: "audio-24khz-48kbitrate-mono-mp3"

Full example with Telegram bot

# _index.yaml
entries:
  # 1. Add dependency
  - name: dep.edge-tts
    kind: ns.dependency
    component: butschster/edge-tts

  # 2. Register voices
  - name: voice.ru.male
    kind: registry.entry
    meta:
      type: edge_tts.voice
      voice_id: "ru-RU-DmitryNeural"
      language: "ru-RU"
      gender: "male"
      output_format: "audio-24khz-48kbitrate-mono-mp3"

  - name: voice.en.female
    kind: registry.entry
    meta:
      type: edge_tts.voice
      voice_id: "en-US-JennyNeural"
      language: "en-US"
      gender: "female"
      output_format: "audio-24khz-48kbitrate-mono-mp3"

  # 3. Use in a process (include all required modules)
  - name: my_session
    kind: process.lua
    source: file://session.lua
    method: main
    modules:
      - funcs
      - logger
      - time
      - json
      - websocket
      - uuid
      - http_client
      - hash
    imports:
      edge_tts: edge_tts:tts

-- session.lua
local edge_tts = require("edge_tts")
local funcs = require("funcs")

local VOICE_MAP = {
    ru = "app:voice.ru.male",
    en = "app:voice.en.female",
}

local function send_voice(chat_id, text, lang)
    local voice_ref = VOICE_MAP[lang] or VOICE_MAP["en"]

    local audio, err = edge_tts.synthesize(voice_ref, text)
    if err then
        return nil, err
    end

    -- Send as Telegram voice/audio message
    return funcs.call("telegram.sdk:send_voice", {
        chat_id = chat_id,
        voice_bytes = audio.data,
        filename = "voice.mp3",
        content_type = audio.format,  -- "audio/mpeg" for MP3
    })
end

Supported Audio Formats

The free Edge TTS WebSocket endpoint does not support all formats from the official Azure Speech REST API. Only these have been verified to work:

Format string	MIME type	Status
`audio-24khz-48kbitrate-mono-mp3`	audio/mpeg	Works
`audio-24khz-96kbitrate-mono-mp3`	audio/mpeg	Works
`audio-48khz-192kbitrate-mono-mp3`	audio/mpeg	Works
`webm-24khz-16bit-mono-opus`	audio/webm	Works
`ogg-24khz-16bit-mono-opus`	audio/ogg	Not working
`ogg-16khz-16bit-mono-opus`	audio/ogg	Not working

Recommendation: Use audio-24khz-48kbitrate-mono-mp3 as the default. OGG Opus formats are accepted by the server (no connection error) but produce no audio data — the server sends turn.start and response messages but never sends binary audio frames.

Telegram voice messages

Telegram's native voice player (waveform + transcription) requires OGG Opus format. Since the free Edge TTS endpoint doesn't support OGG output, voice messages sent as MP3 will appear as regular audio files without the voice player UI. To get the native voice experience, you need server-side conversion (e.g. ffmpeg).

Popular Voices

Voice ID	Language	Gender
`ru-RU-DmitryNeural`	Russian	Male
`ru-RU-SvetlanaNeural`	Russian	Female
`en-US-GuyNeural`	English (US)	Male
`en-US-JennyNeural`	English (US)	Female
`en-US-AriaNeural`	English (US)	Female
`en-GB-SoniaNeural`	English (UK)	Female
`de-DE-ConradNeural`	German	Male
`uk-UA-OstapNeural`	Ukrainian	Male

Use edge_tts.list_voices() to discover all 400+ available voices.

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
src		src
.gitignore		.gitignore
CLAUDE.md		CLAUDE.md
README.md		README.md
wippy.lock		wippy.lock
wippy.yaml		wippy.yaml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Edge TTS Module for Wippy

Installation

Quick Start

1. Register voices in `_index.yaml`

2. Import in your process/function

3. Use in Lua code

API

`edge_tts.synthesize(voice_ref, text, options?)`

`edge_tts.synthesize_direct(options)`

`edge_tts.list_voices(filter?)`

AudioResult

Voice Configuration

Voice meta fields

Voice with prosody presets

Full example with Telegram bot

Supported Audio Formats

Telegram voice messages

Popular Voices

License

About

Uh oh!

Releases

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Edge TTS Module for Wippy

Installation

Quick Start

1. Register voices in _index.yaml

2. Import in your process/function

3. Use in Lua code

API

edge_tts.synthesize(voice_ref, text, options?)

edge_tts.synthesize_direct(options)

edge_tts.list_voices(filter?)

AudioResult

Voice Configuration

Voice meta fields

Voice with prosody presets

Full example with Telegram bot

Supported Audio Formats

Telegram voice messages

Popular Voices

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Contributors

Uh oh!

Languages

1. Register voices in `_index.yaml`

`edge_tts.synthesize(voice_ref, text, options?)`

`edge_tts.synthesize_direct(options)`

`edge_tts.list_voices(filter?)`