Doppelvoice

Your voice, in any language. Real-time speech-to-speech translation with zero-shot voice cloning across 9 languages (Chinese / English / Japanese / Indonesian / Spanish / Portuguese / German / French + bilingual ZH⇄EN auto). The other party hears the target language in your own voice through any meeting app — Zoom, Teams, WeChat, Google Meet, OBS, anything that takes a microphone.

Powered by ByteDance Doubao Seed LiveInterpret 2.0.

中文 · Architecture · Setup · Troubleshooting



What it does

You speak <source lang>  ─►  Doppelvoice  ─►  Peer hears <target lang> (in your voice)
   ┌──────────────────┐       ┌─────────┐       ┌──────────────────────────────┐
   │     your mic     │ ────► │ Doubao  │ ────► │ virtual mic → Zoom / Teams … │
   └──────────────────┘       │ AST 2.0 │       └──────────────────────────────┘
                              └─────────┘

Pick any of 9 source/target language codes (zh / en / ja / id / es / pt / de / fr) or use zhen on both sides for bilingual ZH⇄EN auto-detection.

End-to-end latency ≈ 2.5–3 s. Subtitles stream token-by-token; voice is cloned zero-shot from your speech as you talk.

Features

  • 🎙 End-to-end speech-to-speech — no separate STT / MT / TTS plumbing
  • 🗣 Zero-shot voice cloning — model captures your voice on the fly; explicit denoise=false to retain breath / resonance details
  • 🌐 9 languages — zh / en / ja / id / es / pt / de / fr / zhen (the last is the bilingual ZH⇄EN auto mode)
  • ~2.5 s latency — production-grade real-time
  • 🪟 Native Windows GUI (PySide6) with live bilingual subtitles
  • 🔌 Universal compatibility — anything that accepts a microphone works
  • 🔁 Automatic reconnect with exponential backoff and fatal-error classification
  • 🔒 Privacy-first defaults — translated audio and subtitles never persist to disk unless you opt in; logs auto-redact API keys and bearer tokens
  • 🧹 Clean device picker — one entry per physical device (host-API duplicates collapsed; MME 31-char name truncation handled)
  • 🛠 Configurable — sample rate, jitter buffer, RMS gate, denoise toggle, speaker_id, all tweakable
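The automatic-reconnect behavior above boils down to exponential backoff with jitter, retrying transient failures but bailing out immediately on fatal ones. A minimal sketch of that pattern — the `FATAL` codes and `connect_with_backoff` name are illustrative assumptions, not the project's actual API:

```python
import asyncio
import random

# Hypothetical error codes that should never be retried (e.g. bad credentials).
FATAL = {"auth_failed", "invalid_resource"}

async def connect_with_backoff(connect, base=0.5, cap=30.0, max_tries=6):
    """Retry `connect()` with exponential backoff + full jitter; re-raise fatal errors."""
    for attempt in range(max_tries):
        try:
            return await connect()
        except ConnectionError as exc:
            if str(exc) in FATAL:
                raise  # fatal-error classification: do not retry
            # Full jitter: sleep a random amount up to the capped exponential delay.
            delay = random.uniform(0, min(cap, base * 2 ** attempt))
            await asyncio.sleep(delay)
    raise ConnectionError("gave up after retries")
```

Full jitter keeps many clients that disconnected simultaneously from reconnecting in lockstep, which matters when the server drops a whole region of sessions at once.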

Demo

Doppelvoice GUI

Quick start

Two ways to install. Option A is the fastest (no Python needed).

Option A — Pre-built Windows binary (recommended)

  1. Install VB-Audio Virtual Cable → run installer as admin → reboot.
  2. Download the latest Doppelvoice-vX.Y.Z-win64.zip from the Releases page.
  3. Unzip anywhere, then inside the folder run copy .env.example .env and fill in DOUBAO_APP_KEY / DOUBAO_ACCESS_KEY (get them from the Volcengine Console).
  4. Double-click Doppelvoice.exe. The GUI opens.
  5. In your meeting app, set the microphone to CABLE Output (VB-Audio Virtual Cable).

Option B — From source (for developers)

git clone https://github.com/TianqBu/Doppelvoice.git
cd Doppelvoice
python -m venv .venv
.venv\Scripts\pip install -e .       :: installs from pyproject.toml
:: or: .venv\Scripts\pip install -r requirements.txt

copy .env.example .env
notepad .env       :: fill in DOUBAO_APP_KEY / DOUBAO_ACCESS_KEY

check.bat          :: verifies devices + API connectivity + StartSession
gui.bat            :: launches the GUI
run.bat            :: CLI mode

In your meeting app: pick CABLE Output (VB-Audio Virtual Cable) as the microphone.

CLI

run.bat                              :: start translation (CLI)
run.bat --gui                        :: launch GUI
run.bat --check                      :: self-check
run.bat --list-devices               :: list audio devices
run.bat --source en --target zh      :: reverse direction
run.bat --jitter-ms 80               :: lower latency (more underrun risk)
run.bat --log-level DEBUG            :: verbose logs

Configuration

All settings have sensible defaults. Override via .env or CLI flags.

| Variable | Default | Notes |
| --- | --- | --- |
| `DOUBAO_APP_KEY` / `DOUBAO_ACCESS_KEY` | required | from the Volcengine console |
| `DOUBAO_RESOURCE_ID` | `volc.service_type.10053` | AST 2.0 resource ID |
| `SOURCE_LANG` / `TARGET_LANG` | `zh` / `en` | one of zh / en / ja / id / es / pt / de / fr / zhen; use `zhen` on both sides for bilingual ZH⇄EN auto mode |
| `MODE` | `s2s` | `s2s` (speech→speech) or `s2t` (speech→text) |
| `DENOISE` | `0` | `1` = server-side denoise on (cleaner input but flatter voice clone); `0` keeps breath / resonance for better cloning |
| `SPEAKER_ID` | empty | Doubao `ReqParams.speaker_id` — empty = clone the speaker; set a preset like `zh_female_vv_uranus_bigtts` to use a stock voice instead |
| `INPUT_DEVICE` / `OUTPUT_DEVICE` | auto | substring of device name (host API hidden; one entry per physical device) |
| `LOG_LEVEL` | `INFO` | `DEBUG` for verbose output |
| `DUMP_AUDIO` | `false` | persist per-sentence ogg blobs (debug only) |
| `LOG_SUBTITLE` | `false` | persist subtitle text in logs (debug only) |
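Putting the settings above together, a typical .env for Chinese→English translation might look like this (key values are placeholders; only the two credentials are mandatory):

```ini
# Required — from the Volcengine console
DOUBAO_APP_KEY=your_app_key_here
DOUBAO_ACCESS_KEY=your_access_key_here

# Direction: Chinese in, English out (use zhen/zhen for bilingual auto mode)
SOURCE_LANG=zh
TARGET_LANG=en

# Keep denoise off to preserve voice-clone detail
MODE=s2s
DENOISE=0
```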

Architecture

src/doppelvoice/
├── engine/        # Doubao AST 2.0 protobuf WebSocket client
├── audio/         # PortAudio (sounddevice) capture + playback + ogg/opus decoder
├── pipeline/      # asyncio orchestration: capture → ws → decode → playback
├── gui/           # PySide6 + qasync
├── cli.py
└── config.py

See docs/en/ARCHITECTURE.md for the full protocol details.
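The pipeline/ layer's capture → ws → decode → playback orchestration is essentially three asyncio tasks joined by bounded queues. A toy sketch of that shape, with fake frames standing in for real PCM and the WebSocket round-trip (all names here are illustrative, not the project's actual functions):

```python
import asyncio

async def capture(mic_q: asyncio.Queue) -> None:
    # Real code reads PCM frames from sounddevice; here we fake three frames.
    for i in range(3):
        await mic_q.put(f"frame-{i}".encode())
    await mic_q.put(None)  # end-of-stream sentinel

async def translate(mic_q: asyncio.Queue, out_q: asyncio.Queue) -> None:
    # Stands in for the Doubao WebSocket send/receive + ogg/opus decode.
    while (frame := await mic_q.get()) is not None:
        await out_q.put(frame + b"-translated")
    await out_q.put(None)

async def playback(out_q: asyncio.Queue) -> list:
    # Real code writes to the virtual-cable output device; we just collect.
    played = []
    while (chunk := await out_q.get()) is not None:
        played.append(chunk)
    return played

async def main() -> list:
    # Bounded queues give natural backpressure if one stage stalls.
    mic_q, out_q = asyncio.Queue(maxsize=8), asyncio.Queue(maxsize=8)
    results = await asyncio.gather(capture(mic_q), translate(mic_q, out_q), playback(out_q))
    return results[2]

played = asyncio.run(main())
```

The bounded maxsize is the design point: if playback underruns or the network stalls, capture blocks instead of buffering audio without limit, which keeps latency from silently growing.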

Tested with

  • Windows 10 / 11 x64
  • Python 3.10–3.12
  • VB-Audio Virtual Cable 1.0.4 (Driver Pack 43)
  • Zoom, Tencent Meeting (腾讯会议), WeChat calls (微信电话), Google Meet (Chrome), OBS

Known limitations

  1. Voice cloning quality varies with mic and clarity. AirPods over Bluetooth HFP (16 kHz narrowband phone mode) gives mediocre results — a wired/USB mic or laptop built-in mic is recommended. The default denoise=false already tells the server to keep your voice's unique characteristics; toggling it on in Settings would flatten the clone further.
  2. End-to-end latency floor ≈ 2.5 s is the model's hard limit per the Seed LiveInterpret 2.0 paper; local processing adds <500 ms.
  3. Voice expressiveness of the public AST API is good but not as lively as the Volcengine Console demo (which goes through a different BFF endpoint).
  4. Per-sentence audio decoding (ogg_opus) adds ~500 ms latency vs raw PCM (which the API does not currently honor).
  5. Use headphones, not speakers. With external speakers the meeting audio gets re-captured by your mic, re-translated, and sent back to the peer as their own translated voice — a textbook acoustic feedback loop. See Troubleshooting.
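Headphones are the real fix for the feedback loop, but the RMS gate mentioned under Features also helps by muting frames below a loudness threshold so quiet speaker bleed never reaches the API. A minimal sketch of such a gate over 16-bit little-endian PCM — the function name and threshold are illustrative assumptions, not the project's implementation:

```python
import math
import struct

def rms_gate(frame: bytes, threshold: float = 500.0) -> bytes:
    """Replace a 16-bit PCM frame with silence if its RMS is below threshold."""
    n = len(frame) // 2
    if n == 0:
        return frame
    samples = struct.unpack(f"<{n}h", frame)
    rms = math.sqrt(sum(s * s for s in samples) / n)
    return frame if rms >= threshold else b"\x00" * len(frame)
```

A gate only suppresses quiet bleed; once speaker output is loud enough to cross the threshold it will still loop, so this complements headphones rather than replacing them.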

Privacy

  • API keys live only in .env (gitignored).
  • Translated audio and subtitle text are not persisted to disk by default.
  • Set DUMP_AUDIO=1 / LOG_SUBTITLE=1 for debugging only.
  • All audio is sent through ByteDance's Doubao API. Review their Terms of Service before use with sensitive content.

Contributing

PRs welcome. See CONTRIBUTING.md.

License

MIT.

Acknowledgements
