Your voice, in any language. Real-time speech-to-speech translation with zero-shot voice cloning across 9 languages (Chinese / English / Japanese / Indonesian / Spanish / Portuguese / German / French + bilingual ZH⇄EN auto). The other party hears the target language in your own voice through any meeting app — Zoom, Teams, WeChat, Google Meet, OBS, anything that takes a microphone.
Powered by ByteDance Doubao Seed LiveInterpret 2.0.
中文 · Architecture · Setup · Troubleshooting
You speak <source lang> ─► Doppelvoice ─► Peer hears <target lang> (in your voice)
┌──────────────────┐ ┌─────────┐ ┌──────────────────────────────┐
│ your mic │ ────► │ Doubao │ ────► │ virtual mic → Zoom / Teams … │
└──────────────────┘ │ AST 2.0 │ └──────────────────────────────┘
└─────────┘
Pick any of 9 source/target language codes (zh / en / ja / id / es / pt / de / fr)
or use zhen on both sides for bilingual ZH⇄EN auto-detection.
End-to-end latency ≈ 2.5–3 s. Subtitles stream token-by-token; voice is cloned zero-shot from your speech as you talk.
- 🎙 End-to-end speech-to-speech — no separate STT / MT / TTS plumbing
- 🗣 Zero-shot voice cloning — model captures your voice on the fly; explicit
denoise=falseto retain breath / resonance details - 🌐 9 languages —
zh / en / ja / id / es / pt / de / fr / zhen(the last one is the bilingual ZH⇄EN auto mode) - ⚡ ~2.5 s latency — production-grade real-time
- 🪟 Native Windows GUI (PySide6) with live bilingual subtitles
- 🔌 Universal compatibility — anything that accepts a microphone works
- 🔁 Automatic reconnect with exponential backoff and fatal-error classification
- 🔒 Privacy-first defaults — translated audio and subtitles never persist to disk unless you opt in; logs auto-redact API keys and bearer tokens
- 🧹 Clean device picker — one entry per physical device (host-API duplicates collapsed; MME 31-char name truncation handled)
- 🛠 Configurable — sample rate, jitter buffer, RMS gate, denoise toggle, speaker_id, all tweakable
Two ways to install. Option A is the fastest (no Python needed).
- Install VB-Audio Virtual Cable → run installer as admin → reboot.
- Download the latest
Doppelvoice-vX.Y.Z-win64.zipfrom the Releases page. - Unzip anywhere, then inside the folder: copy
.env.example→.env, fill inDOUBAO_APP_KEY/DOUBAO_ACCESS_KEY(get them from the Volcengine Console). - Double-click
Doppelvoice.exe. The GUI opens. - In your meeting app, set the microphone to
CABLE Output (VB-Audio Virtual Cable).
git clone https://github.com/TianqBu/Doppelvoice.git
cd Doppelvoice
python -m venv .venv
.venv\Scripts\pip install -e . :: installs from pyproject.toml
:: or: .venv\Scripts\pip install -r requirements.txt
copy .env.example .env
notepad .env :: fill in DOUBAO_APP_KEY / DOUBAO_ACCESS_KEY
check.bat :: verifies devices + API connectivity + StartSession
gui.bat :: launches the GUI
run.bat :: CLI modeIn your meeting app: pick CABLE Output (VB-Audio Virtual Cable) as the microphone.
run.bat :: start translation (CLI)
run.bat --gui :: launch GUI
run.bat --check :: self-check
run.bat --list-devices :: list audio devices
run.bat --source en --target zh :: reverse direction
run.bat --jitter-ms 80 :: lower latency (more underrun risk)
run.bat --log-level DEBUG :: verbose logsAll settings have sensible defaults. Override via .env or CLI flags.
| Variable | Default | Notes |
|---|---|---|
DOUBAO_APP_KEY / DOUBAO_ACCESS_KEY |
required | from Volcengine console |
DOUBAO_RESOURCE_ID |
volc.service_type.10053 |
AST 2.0 resource ID |
SOURCE_LANG / TARGET_LANG |
zh / en |
one of zh / en / ja / id / es / pt / de / fr / zhen. Use zhen on both sides for bilingual ZH⇄EN auto mode. |
MODE |
s2s |
s2s (speech→speech) or s2t (speech→text) |
DENOISE |
0 |
1 = server-side denoise on (cleaner input but flatter voice clone). 0 keeps breath / resonance for better cloning. |
SPEAKER_ID |
empty | Doubao ReqParams.speaker_id — empty = clone the speaker; set to a preset like zh_female_vv_uranus_bigtts to use a stock voice instead |
INPUT_DEVICE / OUTPUT_DEVICE |
auto | substring of device name (host API hidden; one entry per physical device) |
LOG_LEVEL |
INFO |
DEBUG for verbose |
DUMP_AUDIO |
false |
persist per-sentence ogg blobs (debug only) |
LOG_SUBTITLE |
false |
persist subtitle text in logs (debug only) |
src/doppelvoice/
├── engine/ # Doubao AST 2.0 protobuf WebSocket client
├── audio/ # PortAudio (sounddevice) capture + playback + ogg/opus decoder
├── pipeline/ # asyncio orchestration: capture → ws → decode → playback
├── gui/ # PySide6 + qasync
├── cli.py
└── config.py
See docs/en/ARCHITECTURE.md for the full protocol details.
- Windows 10 / 11 x64
- Python 3.10–3.12
- VB-Audio Virtual Cable 1.0.4 (Driver Pack 43)
- Zoom, 腾讯会议, 微信电话, Google Meet (Chrome), OBS
- Voice cloning quality varies with mic and clarity. AirPods over Bluetooth
HFP (16 kHz narrowband phone mode) gives mediocre results — a wired/USB mic
or laptop built-in mic is recommended. The default
denoise=falsealready tells the server to keep your voice's unique characteristics; toggling it on in Settings would flatten the clone further. - End-to-end latency floor ≈ 2.5 s is the model's hard limit per the Seed LiveInterpret 2.0 paper; local processing adds <500 ms.
- Voice expressiveness of the public AST API is good but not as lively as the Volcengine Console demo (which goes through a different BFF endpoint).
- Per-sentence audio decoding (ogg_opus) adds ~500 ms latency vs raw PCM (which the API does not currently honor).
- Use headphones, not speakers. With external speakers the meeting audio gets re-captured by your mic, re-translated, and sent back to the peer as their own translated voice — a textbook acoustic feedback loop. See Troubleshooting.
- API keys live only in
.env(gitignored). - Translated audio and subtitle text are not persisted to disk by default.
- Set
DUMP_AUDIO=1/LOG_SUBTITLE=1for debugging only. - All audio is sent through ByteDance's Doubao API. Review their Terms of Service before use with sensitive content.
PRs welcome. See CONTRIBUTING.md.
MIT.
- ByteDance Seed LiveInterpret 2.0 — the underlying translation model
- kizuna-ai-lab/sokuji — protobuf reverse-engineering reference
- VB-Audio Virtual Cable — virtual audio routing on Windows
