VoiceForge is a browser-based assistive video tool that lets a user type during calls and output cloned speech with a lip-synced face preview.
- Why This Exists
- Tech Stack
- Browser Compatibility
- Setup
- Environment Variables
- Using VoiceForge In A Call
- OBS Virtual Camera Setup
- API
- Roadmap
- License
- About
Deaf and speech-impaired people on video calls are often pushed into chat boxes, delayed interpretation, or awkward turn-taking. VoiceForge explores a local-first interface where typed intent can become spoken audio and a synchronized visual feed, helping the user participate in the same conversational channel as everyone else.
VoiceForge targets Chrome and Edge only. Support for WebRTC Insertable Streams and canvas capture is still uneven across browsers, so Firefox and Safari are not supported for the virtual camera MVP.
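A quick feature check can make the browser requirement concrete. The sketch below is illustrative and not taken from the VoiceForge source: `detectCaptureSupport` is a hypothetical helper, and it assumes that "Insertable Streams" refers to the `MediaStreamTrackProcessor` / `MediaStreamTrackGenerator` pair that Chromium-based browsers expose.

```javascript
// Hypothetical helper (not part of VoiceForge): reports which capture APIs
// a window-like object exposes. Pass `window` when running in a browser.
function detectCaptureSupport(win) {
  const canvasProto = win.HTMLCanvasElement && win.HTMLCanvasElement.prototype;
  return {
    // HTMLCanvasElement.captureStream() turns a canvas into a MediaStream.
    canvasCapture: Boolean(
      canvasProto && typeof canvasProto.captureStream === "function"
    ),
    // Assumption: per-frame processing relies on these two constructors.
    insertableStreams:
      typeof win.MediaStreamTrackProcessor === "function" &&
      typeof win.MediaStreamTrackGenerator === "function",
  };
}

// Only runs in a browser; harmless elsewhere.
if (typeof window !== "undefined") {
  console.log(detectCaptureSupport(window));
}
```

A page could use the result to show an "unsupported browser" notice before the user reaches the Call page.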
- Install Node.js 18 or newer.
- Create an ElevenLabs account at elevenlabs.io and copy your API key.
- From the repository root, install dependencies: `npm install`
- Copy the example environment file: `cp .env.example .env`
- Add your ElevenLabs API key to `.env`, or skip it and set `MOCK_ELEVENLABS=true` to run in offline dev mode (see Contributing for details).
- Start the client and server together: `npm run dev`
- Open `http://localhost:5173` in Chrome or Edge.
| Variable | Required | Description |
|---|---|---|
| `ELEVENLABS_API_KEY` | Yes (or use `MOCK_ELEVENLABS`) | Server-side API key used for voice cloning and TTS requests. |
| `PORT` | No | Express API port. Defaults to `3001`. |
| `CLIENT_URL` | No | Allowed CORS origin for the Vite app. Defaults to `http://localhost:5173`. |
| `MOCK_ELEVENLABS` | No | Set to `true` to skip real ElevenLabs calls in dev/CI. Returns fixture data. Ignored in production. |
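The defaults and the "ignored in production" rule can be read as a small config loader. This is a sketch under stated assumptions, not the server's actual code: `loadConfig` is a hypothetical name, and treating `NODE_ENV === "production"` as the production signal is an assumption, since this README does not say how production is detected.

```javascript
// Hypothetical config loader illustrating the variables documented above.
function loadConfig(env) {
  // Assumption: production is signalled by NODE_ENV.
  const isProduction = env.NODE_ENV === "production";
  // MOCK_ELEVENLABS only takes effect outside production.
  const mock = !isProduction && env.MOCK_ELEVENLABS === "true";
  const apiKey = env.ELEVENLABS_API_KEY || "";

  if (!apiKey && !mock) {
    throw new Error(
      "Set ELEVENLABS_API_KEY, or MOCK_ELEVENLABS=true outside production."
    );
  }

  return {
    apiKey,
    mock,
    port: Number(env.PORT) || 3001, // documented default
    clientUrl: env.CLIENT_URL || "http://localhost:5173", // documented default
  };
}
```

On the server this would typically be called once at startup as `loadConfig(process.env)`, so a missing key fails fast instead of surfacing on the first TTS request.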
- Open VoiceForge in Chrome or Edge.
- Record a 10-second consent-based reference clip.
- Clone the voice and continue to the Call page.
- Allow webcam access.
- Type a phrase and press Enter or Speak.
- Turn on Go Live to expose the canvas stream inside the browser.
- In Zoom, Google Meet, or Microsoft Teams, open camera settings and select the virtual camera source you have configured.
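The Go Live step in the list above can be pictured as a call to the standard `HTMLCanvasElement.captureStream()` method. The helper below is a minimal sketch, not VoiceForge's implementation: `startCanvasStream` and the 30 fps default are assumptions made for illustration.

```javascript
// Hypothetical helper: wraps canvas.captureStream() and returns a stop handle.
function startCanvasStream(canvas, fps = 30) {
  if (typeof canvas.captureStream !== "function") {
    throw new Error("canvas.captureStream is unavailable; use Chrome or Edge.");
  }
  // captureStream(fps) returns a MediaStream with one video track.
  const stream = canvas.captureStream(fps);
  // Stopping every track ends the capture when the user turns Go Live off.
  const stop = () => stream.getTracks().forEach((track) => track.stop());
  return { stream, stop };
}
```

In the browser, the returned `stream` can be assigned to a `<video>` element's `srcObject` for preview. It is only visible inside the page, which is why a bridge such as OBS Virtual Camera is still needed for call apps.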
Most video call apps cannot directly select a browser tab as a system camera. For the MVP, install OBS Studio and use OBS Virtual Camera as the bridge.
- Install OBS Studio.
- Add a Browser Source pointing to `http://localhost:5173`.
- Crop the source to the lip-synced output preview.
- Click Start Virtual Camera in OBS.
- Select OBS Virtual Camera in Zoom, Meet, or Teams.
Screenshot placeholder: OBS browser source configuration.
Screenshot placeholder: Zoom camera picker showing OBS Virtual Camera.
| Method | Endpoint | Description |
|---|---|---|
| `POST` | `/api/voice/clone` | Upload reference audio, call ElevenLabs voice cloning, and return `voice_id`. |
| `POST` | `/api/voice/speak` | Send text, `voice_id`, and optional voice settings, then return a `speechId` and streaming `audioUrl`. |
| `GET` | `/api/voice/speak/stream/:speechId` | Stream the generated ElevenLabs speech audio for a pending speech request. |
| `GET` | `/api/health` | Return local API health. |
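As a rough illustration of the speak flow, the sketch below builds a `POST /api/voice/speak` request and reads back `speechId` and `audioUrl`. Several details are assumptions, not confirmed by this README: the JSON request body, the `text`, `voice_id`, and `voice_settings` field names, the `API_BASE` constant, and both helper names.

```javascript
// Assumption: the API runs on the default PORT documented in this README.
const API_BASE = "http://localhost:3001";

// Hypothetical helper: builds the request without sending it.
function buildSpeakRequest(text, voiceId, voiceSettings) {
  const body = { text, voice_id: voiceId };
  // Voice settings are optional, so only include them when provided.
  if (voiceSettings) body.voice_settings = voiceSettings;
  return {
    url: `${API_BASE}/api/voice/speak`,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    },
  };
}

// Hypothetical helper: sends the request and returns the documented fields.
async function speak(text, voiceId, voiceSettings, fetchImpl = fetch) {
  const { url, options } = buildSpeakRequest(text, voiceId, voiceSettings);
  const response = await fetchImpl(url, options);
  if (!response.ok) {
    throw new Error(`Speak request failed: ${response.status}`);
  }
  const { speechId, audioUrl } = await response.json();
  return { speechId, audioUrl };
}
```

A caller could then play the result with `new Audio(audioUrl).play()` in the browser, which would hit the `GET /api/voice/speak/stream/:speechId` endpoint if `audioUrl` points there.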
- Done: Store cloned voice profiles and reference audio Blobs in IndexedDB via `client/src/utils/db.js`.
- Done: Stream TTS audio through `POST /api/voice/speak` and `GET /api/voice/speak/stream/:speechId`.
- In progress: Voice tuning controls are wired through persisted `voice_settings`; multilingual output uses the ElevenLabs `eleven_multilingual_v2` model, but dedicated language controls still need UI.
- In progress: The MVP virtual camera uses canvas capture; full WebRTC Insertable Streams frame replacement remains future work.
- TODO: Replace the placeholder `models/wav2lip.onnx` with a real lightweight browser Wav2Lip ONNX model.
- TODO: Implement real ONNX Runtime Web Wav2Lip inference.
- TODO: Replace the fallback mouth animation with model-driven mouth movement.
- TODO: Add richer virtual camera documentation for OBS and each call provider.
- TODO: Add dedicated multilingual voice controls.
- TODO: Add automated browser tests for camera and microphone permission flows.
MIT