VoiceForge

VoiceForge is a browser-based assistive video tool. It lets a user type during a video call and have the text spoken aloud in their cloned voice, alongside a lip-synced face preview.

📑 Table of Contents

Why This Exists
Tech Stack
Browser Compatibility
Setup
Environment Variables
Using VoiceForge In A Call
OBS Virtual Camera Setup
API
Roadmap
License
About

Why This Exists

Deaf and speech-impaired people on video calls are often pushed into chat boxes, delayed interpretation, or awkward turn-taking. VoiceForge explores a local-first interface where typed intent can become spoken audio and a synchronized visual feed, helping the user participate in the same conversational channel as everyone else.

Tech Stack

Browser Compatibility

VoiceForge targets Chrome and Edge only. WebRTC Insertable Streams and canvas capture APIs are still uneven across browsers, so Firefox and Safari are not supported for the virtual camera MVP.
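The support check described above can be made explicit at runtime. The sketch below is a hypothetical helper; `detectVirtualCameraSupport` is not part of the VoiceForge source. It reports two things. The first is whether the canvas capture API that the MVP relies on is present. The second is whether Chromium's raw-frame "insertable streams" track APIs (`MediaStreamTrackProcessor` / `MediaStreamTrackGenerator`) are present, on the assumption that future frame replacement would use them. The helper takes the global object as a parameter so it can be exercised against a stub.

```javascript
// Hypothetical helper: report which browser APIs the virtual camera path depends on.
// `scope` defaults to globalThis so the same check can run against a stub object.
function detectVirtualCameraSupport(scope = globalThis) {
  const canvasProto = scope.HTMLCanvasElement && scope.HTMLCanvasElement.prototype;
  const support = {
    // HTMLCanvasElement.prototype.captureStream() turns a canvas into a MediaStream (MVP path).
    canvasCapture: Boolean(canvasProto && typeof canvasProto.captureStream === "function"),
    // Raw-frame insertable streams (assumed to be what future frame replacement would need).
    trackProcessor:
      typeof scope.MediaStreamTrackProcessor === "function" &&
      typeof scope.MediaStreamTrackGenerator === "function",
  };
  // The canvas-capture MVP only needs captureStream().
  support.mvpReady = support.canvasCapture;
  return support;
}
```

A check like this could gate the Go Live toggle and show a "use Chrome or Edge" message up front instead of failing later.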

Setup

1. Install Node.js 18 or newer.
2. Create an ElevenLabs account at elevenlabs.io and copy your API key.
3. From the repository root, install dependencies:

   npm install

4. Copy the example environment file:

   cp .env.example .env

5. Add your ElevenLabs API key to .env. Alternatively, leave it unset and set MOCK_ELEVENLABS=true to run in offline dev mode (see CONTRIBUTING.md for details).
6. Start the client and server together:

   npm run dev

7. Open http://localhost:5173 in Chrome or Edge.
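Once `npm run dev` is running, one quick way to confirm that the API half is up is to request `GET /api/health` on the Express port. That port is `3001` unless `PORT` is overridden. This is a minimal sketch: `checkApiHealth` is a hypothetical name, and it checks only the HTTP status because the health response body is not documented here.

```javascript
// Hypothetical helper: confirm the Express API started by `npm run dev` is answering.
// Assumes the default PORT of 3001; only the status is checked because the
// /api/health response body is not documented in this README.
async function checkApiHealth(baseUrl = "http://localhost:3001", fetchImpl = globalThis.fetch) {
  try {
    const res = await fetchImpl(new URL("/api/health", baseUrl));
    return { ok: res.ok, status: res.status };
  } catch (err) {
    // Network errors (server not running, wrong port) end up here.
    return { ok: false, status: 0, error: String(err) };
  }
}

// Usage with the global fetch in Node 18+:
// checkApiHealth().then((r) => console.log(r.ok ? "API is up" : "API is down", r));
```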

Environment Variables

| Variable | Required | Description |
|---|---|---|
| `ELEVENLABS_API_KEY` | Yes (or use `MOCK_ELEVENLABS`) | Server-side API key used for voice cloning and TTS requests. |
| `PORT` | No | Express API port. Defaults to `3001`. |
| `CLIENT_URL` | No | Allowed CORS origin for the Vite app. Defaults to `http://localhost:5173`. |
| `MOCK_ELEVENLABS` | No | Set to `true` to skip real ElevenLabs calls in dev/CI; the server returns fixture data instead. Ignored in production. |
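The table above implies a small amount of start-up logic. The sketch below shows one way a server could apply these defaults and rules. The `resolveConfig` name and the returned shape are hypothetical. The sketch also assumes that "production" means `NODE_ENV=production`, which this README does not specify.

```javascript
// Hypothetical sketch: resolve the documented variables with the defaults from the table.
// ASSUMPTION: "production" is detected via NODE_ENV === "production".
function resolveConfig(env = process.env) {
  const isProduction = env.NODE_ENV === "production";
  // MOCK_ELEVENLABS counts only when it is exactly "true" and we are not in production.
  const mockElevenLabs = !isProduction && env.MOCK_ELEVENLABS === "true";
  const apiKey = env.ELEVENLABS_API_KEY || "";
  if (!apiKey && !mockElevenLabs) {
    throw new Error("Set ELEVENLABS_API_KEY, or MOCK_ELEVENLABS=true for offline development.");
  }
  const port = Number.parseInt(env.PORT || "3001", 10);
  if (!Number.isInteger(port) || port <= 0 || port > 65535) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }
  return {
    elevenLabsApiKey: apiKey,
    port,
    clientUrl: env.CLIENT_URL || "http://localhost:5173", // CORS origin for the Vite app
    mockElevenLabs,
  };
}
```

Failing fast at start-up keeps a missing key from surfacing later as a confusing error on the first clone or speak request.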

Using VoiceForge In A Call

1. Open VoiceForge in Chrome or Edge.
2. Record a 10-second reference clip from a speaker who has consented to having their voice cloned.
3. Clone the voice and continue to the Call page.
4. Allow webcam access.
5. Type a phrase and press Enter or click Speak.
6. Turn on Go Live to expose the canvas stream inside the browser.
7. In Zoom, Google Meet, or Microsoft Teams, open the camera settings and select the virtual camera source you have configured (see OBS Virtual Camera Setup).
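Go Live is described as exposing the canvas stream inside the browser. A minimal sketch of that idea follows. It assumes the lip-synced preview is drawn to a `<canvas>` and the speech audio is available as a `MediaStream`. `HTMLCanvasElement.captureStream()` and `MediaStream.addTrack()` are standard browser APIs, but the `createLiveStream` helper and its 30 fps default are illustrative, not VoiceForge's actual code.

```javascript
// Hypothetical helper: combine the preview canvas and the speech audio into one MediaStream.
// The function name and the 30 fps default are illustrative assumptions.
function createLiveStream(canvas, audioStream, fps = 30) {
  if (typeof canvas.captureStream !== "function") {
    throw new Error("canvas.captureStream() is unavailable; use Chrome or Edge.");
  }
  // Video: frames drawn on the canvas (webcam plus mouth overlay) become the video track.
  const stream = canvas.captureStream(fps);
  // Audio: attach the speech track(s) so one stream carries both picture and voice.
  if (audioStream) {
    for (const track of audioStream.getAudioTracks()) {
      stream.addTrack(track);
    }
  }
  return stream;
}
```

In the browser, the audio argument could come from the Web Audio API: connect `audioContext.createMediaElementSource(audioEl)` to `audioContext.createMediaStreamDestination()` and pass the destination's `.stream`. Also connect the source to `audioContext.destination` if the user should still hear the speech locally.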

OBS Virtual Camera Setup

Most video call apps cannot directly select a browser tab as a system camera. For the MVP, install OBS Studio and use OBS Virtual Camera as the bridge.

1. Install OBS Studio.
2. Add a Browser Source pointing to http://localhost:5173.
3. Crop the source to the lip-synced output preview.
4. Click Start Virtual Camera in OBS.
5. Select OBS Virtual Camera in Zoom, Meet, or Teams.

Screenshot placeholder: OBS browser source configuration.

Screenshot placeholder: Zoom camera picker showing OBS Virtual Camera.

API

| Method | Endpoint | Description |
|---|---|---|
| `POST` | `/api/voice/clone` | Upload reference audio, call ElevenLabs voice cloning, and return `voice_id`. |
| `POST` | `/api/voice/speak` | Send text, `voice_id`, and optional voice settings, then return a `speechId` and streaming `audioUrl`. |
| `GET` | `/api/voice/speak/stream/:speechId` | Stream the generated ElevenLabs speech audio for a pending speech request. |
| `GET` | `/api/health` | Return local API health. |
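The speak flow above has two steps: a POST returns a `speechId` and `audioUrl`, and the audio is then fetched from the stream endpoint. This presumably lets the browser start playback before the whole clip is generated. The client sketch below is hypothetical. The `voice_id`, `speechId`, and `audioUrl` names come from the table. The `audio`/`name` form fields used for cloning and the `voice_settings` JSON key are assumptions to check against `server/`.

```javascript
// Hypothetical client for the VoiceForge API. ASSUMED field names are marked below.
async function cloneVoice(audioBlob, name, { baseUrl = "", fetchImpl = globalThis.fetch } = {}) {
  const form = new FormData();
  form.append("audio", audioBlob, "reference.webm"); // ASSUMED form field name
  form.append("name", name); // ASSUMED form field name
  const res = await fetchImpl(`${baseUrl}/api/voice/clone`, { method: "POST", body: form });
  if (!res.ok) throw new Error(`clone failed: HTTP ${res.status}`);
  const { voice_id } = await res.json(); // documented response field
  return voice_id;
}

async function requestSpeech(text, voiceId, voiceSettings, { baseUrl = "", fetchImpl = globalThis.fetch } = {}) {
  const body = { text, voice_id: voiceId };
  if (voiceSettings) body.voice_settings = voiceSettings; // ASSUMED key for optional settings
  const res = await fetchImpl(`${baseUrl}/api/voice/speak`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`speak failed: HTTP ${res.status}`);
  // audioUrl is expected to point at GET /api/voice/speak/stream/:speechId.
  const { speechId, audioUrl } = await res.json();
  return { speechId, audioUrl };
}
```

Passing the returned `audioUrl` to an `<audio>` element (for example `new Audio(audioUrl).play()`) lets the browser play the stream progressively as bytes arrive.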

Roadmap

Done: Store cloned voice profiles and reference audio Blobs in IndexedDB via client/src/utils/db.js.
Done: Stream TTS audio through POST /api/voice/speak and GET /api/voice/speak/stream/:speechId.
In progress: Voice tuning controls are wired through persisted voice_settings. Multilingual output uses the ElevenLabs eleven_multilingual_v2 model, but dedicated language controls still need a UI.
In progress: The MVP virtual camera uses canvas capture; full WebRTC Insertable Streams frame replacement remains future work.
TODO: Replace the placeholder models/wav2lip.onnx with a real lightweight browser Wav2Lip ONNX model.
TODO: Implement real ONNX Runtime Web Wav2Lip inference.
TODO: Replace the fallback mouth animation with model-driven mouth movement.
TODO: Add richer virtual camera documentation for OBS and each call provider.
TODO: Add dedicated multilingual voice controls.
TODO: Add automated browser tests for camera and microphone permission flows.
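The roadmap notes that voice tuning flows through persisted `voice_settings` and that multilingual output uses `eleven_multilingual_v2`. The sketch below shows what the upstream call behind the stream endpoint could look like. The endpoint path, the `xi-api-key` header, `model_id`, and the `stability` / `similarity_boost` / `style` / `use_speaker_boost` settings follow ElevenLabs' public text-to-speech API. The `buildTtsRequest` helper and its clamping policy are assumptions, not VoiceForge code.

```javascript
// Hypothetical helper: build the ElevenLabs streaming TTS request from a stored voice profile.
const ELEVENLABS_BASE = "https://api.elevenlabs.io/v1";

function clamp01(value) {
  return Math.min(1, Math.max(0, value));
}

function buildTtsRequest({ apiKey, voiceId, text, voiceSettings = {}, modelId = "eleven_multilingual_v2" }) {
  if (!voiceId || !text) throw new Error("voiceId and text are required");
  // Forward only recognized numeric settings, clamped to [0, 1], so a malformed
  // persisted profile cannot produce an invalid request body.
  const settings = {};
  for (const key of ["stability", "similarity_boost", "style"]) {
    if (typeof voiceSettings[key] === "number") settings[key] = clamp01(voiceSettings[key]);
  }
  if (typeof voiceSettings.use_speaker_boost === "boolean") {
    settings.use_speaker_boost = voiceSettings.use_speaker_boost;
  }
  const body = { text, model_id: modelId };
  if (Object.keys(settings).length > 0) body.voice_settings = settings;
  return {
    url: `${ELEVENLABS_BASE}/text-to-speech/${encodeURIComponent(voiceId)}/stream`,
    init: {
      method: "POST",
      headers: { "xi-api-key": apiKey, "Content-Type": "application/json", Accept: "audio/mpeg" },
      body: JSON.stringify(body),
    },
  };
}
```

A Node 18 handler for `GET /api/voice/speak/stream/:speechId` could then call `fetch(url, init)` and pipe `Readable.fromWeb(upstream.body)` (from `node:stream`) into the Express response.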

License

MIT
