Give your AI agents the ability to listen
Microphone capture and speech-to-text tools for MCP-compatible agents.
| Tool | Description |
|---|---|
| `list_audio_devices` | List available microphone input devices |
| `capture_audio` | Record audio from the microphone and save as WAV |
| `voice_query` | Capture, transcribe (whisper.cpp), and query a local LLM (Ollama) |
```bash
claude mcp add mcp-listen npx mcp-listen
```

Add to your MCP configuration:

```json
{
  "mcpServers": {
    "mcp-listen": {
      "command": "npx",
      "args": ["-y", "mcp-listen"]
    }
  }
}
```

Compatible with Claude Desktop, ChatGPT Desktop, Cursor, GitHub Copilot, Windsurf, VS Code, Gemini, Zed, and any MCP-compatible client.
```bash
npm install -g mcp-listen
```

For `list_audio_devices` and `capture_audio`:
- Node.js 18+
- A microphone
For voice_query (optional):
- Ollama running locally
- Whisper GGML model (see Whisper Model Setup)
`list_audio_devices` returns a JSON array of available audio input devices.
Parameters: None
Example response:
```json
[
  { "index": 3, "name": "Microphone (Creative Live! Cam)", "isDefault": true, "maxInputChannels": 2, "defaultSampleRate": 48000 },
  { "index": 4, "name": "Microphone Array (Intel)", "isDefault": false, "maxInputChannels": 2, "defaultSampleRate": 48000 }
]
```

`capture_audio` records audio from the microphone and saves it as a WAV file.
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `duration_ms` | number | 5000 | Recording duration in milliseconds (100-30000) |
| `device` | number | system default | Device index from `list_audio_devices` |
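The documented 100-30000 ms bounds suggest out-of-range durations are rejected or clamped; a hypothetical sketch (`clampDurationMs` is not part of mcp-listen's API):

```javascript
// Clamp a requested recording duration to the documented bounds,
// falling back to the documented default for non-numeric input.
function clampDurationMs(value, min = 100, max = 30000) {
  if (typeof value !== "number" || Number.isNaN(value)) return 5000; // default
  return Math.min(max, Math.max(min, value));
}

console.log(clampDurationMs(50));    // 100  (below the minimum)
console.log(clampDurationMs(5000));  // 5000 (in range, unchanged)
console.log(clampDurationMs(60000)); // 30000 (above the maximum)
```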
Example response:
```json
{
  "path": "/tmp/mcp-listen-1712345678901.wav",
  "duration_ms": 5000,
  "sample_rate": 16000,
  "channels": 1,
  "size_bytes": 160044
}
```

`voice_query` runs the full voice pipeline: capture audio, transcribe with whisper.cpp, send the transcription to Ollama, and return the response. Entirely offline.
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `duration_ms` | number | 5000 | Recording duration in milliseconds (100-30000) |
| `device` | number | system default | Device index from `list_audio_devices` |
| `whisper_model` | string | ggml-base.en.bin | Path or filename of Whisper GGML model |
| `language` | string | en | Language code for transcription |
| `model` | string | llama3.2 | Ollama model name |
| `prompt` | string | You are a helpful assistant. | System prompt for the LLM |
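For illustration, a `voice_query` call overriding the defaults might pass arguments like these (parameter names come from the table above; the device index matches the earlier `list_audio_devices` example, the values are illustrative, and the exact envelope depends on your MCP client):

```json
{
  "duration_ms": 8000,
  "device": 3,
  "whisper_model": "ggml-base.en.bin",
  "language": "en",
  "model": "llama3.2",
  "prompt": "Answer in one short sentence."
}
```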
Example response:
```json
{
  "transcription": "What is the default port for PostgreSQL?",
  "response": "PostgreSQL runs on port 5432 by default.",
  "model": "llama3.2"
}
```

mcp-listen uses decibri for cross-platform microphone capture. No ffmpeg, no SoX, no system audio tools required. Pre-built native binaries with zero setup.
Audio is captured as 16-bit PCM at 16kHz mono, the standard format for speech-to-text engines.
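That format pins down the file size: 16-bit samples are 2 bytes each, so a mono 16 kHz recording produces 32,000 bytes per second of audio plus the WAV header. A quick check against the `capture_audio` example (the 44-byte figure assumes a canonical RIFF/WAVE header, which mcp-listen does not document explicitly):

```javascript
// Expected WAV file size for 16-bit PCM audio of a given duration.
function expectedWavBytes(durationMs, sampleRate = 16000, channels = 1, bytesPerSample = 2, headerBytes = 44) {
  return headerBytes + (durationMs / 1000) * sampleRate * channels * bytesPerSample;
}

console.log(expectedWavBytes(5000)); // 160044, matching size_bytes in the example above
```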
The voice_query tool replicates the pipeline from voxagent: capture audio, transcribe locally with whisper.cpp, and send to a local Ollama LLM. Fully offline, nothing leaves your machine.
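The final step of that pipeline can be sketched against Ollama's `/api/generate` endpoint on its default port 11434. The field names below follow Ollama's public API; how mcp-listen actually assembles the request is an assumption:

```javascript
// Build the request voice_query would plausibly send to Ollama.
function buildOllamaRequest(transcription, model = "llama3.2", system = "You are a helpful assistant.") {
  return {
    url: "http://localhost:11434/api/generate",
    body: JSON.stringify({
      model,
      system,                // maps to voice_query's `prompt` parameter
      prompt: transcription, // the whisper.cpp transcription
      stream: false,         // single JSON response rather than chunks
    }),
  };
}

const req = buildOllamaRequest("What is the default port for PostgreSQL?");
console.log(JSON.parse(req.body).model); // "llama3.2"
```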
The voice_query tool requires a Whisper GGML model file. Download one:
Linux / macOS:
```bash
mkdir -p ~/.mcp-listen/models
curl -L -o ~/.mcp-listen/models/ggml-base.en.bin https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin
```

Windows (PowerShell):
```powershell
mkdir "$env:USERPROFILE\.mcp-listen\models" -Force
Invoke-WebRequest -Uri "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin" -OutFile "$env:USERPROFILE\.mcp-listen\models\ggml-base.en.bin"
```

The model is ~150MB and downloads once. You can also set the WHISPER_MODEL_PATH environment variable to a custom directory.
- Install Ollama from https://ollama.com
- Pull a model: `ollama pull llama3.2`
- Ensure Ollama is running: `ollama serve`
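A pre-flight reachability check like the one `voice_query` presumably performs can be sketched as below. `fetchFn` is injected so the check can be exercised without a live server; in practice you would pass the global `fetch` (available in Node.js 18+):

```javascript
// Return true if an Ollama server answers at the given base URL.
async function isOllamaRunning(fetchFn, baseUrl = "http://localhost:11434") {
  try {
    const res = await fetchFn(baseUrl);
    return res.ok;
  } catch {
    return false; // connection refused => server not running
  }
}

// Usage with a stub standing in for a real server:
isOllamaRunning(async () => ({ ok: true })).then((up) => console.log(up)); // true
```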
- Fixed recording duration. You specify how long to record; there is no "stop when I stop talking" mode yet.
- `voice_query` requires Ollama running. If Ollama isn't running, the tool returns a clear error message.
- Whisper model must be downloaded before first use. The first `voice_query` call requires a pre-downloaded model (~150MB).
- No streaming. MCP's request/response pattern means the entire recording is captured, then transcribed, then sent to the LLM. No real-time partial results.
- Temp files. `capture_audio` writes WAV files to the system temp directory and they are not automatically cleaned up; `voice_query` cleans up after itself.
Windows: "Error opening microphone"

Windows may block microphone access by default. Go to Settings > Privacy & security > Microphone and ensure microphone access is enabled for desktop apps.
Ollama: "Ollama is not running"
Some Ollama installations start as a background service automatically. If you see this error, run ollama serve manually or check that the Ollama service is running.
Whisper: "model not found"

The Whisper model file must be downloaded before first use. See Whisper Model Setup for instructions.
- decibri: Cross-platform microphone capture for Node.js
- voxagent: Voice-powered terminal agent (inspiration for the voice_query pipeline)
Apache-2.0. See LICENSE for details.
Copyright 2026 Analytics in Motion