Give your AI agents the ability to listen
Microphone capture and speech-to-text tools for MCP-compatible agents.
| Tool | Description |
|---|---|
| `list_audio_devices` | List available microphone input devices |
| `capture_audio` | Record audio from the microphone and save as WAV |
| `voice_query` | Capture, transcribe (whisper.cpp), and query a local LLM (Ollama) |
```bash
claude mcp add mcp-listen npx mcp-listen
```

Add to your MCP configuration:

```json
{
  "mcpServers": {
    "mcp-listen": {
      "command": "npx",
      "args": ["-y", "mcp-listen"]
    }
  }
}
```

Compatible with Claude Desktop, ChatGPT Desktop, Cursor, GitHub Copilot, Windsurf, VS Code, Gemini, Zed, and any MCP-compatible client.
```bash
npm install -g mcp-listen
```

For `list_audio_devices` and `capture_audio`:
- Node.js 18+
- A microphone
For voice_query (optional):
- Ollama running locally
- Whisper GGML model (see Whisper Model Setup)
`list_audio_devices` returns a JSON array of available audio input devices.
Parameters: None
Example response:
```json
[
  { "index": 3, "name": "Microphone (Creative Live! Cam)", "isDefault": true, "maxInputChannels": 2, "defaultSampleRate": 48000 },
  { "index": 4, "name": "Microphone Array (Intel)", "isDefault": false, "maxInputChannels": 2, "defaultSampleRate": 48000 }
]
```

`capture_audio` records audio from the microphone and saves it as a WAV file.
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `duration_ms` | number | 5000 | Recording duration in milliseconds (100-30000) |
| `device` | number | system default | Device index from `list_audio_devices` |
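The documented 100-30000 ms bounds suggest out-of-range durations are rejected or clamped; a hypothetical sketch (`clampDurationMs` is not part of mcp-listen's API):

```javascript
// Clamp a requested recording duration to the documented bounds,
// falling back to the documented default for non-numeric input.
function clampDurationMs(value, min = 100, max = 30000) {
  if (typeof value !== "number" || Number.isNaN(value)) return 5000; // default
  return Math.min(max, Math.max(min, value));
}

console.log(clampDurationMs(50));    // 100  (below the minimum)
console.log(clampDurationMs(5000));  // 5000 (in range, unchanged)
console.log(clampDurationMs(60000)); // 30000 (above the maximum)
```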
Example response:
```json
{
  "path": "/tmp/mcp-listen-1712345678901.wav",
  "duration_ms": 5000,
  "sample_rate": 16000,
  "channels": 1,
  "size_bytes": 160044
}
```

`voice_query` runs the full voice pipeline: capture audio, transcribe with whisper.cpp, send the transcription to Ollama, and return the response. Entirely offline.
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `duration_ms` | number | 5000 | Recording duration in milliseconds (100-30000) |
| `device` | number | system default | Device index from `list_audio_devices` |
| `whisper_model` | string | ggml-base.en.bin | Path or filename of Whisper GGML model |
| `language` | string | en | Language code for transcription |
| `model` | string | llama3.2 | Ollama model name |
| `prompt` | string | You are a helpful assistant. | System prompt for the LLM |
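For illustration, a `voice_query` call overriding the defaults might pass arguments like these (parameter names come from the table above; the device index matches the earlier `list_audio_devices` example, the values are illustrative, and the exact envelope depends on your MCP client):

```json
{
  "duration_ms": 8000,
  "device": 3,
  "whisper_model": "ggml-base.en.bin",
  "language": "en",
  "model": "llama3.2",
  "prompt": "Answer in one short sentence."
}
```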
Example response:
```json
{
  "transcription": "What is the default port for PostgreSQL?",
  "response": "PostgreSQL runs on port 5432 by default.",
  "model": "llama3.2"
}
```

mcp-listen uses decibri for cross-platform microphone capture. No ffmpeg, no SoX, no system audio tools required. Pre-built native binaries with zero setup.
Audio is captured as 16-bit PCM at 16kHz mono, the standard format for speech-to-text engines.
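That format pins down the file size: 16-bit samples are 2 bytes each, so a mono 16 kHz recording produces 32,000 bytes per second of audio plus the WAV header. A quick check against the `capture_audio` example (the 44-byte figure assumes a canonical RIFF/WAVE header, which mcp-listen does not document explicitly):

```javascript
// Expected WAV file size for 16-bit PCM audio of a given duration.
function expectedWavBytes(durationMs, sampleRate = 16000, channels = 1, bytesPerSample = 2, headerBytes = 44) {
  return headerBytes + (durationMs / 1000) * sampleRate * channels * bytesPerSample;
}

console.log(expectedWavBytes(5000)); // 160044, matching size_bytes in the example above
```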
The voice_query tool replicates the pipeline from voxagent: capture audio, transcribe locally with whisper.cpp, and send to a local Ollama LLM. Fully offline, nothing leaves your machine.
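The final step of that pipeline can be sketched against Ollama's `/api/generate` endpoint on its default port 11434. The field names below follow Ollama's public API; how mcp-listen actually assembles the request is an assumption:

```javascript
// Build the request voice_query would plausibly send to Ollama.
function buildOllamaRequest(transcription, model = "llama3.2", system = "You are a helpful assistant.") {
  return {
    url: "http://localhost:11434/api/generate",
    body: JSON.stringify({
      model,
      system,                // maps to voice_query's `prompt` parameter
      prompt: transcription, // the whisper.cpp transcription
      stream: false,         // single JSON response rather than chunks
    }),
  };
}

const req = buildOllamaRequest("What is the default port for PostgreSQL?");
console.log(JSON.parse(req.body).model); // "llama3.2"
```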
The voice_query tool requires a Whisper GGML model file. Download one:
Linux / macOS:
```bash
mkdir -p ~/.mcp-listen/models
curl -L -o ~/.mcp-listen/models/ggml-base.en.bin https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin
```

Windows (PowerShell):
```powershell
mkdir "$env:USERPROFILE\.mcp-listen\models" -Force
Invoke-WebRequest -Uri "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin" -OutFile "$env:USERPROFILE\.mcp-listen\models\ggml-base.en.bin"
```

The model is ~150MB and downloads once. You can also set the WHISPER_MODEL_PATH environment variable to a custom directory.
- Install Ollama from https://ollama.com
- Pull a model: `ollama pull llama3.2`
- Ensure Ollama is running: `ollama serve`
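A pre-flight reachability check like the one `voice_query` presumably performs can be sketched as below. `fetchFn` is injected so the check can be exercised without a live server; in practice you would pass the global `fetch` (available in Node.js 18+):

```javascript
// Return true if an Ollama server answers at the given base URL.
async function isOllamaRunning(fetchFn, baseUrl = "http://localhost:11434") {
  try {
    const res = await fetchFn(baseUrl);
    return res.ok;
  } catch {
    return false; // connection refused => server not running
  }
}

// Usage with a stub standing in for a real server:
isOllamaRunning(async () => ({ ok: true })).then((up) => console.log(up)); // true
```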
- Fixed recording duration. You specify how long to record; there is no "stop when I stop talking" mode yet.
- `voice_query` requires Ollama running. If Ollama isn't running, the tool returns a clear error message.
- Whisper model must be downloaded before first use. The first `voice_query` call requires a pre-downloaded model (~150MB).
- No streaming. MCP's request/response pattern means the entire recording is captured, then transcribed, then sent to the LLM. No real-time partial results.
- Temp files. `capture_audio` writes WAV files to the system temp directory and they are not automatically cleaned up; `voice_query` cleans up after itself.
Windows: "Error opening microphone"

Windows may block microphone access by default. Go to Settings > Privacy & security > Microphone and ensure microphone access is enabled for desktop apps.
Ollama: "Ollama is not running"
Some Ollama installations start as a background service automatically. If you see this error, run ollama serve manually or check that the Ollama service is running.
Whisper: "model not found"

The Whisper model file must be downloaded before first use. See Whisper Model Setup for instructions.
- decibri: Cross-platform microphone capture for Node.js
- voxagent: Voice-powered terminal agent (inspiration for the voice_query pipeline)
Apache-2.0. See LICENSE for details.
Copyright 2026 Analytics in Motion