Hold a key. Speak. Text appears wherever you're typing — with your project's vocabulary intact.
yap is a lightweight voice-to-text daemon that reads your project docs and conversation history so domain terms transcribe correctly. Local whisper.cpp by default, terminal-native, near-zero idle footprint.
Status: pre-1.0. Linux is the current platform. macOS and Windows are on the roadmap. Breaking changes are possible until 1.0.
- Context-aware transcription — yap reads your project docs (CLAUDE.md, README.md) and conversation history (Claude Code sessions, tmux scrollback) to bias Whisper's vocabulary. Domain terms transcribe correctly instead of being garbled.
- Terminal-native — OSC52, bracketed paste, tmux, SSH, Claude Code, VS Code, and browsers are first-class injection targets, not afterthoughts.
- Local-first — whisper.cpp runs on your machine by default. Audio never leaves your hardware. Swap to Groq for ~1s cloud latency with one line of config.
- Zero idle footprint — a few MB of RAM and negligible CPU when not recording.
- Single binary — one static executable, no runtime dependencies.
- Composable primitives — `Record`, `Transcribe`, `Transform`, and `Inject` are independent operations exposed through a public Go library.
- Clipboard-safe — your clipboard is preserved across paste operations that need it.
For the full architectural picture, see ARCHITECTURE.md. For where the project is going, see ROADMAP.md.
```bash
curl -fsSL https://yap.enriquefft.com/install | bash
```

Or grab the latest binary from GitHub Releases and put it on your PATH.
```bash
git clone https://github.com/Enriquefft/yap.git
cd yap
make build        # dynamic build
make build-static # static binary (musl)
make test
```

The first time you launch the daemon, yap runs an interactive wizard to set up your hotkey, transcription backend, and language preference.
```bash
yap listen
```

The wizard will ask for:
- Your transcription backend. The default is `whisperlocal` (local whisper.cpp via the `whisper-server` subprocess); `groq` is available as a remote fallback.
- The API key for the chosen backend, if it's a remote one.
- Your preferred hotkey (default: Right Ctrl).
- Your language preference (default: auto-detect).
If you pick the local backend on first run, download the model with:
```bash
yap models download base.en
```

The model file is ~150 MB. The daemon refuses to start with the local backend until the file is in the cache (or `transcription.model_path` points at one).
Once it's running:
- Hold the hotkey — a chime confirms recording started.
- Speak — talk naturally.
- Release the hotkey — recording stops, a chime confirms, and the transcribed text is injected at your cursor.
```bash
yap listen                     # start the daemon (background, hotkey-driven)
yap listen --foreground        # foreground mode (systemd, launchd, containers)
yap stop                       # stop the daemon (idempotent)
yap status                     # daemon state, mode, version, backend (JSON)
yap toggle                     # toggle recording from a script or external keybind

yap config get <key>           # dot-notation: yap config get transcription.backend
yap config set <key> <value>   # dot-notation
yap config path                # print resolved config file path

yap models list                # list pinned whisper.cpp models and install state
yap models download <name>     # download a model into the cache (SHA256-verified)
yap models path [name]         # print the cache directory or a specific model path

yap record                     # one-shot record → transcribe → inject, no daemon
yap transcribe <file.wav>      # transcribe an audio file (--json for structured output)
yap transform "text"           # run the LLM transform on text (stdin or arg)
yap paste "text"               # exercise the injection layer directly
yap devices                    # list audio input devices

yap init                       # generate .yap.toml with project vocabulary terms
yap init --ai --backend claude # LLM-extracted terms via Claude Code (zero config)

yap hint                       # debug: show resolved target, provider, and vocabulary
yap resolve                    # debug: show injection strategy resolution
yap config overrides           # show per-project .yap.toml overrides
```

Config file location: `$XDG_CONFIG_HOME/yap/config.toml` (typically `~/.config/yap/config.toml`).
The schema is nested. Each section has its own well-defined surface:
```toml
[general]
hotkey = "KEY_RIGHTCTRL"      # single key today; combos land in a later phase
mode = "hold"                 # "hold" | "toggle"
max_duration = 60             # max recording seconds
audio_feedback = true         # chime on start/stop
audio_device = ""             # empty = system default
silence_detection = false
silence_threshold = 0.02
silence_duration = 2.0
history = false
stream_partials = true

[transcription]
backend = "whisperlocal"      # default. Set to "groq" or "openai" for a remote backend.
model = "base.en"             # whisperlocal: base.en (only pinned). remote: backend-specific.
model_path = ""               # explicit local model path; empty auto-resolves from the cache
whisper_server_path = ""      # explicit whisper-server binary; empty resolves via PATH
language = ""                 # empty = auto-detect
api_url = ""                  # empty = backend default (remote backends only)
api_key = ""                  # or env: YAP_API_KEY / GROQ_API_KEY

[transform]
enabled = false
backend = "passthrough"       # "passthrough" | "local" | "openai"
model = ""
system_prompt = "Fix transcription errors and punctuation. Do not rephrase. Preserve original language. Output only corrected text."
api_url = ""
api_key = ""                  # or env: YAP_TRANSFORM_API_KEY

[injection]
prefer_osc52 = true           # OSC52 for terminals when supported
bracketed_paste = true        # wrap multi-line text for shells
electron_strategy = "clipboard" # "clipboard" | "keystroke"
default_strategy = ""         # empty = auto-detect; or force "tmux"/"osc52"/"wayland"/"x11"
# app_overrides = [
#   { match = "firefox", strategy = "clipboard" },
#   { match = "kitty", strategy = "osc52" },
# ]

[hint]
enabled = true                # context-aware transcription (reads project docs + app state)
vocabulary_files = ["CLAUDE.md", "AGENTS.md", "README.md"] # project docs to read for domain terms
providers = ["claudecode", "termscroll"] # conversation context providers, first-match wins
vocabulary_max_chars = 250    # Whisper prompt budget
conversation_max_chars = 8000 # transform context budget
timeout_ms = 300              # max wall-time for provider fetch

[audio]
high_pass_filter = true       # remove sub-speech rumble (<80 Hz)
high_pass_cutoff = 80         # Hz; speech fundamentals start at ~85 Hz
trim_silence = true           # trim leading/trailing silence (prevents Whisper hallucinations)
trim_threshold = 0.01         # RMS amplitude threshold for silence detection
trim_margin_ms = 200          # ms of silence to keep around speech

[tray]
enabled = false
```

The TOML schema, the NixOS module, the wizard prompts, and the validation logic are all generated from the same Go types in `pkg/yap/config/`. There is no hand-maintained drift across surfaces.
| Variable | Effect |
|---|---|
| `YAP_API_KEY` | Sets `transcription.api_key`. Primary key, used by remote backends. |
| `GROQ_API_KEY` | Compat alias for `YAP_API_KEY`. |
| `YAP_TRANSFORM_API_KEY` | Sets `transform.api_key`. |
| `YAP_HOTKEY` | Compat alias for `general.hotkey`. |
| `YAP_CONFIG` | Overrides the config file path. |
yap reads project docs and application state to bias Whisper toward your domain vocabulary. This fixes common misrecognitions — "yap" no longer transcribes as "jump" or "chap" when you're working in the yap repo.
Two layers, both configurable in ~/.config/yap/config.toml:
| Layer | Source | Feeds | Always-on? |
|---|---|---|---|
| Vocabulary | Project docs (`CLAUDE.md`, `AGENTS.md`, `README.md`) walked from cwd to git root | Whisper prompt parameter (lexical bias) | Yes (when `hint.enabled`) |
| Conversation | App-specific state (Claude Code session, terminal scrollback) | LLM transform context (intent grounding) | Only when a provider matches |
Providers supply conversation context. They run in priority order; first match wins:
| Provider | Matches | Source |
|---|---|---|
| `claudecode` | Terminal apps | `~/.claude/projects/<cwd-slug>/<latest>.jsonl` — recent user/assistant messages |
| `termscroll` | Terminal apps | Terminal scrollback via API (kitty `allow_remote_control`; wezterm, ghostty, tmux coming soon) |
All `[hint]` options (in `~/.config/yap/config.toml`):

```toml
[hint]
enabled = true                # master switch
vocabulary_files = ["CLAUDE.md", "AGENTS.md", "README.md"] # project doc filenames, walked from cwd to git root
providers = ["claudecode", "termscroll"] # conversation providers, first-match wins
vocabulary_max_chars = 250    # Whisper prompt budget
conversation_max_chars = 8000 # transform context budget
timeout_ms = 300              # max ms for provider fetch before recording starts
```

Per-project config: run `yap init` to generate a `.yap.toml` in your repo root with extracted domain vocabulary:
```bash
yap init                       # heuristic term extraction from project docs
yap init --ai --backend claude # LLM-extracted terms via Claude Code CLI (zero config)
```

The generated file overrides global hint settings for that project:
```toml
# .yap.toml (in repo root, committed or gitignored — your call)
vocabulary_terms = ["yap", "whisperlocal", "OSC52", "malgo", "biquad"]
vocabulary_files = ["CLAUDE.md", "AGENTS.md", "README.md"]
```

You can also override any `[hint]` field. Any field not set keeps its global default. yap walks from cwd to the nearest `.git` root to find this file.
Disable: `yap config set hint.enabled false` (global) or set `enabled = false` in `.yap.toml` (per-project).
Debug: yap hint prints the resolved target, winning provider, and vocabulary/conversation sizes for the currently focused window.
yap is local-first by default. The default transcription backend is whisperlocal, which runs whisper.cpp on your machine via a long-lived whisper-server subprocess. With the local backend, your audio never leaves your machine.
The model file (currently `base.en`, ~150 MB) is downloaded once into `$XDG_CACHE_HOME/yap/models/` from Hugging Face, and its SHA256 digest is verified against the pinned manifest in `pkg/yap/transcribe/whisperlocal/models/manifest.go`. After the download, no further network calls are made for transcription.
Remote backends are available as a swap. Groq is the supported remote: set transcription.backend = "groq" and provide YAP_API_KEY. Any other OpenAI-compatible endpoint works via transcription.backend = "openai" (or "custom") plus transcription.api_url.
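Concretely, the swap described above is a one-line config change plus a key in the environment:

```toml
[transcription]
backend = "groq"   # audio now goes to Groq instead of local whisper.cpp
# Provide the key via `export YAP_API_KEY=...`, or set api_key here.
```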
yap has no accounts, no telemetry, no analytics, and no cloud sync. Ever.
| Platform | Status |
|---|---|
| Linux | Supported. |
| macOS | Coming soon (ROADMAP.md phase 15). |
| Windows | Planned (ROADMAP.md phase 16). |
Issues and PRs are welcome. See ROADMAP.md for the phased plan and ARCHITECTURE.md for the design contract any contribution should respect.
MIT.