SpeakIDE

Voice dictation for IntelliJ IDEA — press a shortcut, speak, and your words appear wherever the cursor is: code editor, AI Chat, Junie, commit message, any text field.


Demo

Demo Video


Features

  • Universal input — works in any text field: code editor, AI Chat / Junie, commit message, search boxes
  • Three STT backends — switch between cloud and offline in settings:
    • OpenAI Whisper (cloud) — high accuracy, supports any OpenAI-compatible endpoint (OpenAI, Groq, etc.)
    • Whisper Local (offline) — runs a ggml Whisper model on-device via whisper.cpp JNI, no internet required
    • Vosk (offline) — fully local, lightweight, no internet required, no data leaves your machine
  • Auto-stop on silence — recording ends automatically after a configurable pause; no need to press the shortcut again
  • Clipboard fallback — if no editor is focused when recording stops, the transcribed text is copied to clipboard
  • Microphone button in AI Chat toolbar — one-click dictation directly from the AI Assistant / Junie input bar
  • Configurable language — set a language hint or leave on auto for automatic detection
  • Secure key storage — API key stored in IntelliJ's PasswordSafe, never in plain text on disk

Installation

From JetBrains Marketplace (recommended)

  1. Open Settings → Plugins → Marketplace
  2. Search for SpeakIDE
  3. Click Install and restart the IDE

Or install directly: plugins.jetbrains.com/plugin/31433-speakide

From source

Requirements: JDK 21, IntelliJ IDEA 2024.2+

git clone https://github.com/Dragonav4/SpeakIDE.git
cd SpeakIDE
./gradlew runIde          # launch a sandbox IDE with the plugin
./gradlew buildPlugin     # build distributable .zip → build/distributions/

Install the .zip via Settings → Plugins → ⚙ → Install Plugin from Disk.


Setup

OpenAI Whisper (cloud)

  1. Open Settings → Tools → SpeakIDE
  2. Select Provider → OpenAI Whisper (Cloud)
  3. Paste your API key (stored securely)
  4. Set Base URL and Model Name:
     | Provider | Base URL                       | Model                  |
     |----------|--------------------------------|------------------------|
     | OpenAI   | https://api.openai.com/v1      | whisper-1              |
     | Groq     | https://api.groq.com/openai/v1 | whisper-large-v3-turbo |
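
To show how the Base URL and Model Name values fit together, here is a minimal Python sketch of a request to an OpenAI-compatible transcription endpoint, which is a multipart POST to `<base URL>/audio/transcriptions`. This is an illustration only, not the plugin's own code. The helper name `build_transcription_request`, the placeholder key, and the choice to omit the language field when the hint is `auto` are assumptions. The function only builds the request; sending it is left to the caller.

```python
import uuid
import urllib.request


def build_transcription_request(base_url, api_key, model, wav_bytes, language=None):
    """Build (but do not send) a multipart POST for an OpenAI-compatible
    /audio/transcriptions endpoint. Hypothetical helper for illustration."""
    boundary = uuid.uuid4().hex
    fields = {"model": model}
    if language and language != "auto":
        fields["language"] = language  # assumption: no field means auto-detect

    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    # The audio itself goes in the "file" form field.
    parts.append(
        (
            f'--{boundary}\r\n'
            'Content-Disposition: form-data; name="file"; filename="audio.wav"\r\n'
            'Content-Type: audio/wav\r\n\r\n'
        ).encode()
        + wav_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())

    return urllib.request.Request(
        url=base_url.rstrip("/") + "/audio/transcriptions",
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


# Placeholder key and a 4-byte stand-in for real WAV data.
req = build_transcription_request(
    "https://api.groq.com/openai/v1", "sk-example", "whisper-large-v3-turbo", b"RIFF"
)
print(req.full_url)  # https://api.groq.com/openai/v1/audio/transcriptions
```

Switching providers changes only the first and third arguments, which is why the settings page exposes them as separate fields.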

Vosk (offline)

  1. Download a model from alphacephei.com/vosk/models
    Recommended: vosk-model-small-en-us (~50 MB) or vosk-model-small-ru
  2. Extract the archive
  3. Open Settings → Tools → SpeakIDE
  4. Select Provider → Vosk (Offline)
  5. Set Vosk model path to the extracted folder

Whisper Local (offline)

  1. Download a ggml model from Hugging Face
    Recommended: ggml-base.en.bin (~150 MB) or ggml-small.bin for multilingual
  2. Open Settings → Tools → SpeakIDE
  3. Select Provider → Whisper (Local)
  4. Set Whisper local model (.bin) to the downloaded .bin file

Note for macOS Apple Silicon users: the bundled native library supports both arm64 and x86_64 — no extra steps needed.

Minimum audio length: Whisper Local needs roughly 1 second of speech to produce a result. Shorter recordings return empty output.
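
As a worked example of the 1-second minimum, the duration of a raw PCM buffer follows from its size in bytes. The sketch below assumes 16 kHz, mono, 16-bit audio. Both function names are hypothetical, and the plugin may check length differently or not at all.

```python
def pcm_duration_seconds(num_bytes, sample_rate=16_000, channels=1, bytes_per_sample=2):
    """Duration of raw PCM audio; defaults assume 16 kHz, mono, 16-bit samples."""
    return num_bytes / (sample_rate * channels * bytes_per_sample)


def long_enough_for_whisper(num_bytes, min_seconds=1.0, **fmt):
    """True when the buffer holds at least min_seconds of audio."""
    return pcm_duration_seconds(num_bytes, **fmt) >= min_seconds


print(pcm_duration_seconds(32_000))     # 1.0
print(long_enough_for_whisper(16_000))  # False (only 0.5 s of audio)
```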


Usage

| Action                    | How                                                          |
|---------------------------|--------------------------------------------------------------|
| Start / stop recording    | Ctrl+Alt+/ (all platforms)                                   |
| Start / stop from AI Chat | Click the 🎤 button in the AI Chat input toolbar             |
| Auto-stop                 | Speak, then pause; recording stops after the silence timeout |
| No editor focused         | Transcribed text is copied to the clipboard automatically    |

The microphone icon in the toolbar turns red while recording is active.
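
The "No editor focused" behaviour amounts to choosing a delivery target when recording stops. A minimal sketch of that choice, using plain Python lists as stand-ins for the editor and the clipboard (the names `TextDelivery` and `choose_delivery` here are illustrative and not taken from the plugin source):

```python
from typing import Callable, List, Optional

# A delivery target takes the transcribed text and puts it somewhere.
TextDelivery = Callable[[str], None]


def choose_delivery(focused_editor: Optional[List[str]], clipboard: List[str]) -> TextDelivery:
    """Pick the editor when one is focused, otherwise fall back to the clipboard.
    Both targets are plain lists standing in for the real components."""
    if focused_editor is not None:
        return focused_editor.append
    return clipboard.append


editor, clipboard = [], []
choose_delivery(editor, clipboard)("hello world")  # an editor is focused
choose_delivery(None, clipboard)("fallback text")  # nothing is focused
print(editor, clipboard)  # ['hello world'] ['fallback text']
```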


Settings Reference

Settings → Tools → SpeakIDE

| Setting                    | Description                                                   |
|----------------------------|---------------------------------------------------------------|
| Provider                   | OpenAI Whisper (Cloud), Vosk (Offline), or Whisper (Local)    |
| Language                   | Language hint (auto, en, ru, de, …); Vosk ignores this        |
| API Key                    | Whisper cloud API key, stored in PasswordSafe                 |
| Base URL                   | Whisper-compatible endpoint URL                               |
| Model Name                 | Model identifier for the cloud API                            |
| Vosk model path            | Path to the extracted Vosk model folder                       |
| Whisper local model (.bin) | Path to a ggml .bin model file                                |
| Enable silence detection   | Auto-stop recording after a pause                             |
| Silence threshold (ms)     | How long the pause must be before stopping (default 2000 ms)  |
| Show recording overlay     | Visual indicator while recording                              |
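
To make the silence threshold concrete, the sketch below shows one common way to implement auto-stop: measure the RMS level of each audio frame and stop once quiet frames have accumulated for the configured number of milliseconds. It is an assumption-laden illustration, not the plugin's actual detector. The function names and the `rms_floor` value of 500 are hypothetical, and this version counts any quiet run, including one before speech begins.

```python
import math
from typing import Iterable, Optional, Sequence


def rms(frame: Sequence[int]) -> float:
    """Root-mean-square level of one frame of signed 16-bit samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0


def frames_until_auto_stop(frames: Iterable[Sequence[int]], frame_ms: int,
                           silence_threshold_ms: int = 2000,
                           rms_floor: float = 500.0) -> Optional[int]:
    """Return the index of the frame at which recording would auto-stop,
    or None if the quiet run never reaches the threshold."""
    silent_ms = 0
    for index, frame in enumerate(frames):
        if rms(frame) < rms_floor:
            silent_ms += frame_ms
            if silent_ms >= silence_threshold_ms:
                return index
        else:
            silent_ms = 0  # any loud frame restarts the countdown
    return None


loud, quiet = [4000] * 160, [0] * 160
# Half a second of speech, then silence, in 100 ms frames.
print(frames_until_auto_stop([loud] * 5 + [quiet] * 25, frame_ms=100))  # 24
```

With the default of 2000 ms and 100 ms frames, the stop lands on the twentieth consecutive quiet frame. A lower threshold ends recordings sooner but risks cutting off natural pauses.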

Architecture

ToggleRecordingAction  (~30 lines, delegates only)
        │
        ▼
RecordingService  (@Service APP — owns the state machine)
        │
        ├── RecordingState (sealed class: Idle / Recording / Transcribing)
        │
        ├── AudioSource (interface)
        │       └── AudioCapture (TarsosDSP)
        │               ├── MicPermissionProbe
        │               └── SilenceTimeoutProcessor
        │
        ├── PcmBuffer  (ConcurrentLinkedQueue — thread-safe PCM accumulator)
        │
        ├── SttProviderFactory
        │       ├── OpenAiWhisperProvider  ── shared HttpClient (lazy) ── Whisper API
        │       ├── WhisperLocalProvider   ── whisper.cpp JNI ── ggml model (offline)
        │       │       └── AudioResampler  (44 100 Hz → 16 kHz, TarsosDSP sinc)
        │       └── VoskProvider  ── Vosk JNI (native arm64 / x86_64)
        │
        ├── TextDelivery (fun interface)
        │       ├── CaretTextDelivery   (WriteCommandAction on EDT)
        │       └── ClipboardTextDelivery  (fallback when no editor focused)
        │
        ├── RecordingIndicator (interface)
        │       └── RecordingOverlay  (floating Swing window)
        │
        └── SpeakIdeNotifier  (wraps NotificationGroupManager)

The AI Chat microphone button is a second registration of ToggleRecordingAction added to the AI Assistant toolbar — loaded only when the com.intellij.ml.llm plugin is present (optional dependency).
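
The diagram above names three recording states: Idle, Recording, and Transcribing. A minimal sketch of how such a state machine could be driven is shown below in Python. The event names (`toggle`, `silence_timeout`, `text_delivered`, `error`) and the transition table are assumptions made for illustration; the source does not spell out the actual transitions.

```python
from enum import Enum, auto


class RecordingState(Enum):
    IDLE = auto()
    RECORDING = auto()
    TRANSCRIBING = auto()


# Allowed transitions, keyed by (current state, event). Event names are hypothetical.
TRANSITIONS = {
    (RecordingState.IDLE, "toggle"): RecordingState.RECORDING,
    (RecordingState.RECORDING, "toggle"): RecordingState.TRANSCRIBING,
    (RecordingState.RECORDING, "silence_timeout"): RecordingState.TRANSCRIBING,
    (RecordingState.TRANSCRIBING, "text_delivered"): RecordingState.IDLE,
    (RecordingState.TRANSCRIBING, "error"): RecordingState.IDLE,
}


def next_state(state: RecordingState, event: str) -> RecordingState:
    """Return the next state, or stay put when the event does not apply
    (for example, a toggle pressed while a transcription is in flight)."""
    return TRANSITIONS.get((state, event), state)


state = RecordingState.IDLE
for event in ["toggle", "silence_timeout", "toggle", "text_delivered"]:
    state = next_state(state, event)
print(state)  # RecordingState.IDLE
```

Keeping the transitions in one table means both the shortcut and the AI Chat button can send the same `toggle` event without duplicating state checks.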


Contributing

Pull requests are welcome. For major changes, open an issue first.

./gradlew test            # run unit tests
./gradlew runIde          # test in sandbox IDE

License

Apache License 2.0
