Add on-device LLM post-processing for transcriptions #106

Open
userFRM wants to merge 1 commit into Starmel:master from userFRM:feat/llm-post-processing

userFRM commented Feb 16, 2026

Summary

  • Adds manual per-recording LLM text processing using Apple's MLX framework for on-device inference (no cloud API calls)
  • Users can clean up filler words, restructure as developer prompts, or format as structured markdown via a wand menu on each recording
  • New LLM module with model lifecycle management, curated model registry (0.5B–4B), and conditional compilation for non-MLX builds
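The processing modes listed above could be modeled roughly as follows. This is a minimal sketch, not the PR's actual `LLMProcessingMode`: the type name `ProcessingMode` and the prompt strings are illustrative placeholders, and the assumption that the raw mode carries no prompt is mine.

```swift
// Hypothetical sketch of a processing-mode enum; the prompt text is a
// placeholder and is NOT taken from the pull request.
enum ProcessingMode: String, CaseIterable {
    case raw       // leave the transcription untouched
    case clean     // strip filler words
    case dev       // restructure as a developer prompt
    case markdown  // format as structured markdown

    /// System prompt sent to the model, or nil when no LLM pass is needed.
    var systemPrompt: String? {
        switch self {
        case .raw:
            return nil
        case .clean:
            return "Remove filler words and false starts. Keep the meaning unchanged."
        case .dev:
            return "Rewrite the dictation as a clear, structured prompt for a developer."
        case .markdown:
            return "Format the text as structured markdown with headings and lists."
        }
    }
}
```

A wand menu could then iterate over `ProcessingMode.allCases` and skip inference whenever `systemPrompt` is nil.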

Details

New files

  • OpenSuperWhisper/LLM/LLMPostProcessor.swift — Singleton managing MLX model lifecycle with generation counter for async race protection, task coalescing for duplicate loads, and Task.detached for off-main-thread inference
  • OpenSuperWhisper/LLM/LLMProcessingMode.swift — Processing mode enum (raw/clean/dev/markdown) with system prompts
  • OpenSuperWhisper/LLM/LLMModelRegistry.swift — Curated registry of 8 MLX models (5 general-purpose, 3 code-specialized)
  • OpenSuperWhisper/LLM/DevModePrompts.swift — System prompt for converting spoken developer dictation to structured markdown
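The generation counter and load coalescing described for `LLMPostProcessor` can be sketched as below. This is an illustration under assumptions, not the PR's code: `ModelLoader`, `LoadedModel`, and the simulated 100 ms load are hypothetical stand-ins, and an actor is used here where the PR describes a singleton.

```swift
import Foundation

// Hypothetical stand-in for a loaded model; not the PR's actual type.
struct LoadedModel: Sendable {
    let id: String
}

actor ModelLoader {
    private var generation = 0
    private var current: LoadedModel?
    // The most recent load request that has not finished yet.
    private var inFlight: (id: String, task: Task<LoadedModel, Error>)?
    private(set) var loadCount = 0

    /// Returns the loaded model, or nil if a newer request superseded this one.
    func load(id: String) async throws -> LoadedModel? {
        let task: Task<LoadedModel, Error>
        if let pending = inFlight, pending.id == id {
            // Coalesce: a load for the same model is already running, so join it.
            task = pending.task
        } else {
            // A different model was requested: start a new generation.
            generation += 1
            loadCount += 1
            task = Task.detached { () async throws -> LoadedModel in
                // Simulated slow load, run off the caller's executor.
                try await Task.sleep(nanoseconds: 100_000_000)
                return LoadedModel(id: id)
            }
            inFlight = (id: id, task: task)
        }
        let myGeneration = generation

        do {
            let model = try await task.value
            // The actor was free while we awaited; drop stale results.
            guard myGeneration == generation else { return nil }
            inFlight = nil
            current = model
            return model
        } catch {
            if myGeneration == generation { inFlight = nil }
            throw error
        }
    }

    func currentModelID() -> String? { current?.id }
}
```

Requesting model "A" and then model "B" before "A" finishes leaves "B" as the current model, and the first caller receives nil rather than overwriting newer state. The sketch does not cancel the superseded task; a real implementation would likely want to.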

Modified files

  • ContentView.swift — Per-recording LLM processing via wand icon context menu, shimmer overlay during processing, revert-to-raw support
  • Settings.swift — @MainActor on SettingsViewModel, Speech/LLM sub-tabs, model install flow with cancel button and rollback on failure/cancel (race-safe via model-id guards)
  • Recording.swift — rawTranscription field, DB migration v3 with backfill
  • IndicatorWindow.swift — rawTranscription set on hotkey recordings, removed dead .processing state
  • TranscriptionQueue.swift — Passes rawTranscription through on transcription completion
  • AppPreferences.swift — LLM preferences (processing mode, model ID, temperature, installed model IDs)
  • PermissionsManager.swift — Uses AXIsProcessTrustedWithOptions instead of AXIsProcessTrusted()
  • run.sh — Dynamic libomp discovery, Metal Toolchain checks
  • project.pbxproj — MLX framework references, mlx-swift-lm pinned to upToNextMajorVersion: 2.30.3
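The revert-to-raw behavior and the v3 backfill can be illustrated with an in-memory model. This is a sketch under assumptions: the `Recording` struct here is a simplified stand-in with only two fields, `applyProcessed`, `revertToRaw`, and `backfillRawTranscription` are hypothetical names, and the actual migration presumably runs against the database rather than an array.

```swift
// Simplified, hypothetical stand-in for the app's Recording model.
struct Recording {
    var transcription: String
    var rawTranscription: String?

    /// Stores LLM output while preserving the original text exactly once.
    mutating func applyProcessed(_ text: String) {
        if rawTranscription == nil {
            rawTranscription = transcription
        }
        transcription = text
    }

    /// Restores the original transcription if one was saved.
    mutating func revertToRaw() {
        if let raw = rawTranscription {
            transcription = raw
        }
    }
}

/// Backfill semantics: rows written before the new column existed get
/// rawTranscription copied from their current transcription.
func backfillRawTranscription(_ rows: [Recording]) -> [Recording] {
    rows.map { row in
        var updated = row
        if updated.rawTranscription == nil {
            updated.rawTranscription = updated.transcription
        }
        return updated
    }
}
```

Because `applyProcessed` only captures the raw text the first time, running several modes in a row still reverts to the original transcription rather than to an intermediate result.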

Test plan

  • Build with MLX enabled — verify LLM sub-tab appears, models can be installed/cancelled
  • Build without MLX — verify app compiles, LLM UI shows "unavailable" messaging
  • Record via main window — verify rawTranscription is set, wand menu works (clean/dev/markdown)
  • Record via hotkey indicator — verify rawTranscription is set, revert-to-raw works
  • Cancel model install mid-download — verify previous model is restored, no error toast
  • Cancel then immediately install another model — verify no state clobbering (race scenario)
  • Process text then revert to raw — verify original transcription is restored
  • Drop audio file for transcription — verify rawTranscription is set on completion
  • Run ./run.sh on both Intel and Apple Silicon Macs

🤖 Generated with Claude Code

Adds manual per-recording LLM text processing using Apple MLX framework
for on-device inference. Users can clean up filler words, restructure
transcriptions as developer prompts, or format as structured markdown
via a wand menu on each recording.

Key changes:
- New LLM/ module: LLMPostProcessor (model lifecycle with generation
  counter for race protection), LLMProcessingMode (clean/dev/markdown),
  LLMModelRegistry (curated models 0.5B-4B), DevModePrompts
- Per-recording processing with rawTranscription for revert-to-raw
- Settings: Speech/LLM sub-tabs, model install with cancel + rollback
- DB migration v3 adding rawTranscription column with backfill
- Conditional compilation (#if canImport(MLX)) for non-MLX builds
- Build system: dynamic libomp discovery, Metal Toolchain checks
- mlx-swift-lm pinned to semver 2.30.3 (was branch: main)
- CI pinned to Xcode 16.2 (workaround for ml-explore/mlx-swift-lm#94)
- Fix PermissionsManager to use AXIsProcessTrustedWithOptions
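The `#if canImport(MLX)` gating mentioned in the key changes might look roughly like this. It is a sketch only: `LLMAvailability`, its message strings, and `processIfAvailable` are hypothetical names, and the MLX branch deliberately contains no inference calls.

```swift
#if canImport(MLX)
import MLX
#endif

// Hypothetical helper showing how a build without MLX can still compile
// and report that LLM features are unavailable.
enum LLMAvailability {
    static var isAvailable: Bool {
        #if canImport(MLX)
        return true
        #else
        return false
        #endif
    }

    static var statusMessage: String {
        isAvailable
            ? "On-device LLM processing is available."
            : "LLM processing is unavailable in this build."
    }
}

/// Returns the text unchanged when MLX is missing, so callers need no
/// conditional compilation of their own.
func processIfAvailable(_ text: String) -> String {
    #if canImport(MLX)
    // Real inference would go here; omitted in this sketch.
    return text
    #else
    return text
    #endif
}
```

Keeping the `#if` checks inside a small number of helpers like these means the settings UI can branch on `LLMAvailability.isAvailable` at run time instead of scattering compile-time conditions through view code.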