A Swift package providing local, multi-speaker text-to-speech via a typed Swift API and a JSONL worker surface. Includes voice creation by design or clone, as well as custom text normalization via TextForSpeech.
SpeakSwiftly is a TTS-in-a-box solution for Swift app devs. It ships both an importable library product and a worker executable. The library gives Swift callers a typed runtime surface, while the executable gives non-Swift hosts (Python, Rust, etc.) a newline-delimited JSON protocol over stdio.
This project was born from my desire for a simple, "plug-and-play" TTS option for other things I'm building. It's rapidly turned into something I think others will find useful as well.
SpeakSwiftly currently supports:
- Typed Swift runtime APIs through SpeakSwiftlyCore
- A long-lived JSONL worker executable for non-Swift callers
- Stored voice profiles and text-normalization profiles
- Resident backend switching between qwen3 and marvis
- Resident model unload and reload controls
- Managed generated-file and generated-batch artifacts
For deeper contributor-facing architecture notes, runtime behavior details, development guidance, and full verification workflows, see CONTRIBUTING.md.
SpeakSwiftly is a standard Swift package that depends on:
Library consumers can add the package directly from GitHub:
.package(url: "https://github.com/gaelic-ghost/SpeakSwiftly.git", from: "0.9.2")
Then add SpeakSwiftlyCore to the target that will own the runtime.
SpeakSwiftlyCore also carries a vendored mlx-swift_Cmlx.bundle resource so linked consumers can resolve the packaged MLX shader bundle and bundled default.metallib without spelunking through DerivedData.
For package-local validation:
swift build
For real MLX-backed local worker runs, publish the Xcode-backed runtime first:
sh scripts/repo-maintenance/publish-runtime.sh --configuration Debug
That produces stable local runtime launchers under .local/xcode/current-debug and .local/xcode/current-release.
import SpeakSwiftlyCore
import TextForSpeech
let runtime = await SpeakSwiftly.liftoff()
await runtime.start()
let handle = await runtime.generate.speech(
text: "Hello there.",
with: "default-femme"
)
for try await event in handle.events {
print(event)
}
When the whole input is source code rather than prose with embedded code, use sourceFormat:
let sourceHandle = await runtime.generate.speech(
text: "struct WorkerRuntime { let sampleRate: Int }",
with: "default-femme",
sourceFormat: .swift
)
The typed runtime is organized around stored concern handles that callers can keep and reuse:
- runtime.generate
- runtime.player
- runtime.voices
- runtime.normalizer
- runtime.jobs
- runtime.artifacts
When callers need to construct a standalone text normalizer, SpeakSwiftly.Normalizer(...) now throws if the persisted text-profile archive cannot be loaded or decoded. The worker runtime still uses a best-effort recovery path for unreadable archives so SpeakSwiftly.liftoff() can continue starting in operator-facing environments.
Runtime preferences have a matching typed surface:
import Foundation // URL(fileURLWithPath:) comes from Foundation
import SpeakSwiftlyCore
let configuration = SpeakSwiftly.Configuration(speechBackend: .marvis)
try configuration.save(to: URL(fileURLWithPath: "/tmp/speakswiftly-configuration.json"))
let runtime = await SpeakSwiftly.liftoff(configuration: configuration)
If a host needs the packaged MLX bundle or the exact metallib path, use the support-resource surface:
let mlxBundleURL = try SpeakSwiftly.SupportResources.mlxBundleURL()
let defaultMetallibURL = try SpeakSwiftly.SupportResources.defaultMetallibURL()
Launch the published runtime through the stable launcher:
sh scripts/repo-maintenance/publish-runtime.sh --configuration Debug
"$PWD/.local/xcode/current-debug/run-speakswiftly"
At startup the worker begins preloading the resident model and emits JSONL status events on stdout.
The package also ships a small executable consumer harness, SpeakSwiftlyTesting, for package-level smoke tests:
swift run SpeakSwiftlyTesting resources
swift run SpeakSwiftlyTesting status
swift run SpeakSwiftlyTesting smoke
resources prints the packaged bundle and metallib paths, status constructs the typed runtime and prints the first terminal status payload it sees, and smoke runs both checks in sequence.
The package currently publishes:
- SpeakSwiftlyCore as the typed Swift runtime library
- SpeakSwiftly as the worker executable
Key typed runtime entry points include:
- runtime.generate.speech(text:with:textProfileName:textContext:sourceFormat:)
- runtime.generate.audio(text:with:textProfileName:textContext:sourceFormat:)
- runtime.generate.batch(_:with:)
- runtime.voices.create(design named:from:vibe:voice:outputPath:)
- runtime.voices.create(clone named:from:vibe:transcript:)
- runtime.voices.list()
- runtime.voices.delete(named:)
- runtime.player.list()
- runtime.player.pause()
- runtime.player.resume()
- runtime.player.state()
- runtime.player.clearQueue()
- runtime.player.cancelRequest(_:)
- runtime.jobs.expire(id:)
- runtime.jobs.generationQueue()
- runtime.jobs.job(id:)
- runtime.jobs.list()
- runtime.artifacts.file(id:)
- runtime.artifacts.files()
- runtime.artifacts.batch(id:)
- runtime.artifacts.batches()
- SpeakSwiftly.SupportResources.bundle
- SpeakSwiftly.SupportResources.mlxBundleURL()
- SpeakSwiftly.SupportResources.defaultMetallibURL()
- runtime.status()
- runtime.switchSpeechBackend(to:)
- runtime.reloadModels()
- runtime.unloadModels()
The typed Swift library and the JSONL worker surface intentionally use different naming styles:
- Swift keeps Cocoa-style method names that read naturally at the call site.
- JSONL keeps snake_case, verb-first operation names.
- JSONL read-one operations use get_*.
- JSONL collection and queue reads use list_*.
- JSONL CRUD-style writes use create_*, replace_*, and delete_*.
- JSONL lifecycle and control operations keep literal verbs like generate_*, set_*, reload_*, unload_*, pause, resume, clear_*, cancel_*, load_*, save_*, and reset_* when the operation is not best modeled as CRUD.
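As a rough illustration of the JSONL naming rules above, the sketch below sorts an op name into one of the four groups by its prefix. `OpKind` and `classify(op:)` are hypothetical names invented for this example and are not part of SpeakSwiftly; only the prefixes come from the list above.

```swift
// Hypothetical helper, not part of SpeakSwiftly: groups a JSONL op name
// according to the naming conventions listed above.
enum OpKind {
    case readOne         // get_*
    case collectionRead  // list_*
    case crudWrite       // create_*, replace_*, delete_*
    case lifecycle       // everything else: generate_*, set_*, pause, resume, ...
}

func classify(op: String) -> OpKind {
    if op.hasPrefix("get_") { return .readOne }
    if op.hasPrefix("list_") { return .collectionRead }
    let crudPrefixes = ["create_", "replace_", "delete_"]
    if crudPrefixes.contains(where: { op.hasPrefix($0) }) { return .crudWrite }
    // Literal verbs such as "pause" carry no prefix and fall through to here.
    return .lifecycle
}

// Op names taken from the request examples in this README.
for op in ["get_status", "list_generated_files", "set_speech_backend", "pause"] {
    print(op, "->", classify(op: op))
}
```

A host could use a grouping like this to decide, for example, which requests are safe to retry, though that policy is outside what this README specifies.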
Resident runtime controls currently map like this:
| Typed Swift API | JSONL op | Notes |
|---|---|---|
| status(id:) | "get_status" | Returns the current stage, resident_state, and speech_backend. |
| switchSpeechBackend(to:id:) | "set_speech_backend" | Requires a "speech_backend" field on the JSONL request. |
| reloadModels(id:) | "reload_models" | Re-warms the currently selected resident backend. |
| unloadModels(id:) | "unload_models" | Drops resident models from memory and parks later resident-dependent generation until residency returns. |
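For hosts that build request lines by hand, the mapping in the table can be sketched as a small enum. `ResidentControl` and `requestLine(id:)` are hypothetical names for illustration only; the op strings and the "speech_backend" field are the ones listed in the table, and the sketch assumes each request carries just "id", "op", and that optional field.

```swift
import Foundation

// Hypothetical type for illustration, not part of SpeakSwiftly. The op
// strings and the "speech_backend" field come from the table above.
enum ResidentControl {
    case status
    case switchSpeechBackend(to: String)
    case reloadModels
    case unloadModels

    var op: String {
        switch self {
        case .status: return "get_status"
        case .switchSpeechBackend: return "set_speech_backend"
        case .reloadModels: return "reload_models"
        case .unloadModels: return "unload_models"
        }
    }

    /// Builds one JSONL request line, without the trailing newline.
    func requestLine(id: String) throws -> String {
        var payload: [String: String] = ["id": id, "op": op]
        if case .switchSpeechBackend(let backend) = self {
            payload["speech_backend"] = backend
        }
        // Sorted keys keep the output stable from run to run.
        let data = try JSONSerialization.data(withJSONObject: payload, options: [.sortedKeys])
        return String(decoding: data, as: UTF8.self)
    }
}

let switchLine = try ResidentControl.switchSpeechBackend(to: "marvis").requestLine(id: "req-switch")
print(switchLine)
```

Swift callers using SpeakSwiftlyCore do not need any of this; the typed methods in the left column are the intended entry points there.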
The worker protocol is newline-delimited JSON over standard input and output.
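Because the protocol is framed by newlines, a host reading the worker's stdout usually has to reassemble lines from arbitrary byte chunks before parsing them. The sketch below shows one way to do that; `JSONLLineBuffer` is a hypothetical helper written for this example, not something SpeakSwiftly ships.

```swift
import Foundation

// Hypothetical helper, not part of SpeakSwiftly: accumulates raw bytes read
// from a pipe and returns only complete, newline-terminated lines.
struct JSONLLineBuffer {
    private var pending: [UInt8] = []

    /// Appends a chunk and returns every complete line that chunk finishes.
    mutating func append(_ chunk: Data) -> [String] {
        pending.append(contentsOf: chunk)
        var lines: [String] = []
        while let newlineIndex = pending.firstIndex(of: 0x0A) {
            let line = String(decoding: pending[..<newlineIndex], as: UTF8.self)
            if !line.isEmpty {
                lines.append(line)
            }
            // Drop the consumed line together with its newline byte.
            pending.removeSubrange(...newlineIndex)
        }
        return lines
    }
}

var buffer = JSONLLineBuffer()
// The first chunk ends mid-object, so no line is returned yet.
let first = buffer.append(Data("{\"event\":\"worker_".utf8))
// The second chunk completes one line and starts another.
let second = buffer.append(Data("status\"}\n{\"id\":\"req-1\"".utf8))
print(first.count, second.count)
```

Each returned string can then be handed to a JSON decoder. Anything after the last newline stays buffered until the next chunk arrives.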
Representative request shapes:
{"id":"req-1","op":"generate_speech","text":"Hello there","profile_name":"default-femme"}
{"id":"req-1f","op":"generate_audio_file","text":"Save this one for later playback.","profile_name":"default-femme"}
{"id":"req-batch","op":"generate_batch","profile_name":"default-femme","items":[{"text":"First saved file."},{"artifact_id":"custom-batch-artifact","text":"Second saved file.","text_profile_name":"logs"}]}
{"id":"req-text-style","op":"get_text_profile_style"}
{"id":"req-set-text-style","op":"set_text_profile_style","text_profile_style":"compact"}
{"id":"req-status","op":"get_status"}
{"id":"req-generated-file","op":"get_generated_file","artifact_id":"req-1f-artifact-1"}
{"id":"req-generated-files","op":"list_generated_files"}
{"id":"req-switch","op":"set_speech_backend","speech_backend":"marvis"}
{"id":"req-reload","op":"reload_models"}
{"id":"req-unload","op":"unload_models"}
Representative response and event shapes:
{"event":"worker_status","stage":"warming_resident_model","resident_state":"warming","speech_backend":"qwen3"}
{"event":"worker_status","stage":"resident_model_ready","resident_state":"ready","speech_backend":"qwen3"}
{"id":"req-unload","ok":true,"status":{"event":"worker_status","stage":"resident_models_unloaded","resident_state":"unloaded","speech_backend":"qwen3"},"speech_backend":"qwen3"}
{"id":"req-after-unload","event":"queued","reason":"waiting_for_resident_models","queue_position":1}
{"id":"req-reload","ok":true,"status":{"event":"worker_status","stage":"resident_model_ready","resident_state":"ready","speech_backend":"qwen3"},"speech_backend":"qwen3"}
Raw JSONL callers should send absolute filesystem paths for path fields, or include cwd when using relative paths. The typed Swift helpers populate caller working-directory context automatically.
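A host that consumes the worker_status events shown above could decode them along these lines. `WorkerStatus` is a hypothetical model defined only for this sketch; its fields mirror the keys in the example events, and the real payload may carry more.

```swift
import Foundation

// Hypothetical model for illustration, not part of SpeakSwiftly. The field
// names mirror the worker_status events shown in this README.
struct WorkerStatus: Decodable {
    let event: String
    let stage: String
    let residentState: String
    let speechBackend: String
}

// One event line, copied from the representative shapes above.
let eventLine = #"{"event":"worker_status","stage":"resident_model_ready","resident_state":"ready","speech_backend":"qwen3"}"#

let decoder = JSONDecoder()
// Maps snake_case wire keys such as resident_state to camelCase properties.
decoder.keyDecodingStrategy = .convertFromSnakeCase
let status = try decoder.decode(WorkerStatus.self, from: Data(eventLine.utf8))
print(status.stage, status.residentState, status.speechBackend)
```

Using `.convertFromSnakeCase` keeps the Swift side in the Cocoa-style naming the library uses while leaving the wire format untouched.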
For the full wire examples, detailed event flow, and operator-facing behavior notes, see CONTRIBUTING.md.
Use this repository as the primary development home for SpeakSwiftly. Keep the public README focused on product and usage information, and put contributor-facing architecture notes, repository workflow, and deep operational guidance in CONTRIBUTING.md.
For package-focused development, prefer:
swift build
swift test
For real runtime verification and published local worker workflows, use the scripts under scripts/repo-maintenance/ as described in CONTRIBUTING.md.
Baseline package verification:
swift build
swift test
Real MLX-backed runtime verification starts by publishing the Xcode-backed runtime:
sh scripts/repo-maintenance/publish-runtime.sh --configuration Debug
sh scripts/repo-maintenance/verify-runtime.sh --configuration Debug
Extended e2e, trace-capture, and deep-trace workflows are documented in CONTRIBUTING.md.