realtime-stt-ws

Real-time speech-to-text WebSocket server with pluggable ASR backends.

Architecture

WebSocket Client
      │
      ▼
┌─────────────────────────────────────┐
│  FastAPI (async)                    │
│                                     │
│  _recv_pump ──► in_queue            │
│                    │                │
│              run_worker (thread)    │
│                    │                │
│  _send_pump ◄── out_queue           │
└─────────────────────────────────────┘
      │
      ▼
  ASR Backend (Qwen3 / Mock)

Async I/O for WebSocket recv/send via FastAPI
Sync worker thread for inference (bridged with janus queues)
Pluggable backends via ASRBackend protocol
Energy-based VAD for utterance endpoint detection (numpy-based RMS)
Prometheus metrics at /metrics and health check at /health
Adaptive polling (50ms active, 200ms idle) for CPU efficiency
Partial backpressure (drops partials when output queue >80%)
Drop-oldest overflow handling for queue overflow protection

Backends

Backend	Description	Requires
`qwen3`	Qwen3-ASR-1.7B via embedded vLLM (in-process GPU)	CUDA + `vllm[audio]`
`mock`	Fixture-based replay for testing	Nothing extra

Set via STT_BACKEND environment variable (default: qwen3).

Backend Roadmap

Phase	Backend	Model	Status
Phase 1	`qwen3`	Qwen3-ASR-1.7B (vLLM)	Done
Phase 2	`whisper`	Whisper	Planned
Phase 3	`nemo`	NVIDIA NeMo	Planned

Setup

# Install core + dev dependencies
uv sync --group dev

# With Qwen3 backend (requires CUDA)
uv sync --group dev --extra qwen3

Usage

# Run with mock backend (no GPU needed)
STT_BACKEND=mock uvicorn app:app --host 0.0.0.0 --port 8000

# Run with Qwen3 backend (requires CUDA)
STT_BACKEND=qwen3 uvicorn app:app --host 0.0.0.0 --port 8000

WebSocket Protocol

Connect to ws://host:port/v1/stt/ws with query parameters:

Parameter	Required	Default	Description
`call_id`	yes	-	Unique call identifier
`sample_rate`	yes	-	Must be `16000`
`codec`	no	`pcm_s16le`	Audio codec
`channels`	no	`1`	Must be mono
`frame_ms`	no	`20`	Frame duration in ms
`partial_ms`	no	`500`	Partial result interval in ms
`language`	no	`auto`	BCP-47 code (e.g. `zh`, `en`, `ja`)
`hotwords`	no	-	Comma-separated hotwords

Send: binary PCM16LE audio frames, or JSON {"type": "commit"} to force-finalize.

Receive: JSON messages:

HTTP Interface (Offline)

For batch transcription, use POST /v1/stt/transcribe:

# Upload a WAV file for transcription
curl -X POST http://localhost:8000/v1/stt/transcribe \
  -F "file=@audio.wav" \
  -G -d "language=zh"

# Response:
# {
#   "text": "转写文本内容",
#   "duration_ms": 3200.0,
#   "latency_ms": 450.3
# }

Supported audio formats: WAV (PCM 16/32-bit, IEEE Float 32/64-bit). Audio is automatically converted to 16kHz mono.

// Connection ready
{"type": "ready", "call_id": "...", "sample_rate": 16000, ...}

// Partial transcription
{"type": "partial", "call_id": "...", "segment_seq": 0, "text": "hello", "final": false}

// Final transcription (after endpoint or commit)
{"type": "final", "call_id": "...", "segment_seq": 0, "text": "hello world", "final": true,
 "first_token_ms": 120, "latency_ms": 250, "segment_duration_ms": 3200}

// Error
{"type": "error", "call_id": "...", "code": 1008, "message": "..."}

Configuration

All settings use the STT_ env prefix:

Variable	Default	Description
`STT_BACKEND`	`qwen3`	Backend name (`qwen3` or `mock`)
`STT_HOST`	`0.0.0.0`	Server bind host
`STT_PORT`	`8000`	Server bind port
`STT_IN_QUEUE_SIZE`	`200`	Input queue capacity
`STT_IN_QUEUE_MAX_DROPS`	`500`	Max input queue drops before error
`STT_OUT_QUEUE_SIZE`	`50`	Output queue capacity
`STT_MAX_UPLOAD_BYTES`	`104857600`	Max upload size (100MB)
`STT_VLLM_MODEL`	`Qwen/Qwen3-ASR-1.7B`	vLLM model name
`STT_VLLM_GPU_MEMORY`	`0.7`	GPU memory utilization (0-1)
`STT_VLLM_MAX_TOKENS`	`512`	Max output tokens
`STT_VLLM_MAX_INFERENCE_BATCH_SIZE`	`32`	Max batch size for inference
`STT_VAD_THRESHOLD`	`300`	RMS energy threshold for speech
`STT_VAD_SILENCE_MS`	`500`	Silence duration to trigger endpoint
`STT_STREAMING_CHUNK_SIZE_SEC`	`0.5`	Streaming chunk size in seconds

Testing

# Run mock tests (no GPU needed)
STT_BACKEND=mock pytest tests/mock/ -v

# Run qwen3 tests (requires CUDA)
STT_BACKEND=qwen3 pytest tests/qwen3/ -v

Project Structure

├── app.py                  # FastAPI entry point
├── config.py               # Pydantic settings
├── backends/
│   ├── base.py             # ASRBackend protocol + ASRResult
│   ├── registry.py         # Backend factory
│   ├── mock.py             # Fixture-based mock backend
│   └── qwen3.py            # vLLM Qwen3 backend
├── inference/
│   └── worker.py           # Sync inference worker thread
├── protocol/
│   └── schema.py           # Message schemas & validation
├── session/
│   └── state.py            # Per-connection session state
├── transport/
│   └── ws.py               # WebSocket router & pumps
├── observability/
│   ├── logging.py          # Structured logging (structlog)
│   └── metrics.py          # Prometheus metrics
├── references/             # Test fixtures (audio + text)
└── tests/                  # Unit & integration tests

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
backends		backends
docs		docs
inference		inference
observability		observability
protocol		protocol
references		references
session		session
tests		tests
transport		transport
.env.example		.env.example
.gitignore		.gitignore
CLAUDE.md		CLAUDE.md
LICENSE		LICENSE
README.md		README.md
app.py		app.py
config.py		config.py
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

realtime-stt-ws

Architecture

Backends

Backend Roadmap

Setup

Usage

WebSocket Protocol

HTTP Interface (Offline)

Configuration

Testing

Project Structure

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

realtime-stt-ws

Architecture

Backends

Backend Roadmap

Setup

Usage

WebSocket Protocol

HTTP Interface (Offline)

Configuration

Testing

Project Structure

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages