Production-grade HTTP gateway for LLM providers with reliability controls
Fastify-based TypeScript service providing a unified interface for OpenAI and Anthropic with:
- ⚡ Exponential backoff retries
- 🔌 Circuit breakers (Opossum)
- 🚦 Rate limiting per IP
- 🔑 Idempotency key support
- 📊 Pino structured logging
- ✅ Zod schema validation
- 🛡️ Configurable timeouts
- 📖 OpenAPI 3.0 spec
- 📈 Prometheus metrics - request latency, error rates, cache hit ratio
- 🔄 Request coalescing - dedupe concurrent identical requests
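Request coalescing can be sketched as below — a minimal illustration, not the gateway's actual code. The `fetchCompletion` helper is a hypothetical stand-in for a real provider call:

```typescript
// Minimal request-coalescing sketch: concurrent callers with the same key
// share one in-flight promise instead of each triggering an upstream call.
const inFlight = new Map<string, Promise<string>>();

let upstreamCalls = 0; // for demonstration only

// Hypothetical stand-in for an actual provider request.
async function fetchCompletion(prompt: string): Promise<string> {
  upstreamCalls++;
  await new Promise((r) => setTimeout(r, 50)); // simulate network latency
  return `reply to: ${prompt}`;
}

async function coalesced(key: string, prompt: string): Promise<string> {
  const existing = inFlight.get(key);
  if (existing) return existing; // join the request already in flight
  const p = fetchCompletion(prompt).finally(() => inFlight.delete(key));
  inFlight.set(key, p);
  return p;
}

// e.g. awaiting Promise.all of three coalesced("k1", "Hello!") calls
// issues a single upstream request and hands all callers the same result.
```

The map entry is removed in `finally` so a later identical request (after the first settles) makes a fresh upstream call rather than reusing a stale promise.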
```bash
# Install
npm install

# Configure
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...

# Run
npm run dev

# Test
npm test
```

Example request:

```bash
curl -X POST http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-Idempotency-Key: my-request-123" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

Configuration is via environment variables:

| Env Var | Default | Description |
|---|---|---|
| `PORT` | `3000` | Server port |
| `OPENAI_API_KEY` | - | OpenAI API key |
| `ANTHROPIC_API_KEY` | - | Anthropic API key |
| `DEFAULT_PROVIDER` | `openai` | Default LLM provider |
| `RATE_LIMIT_MAX` | `100` | Requests per window |
| `RATE_LIMIT_WINDOW_MS` | `60000` | Window duration (ms) |
| `RETRY_MAX_ATTEMPTS` | `3` | Max retry attempts |
| `IDEMPOTENCY_TTL_MS` | `3600000` | Cache TTL (1 hour) |
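The defaults above can be applied in a small loader like the one below — a plain-TypeScript sketch with a hypothetical `loadConfig` function, not the service's actual config code (which presumably validates env vars with Zod):

```typescript
// Sketch of loading gateway settings from the table above, with defaults.
interface GatewayConfig {
  port: number;
  defaultProvider: "openai" | "anthropic";
  rateLimitMax: number;
  rateLimitWindowMs: number;
  retryMaxAttempts: number;
  idempotencyTtlMs: number;
}

function loadConfig(env: Record<string, string | undefined>): GatewayConfig {
  // Parse a numeric env var, falling back to the documented default.
  const num = (v: string | undefined, dflt: number): number => {
    const n = v === undefined ? dflt : Number(v);
    if (!Number.isInteger(n) || n < 0) throw new Error(`invalid value: ${v}`);
    return n;
  };
  const provider = env.DEFAULT_PROVIDER ?? "openai";
  if (provider !== "openai" && provider !== "anthropic") {
    throw new Error(`unknown provider: ${provider}`);
  }
  return {
    port: num(env.PORT, 3000),
    defaultProvider: provider,
    rateLimitMax: num(env.RATE_LIMIT_MAX, 100),
    rateLimitWindowMs: num(env.RATE_LIMIT_WINDOW_MS, 60_000),
    retryMaxAttempts: num(env.RETRY_MAX_ATTEMPTS, 3),
    idempotencyTtlMs: num(env.IDEMPOTENCY_TTL_MS, 3_600_000),
  };
}
```

Failing fast on malformed values keeps misconfiguration visible at startup instead of surfacing as odd runtime behavior.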
Access Prometheus metrics at `/metrics`:

```bash
curl http://localhost:3000/metrics
```

Available metrics:

- `llm_gateway_http_request_duration_seconds` - HTTP request latency histogram
- `llm_gateway_http_requests_total` - Total HTTP requests counter
- `llm_gateway_provider_latency_seconds` - LLM provider latency
- `llm_gateway_tokens_total` - Token usage by provider/model
- `llm_gateway_cache_hits_total` / `llm_gateway_cache_misses_total` - Cache hit ratio
- `llm_gateway_circuit_breaker_state` - Circuit breaker state gauge
- `llm_gateway_rate_limit_exceeded_total` - Rate limit events
```
┌─────────────┐      ┌──────────────┐      ┌─────────────┐
│    HTTP     │────▶│   Service    │────▶│  Provider   │
│  (Fastify)  │      │    Layer     │      │    Layer    │
└─────────────┘      └──────────────┘      └─────────────┘
       │                    │                    │
       ▼                    ▼                    ▼
 Zod Validation        Idempotency       OpenAI/Anthropic
 Request ID            Rate Limit         SDK Interface
 Logging               Circuit Breaker
                       Retries
```
See openapi.yaml for the full OpenAPI 3.0 specification.
| Method | Path | Description |
|---|---|---|
| GET | `/health` | Health check |
| POST | `/v1/chat/completions` | Create chat completion |
All responses include:
- `X-Request-Id`: Unique request identifier for tracing
- `X-RateLimit-Limit`: Requests per rate limit window
- `X-RateLimit-Remaining`: Remaining requests in current window
- `X-RateLimit-Reset`: ISO timestamp when the rate limit resets
- `X-Cached`: `true` if the response was served from the idempotency cache
```bash
# Run all tests
npm test

# Run with coverage
npm run test:coverage

# Type check
npm run typecheck

# Run specific test suites
npm test -- chaos
npm test -- benchmark
```

- Unit tests: Provider mocks, retry logic, circuit breaker, rate limiter, idempotency cache, schema validation
- Integration tests: End-to-end API tests
- Chaos tests: Malformed input, rate limiting, idempotency behavior
- Benchmarks: Latency percentiles (p50/p95/p99) under load
**Retries**

- Exponential backoff with configurable max attempts
- Smart retry detection (429 and 5xx errors)
- Configurable initial delay and max delay
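The policy above can be sketched as a small helper — an illustrative `withRetries` function (a hypothetical name, not the service's actual implementation):

```typescript
// Exponential backoff sketch: retry on 429/5xx, doubling the delay each
// attempt and capping it at maxDelayMs.
interface RetryOpts {
  maxAttempts: number;    // e.g. RETRY_MAX_ATTEMPTS
  initialDelayMs: number;
  maxDelayMs: number;
}

// 429 (rate limited) and 5xx responses are worth retrying; 4xx client
// errors are not, since resending the same request cannot succeed.
const isRetryable = (status: number): boolean => status === 429 || status >= 500;

async function withRetries<T>(
  fn: () => Promise<{ status: number; value?: T }>,
  opts: RetryOpts,
): Promise<T> {
  let delay = opts.initialDelayMs;
  for (let attempt = 1; ; attempt++) {
    const res = await fn();
    if (res.status < 400) return res.value as T;
    if (!isRetryable(res.status) || attempt >= opts.maxAttempts) {
      throw new Error(`failed with status ${res.status} after ${attempt} attempt(s)`);
    }
    await new Promise((r) => setTimeout(r, delay));
    delay = Math.min(delay * 2, opts.maxDelayMs); // exponential backoff, capped
  }
}
```

Production backoff usually also adds jitter so synchronized clients don't retry in lockstep; that refinement is omitted here for brevity.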
**Circuit breaker**

- Automatic circuit opening after the error threshold is crossed
- Half-open state for recovery testing
- Fallback responses while the circuit is open
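The transitions above form a small state machine. The service delegates this to Opossum; the `Breaker` class below is only an illustrative sketch of the closed → open → half-open cycle:

```typescript
// Minimal circuit-breaker state machine sketch (illustrative; the gateway
// itself uses Opossum rather than this class).
type State = "closed" | "open" | "half-open";

class Breaker {
  private state: State = "closed";
  private failures = 0;
  private openedAt = 0;

  constructor(
    private threshold: number,      // consecutive failures before opening
    private resetTimeoutMs: number, // how long to stay open
    private now: () => number = Date.now, // injectable clock for testing
  ) {}

  current(): State {
    if (this.state === "open" && this.now() - this.openedAt >= this.resetTimeoutMs) {
      this.state = "half-open"; // allow one trial request through
    }
    return this.state;
  }

  record(success: boolean): void {
    const s = this.current();
    if (success) {
      this.state = "closed"; // any success (incl. half-open trial) closes it
      this.failures = 0;
    } else {
      this.failures++;
      if (s === "half-open" || this.failures >= this.threshold) {
        this.state = "open"; // trip, or re-trip after a failed trial
        this.openedAt = this.now();
        this.failures = 0;
      }
    }
  }
}
```

While `current()` reports `open`, the caller would skip the provider entirely and serve a fallback response.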
**Rate limiting**

- Per-IP sliding-window rate limiting
- Configurable window size and request limit
- Standard rate-limit headers in responses
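A sliding window along these lines can be sketched by keeping recent request timestamps per client (an illustrative `SlidingWindowLimiter` with an injectable clock, not the gateway's actual limiter):

```typescript
// Per-IP sliding-window rate limiter sketch: remember the timestamps of
// recent requests per client and reject once the window holds `max`.
class SlidingWindowLimiter {
  private hits = new Map<string, number[]>();

  constructor(
    private max: number,      // e.g. RATE_LIMIT_MAX
    private windowMs: number, // e.g. RATE_LIMIT_WINDOW_MS
    private now: () => number = Date.now, // injectable clock for testing
  ) {}

  /** Returns true if the request is allowed, false if rate-limited. */
  allow(ip: string): boolean {
    const t = this.now();
    // Drop timestamps that have slid out of the window.
    const recent = (this.hits.get(ip) ?? []).filter((ts) => t - ts < this.windowMs);
    if (recent.length >= this.max) {
      this.hits.set(ip, recent);
      return false;
    }
    recent.push(t);
    this.hits.set(ip, recent);
    return true;
  }

  /** Feeds the X-RateLimit-Remaining response header. */
  remaining(ip: string): number {
    const t = this.now();
    const recent = (this.hits.get(ip) ?? []).filter((ts) => t - ts < this.windowMs);
    return Math.max(0, this.max - recent.length);
  }
}
```

Unlike a fixed window, the sliding window avoids the burst of 2× the limit that can occur at a window boundary.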
**Idempotency**

- Responses cached by idempotency key
- 1-hour TTL (configurable)
- Prevents duplicate processing of retried requests
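The cache can be sketched as a TTL map keyed by the `X-Idempotency-Key` value (a hypothetical `IdempotencyCache`, not the service's actual implementation):

```typescript
// TTL cache sketch for idempotency: a repeated request within the TTL gets
// the stored response back (which the gateway marks with X-Cached: true).
class IdempotencyCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();

  constructor(
    private ttlMs: number, // e.g. IDEMPOTENCY_TTL_MS (default 1 hour)
    private now: () => number = Date.now, // injectable clock for testing
  ) {}

  get(key: string): V | undefined {
    const e = this.entries.get(key);
    if (!e) return undefined;
    if (this.now() >= e.expiresAt) {
      this.entries.delete(key); // lazily evict the expired entry
      return undefined;
    }
    return e.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}
```

Expired entries are evicted lazily on lookup; a background sweep would be needed in addition if memory growth between lookups mattered.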
MIT