
UniLLM

Lightweight LLM access layer with provider normalization, caching, and observability. It uses a unified endpoint format (model@provider), so application code can switch providers without rewriting call sites, while developers remain free to pick any supported provider or compatible endpoint.

What layer is this?

UniLLM is the model-access layer in the wider Unity stack:

         User (Console/Phone/SMS/Email)
                      │
    ┌─────────────────┴──────────────────┐
    │           Communication            │
    │    (Webhooks, Voice, SMS, Email)   │
    └────┬───────────────────────────────┘
         │
    ┌────┴────┐    ┌─────────┐    ┌─────────┐
    │  Unity  │    │  Unify  │    │Orchestra│
    │ (Brain) │───▶│  (SDK)  │───▶│  (API)  │
    │         │    │         │    │  (DB)   │
    └────┬────┘    └────┬────┘    └────┬────┘
         │              ▲              ▲
         │              │              │
         │    ┌─────────┴─┐       ┌────┴───────┐
         └───▶│  UniLLM   │       │  Console   │
              │ (LLM API) │       │(Interfaces)│
              └───────────┘       └────────────┘

This repo (UniLLM) handles LLM inference for Unity. It normalizes requests across providers (OpenAI, Anthropic, Vertex AI, etc.), provides response caching for test determinism, and can integrate with Unify for logging and billing context.

If you're here from the Unity quickstart, this is the layer that talks to model providers. OpenAI and Anthropic are the simplest documented paths, but the provider choice is yours: other supported providers and compatible local endpoints work too.

Related repositories:

  • Unity — AI assistant brain (primary consumer)
  • Unify — Python SDK for logging and persistence
  • Orchestra — Backend API and database

Installation

pip install unifyai-unillm

Or with uv:

uv add unifyai-unillm

Configuration

API Keys

Set credentials for whichever providers you want to use:

export OPENAI_API_KEY=<your-key>
export ANTHROPIC_API_KEY=<your-key>
# ... other provider keys

You do not need a Unify account for basic inference. UNIFY_KEY only matters for optional logging, credit, and observability features that integrate with the wider Unify stack.
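Since credentials are read from the environment, a quick sanity check before running anything can save a round of confusing auth errors. A small illustrative snippet (the env-var names are the documented ones above; `key_status` is a hypothetical helper, not part of unillm):

```python
import os

# Report which documented credentials are visible in the current
# environment, without printing the key values themselves.
def key_status(names=("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "UNIFY_KEY")):
    return {name: bool(os.environ.get(name)) for name in names}

print(key_status())
```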

Google Cloud / Vertex AI

For Vertex AI models (Gemini, Claude on Vertex, etc.), authenticate using Google Cloud Application Default Credentials:

# One-time setup: authenticate with your Google Cloud account
gcloud auth application-default login

# Set your GCP project and location
export VERTEXAI_PROJECT=<your-project-id>
export VERTEXAI_LOCATION=<your-location>  # e.g., us-central1, europe-west1

Alternatively, use a service account JSON file:

export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json

Basic Usage

import unillm

# Sync client
client = unillm.Unify("gpt-4o@openai")
response = client.generate(
    messages=[{"role": "user", "content": "Hello!"}]
)

# Async client
async_client = unillm.AsyncUnify("claude-sonnet-4-20250514@anthropic")
response = await async_client.generate(
    messages=[{"role": "user", "content": "Hello!"}]
)

Features

Unified Endpoint Format

All models use a consistent model@provider format:

client = unillm.Unify("gpt-4o@openai")
client = unillm.Unify("claude-sonnet-4-20250514@anthropic")
client = unillm.Unify("gemini-2.0-flash@vertexai")
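For illustration, an endpoint string splits cleanly on its last @ into the model and provider parts. This is a sketch of the format, not unillm's actual parser:

```python
# Split a "model@provider" endpoint string (illustrative sketch only).
def parse_endpoint(endpoint: str) -> tuple[str, str]:
    model, sep, provider = endpoint.rpartition("@")
    if not sep or not model or not provider:
        raise ValueError(f"expected 'model@provider', got {endpoint!r}")
    return model, provider

print(parse_endpoint("claude-sonnet-4-20250514@anthropic"))
# -> ('claude-sonnet-4-20250514', 'anthropic')
```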

Provider-Specific Preprocessing

Automatic handling of provider quirks (message format normalization, parameter translation, etc.) before requests are sent.
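One common quirk, sketched for concreteness: Anthropic's API takes the system prompt as a separate field, while OpenAI accepts it as a message with role "system". A hypothetical normalizer in that spirit (not unillm's internals):

```python
# Separate system-prompt content from chat messages, as an
# Anthropic-style API expects. Hypothetical helper for illustration.
def split_system(messages):
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    chat = [m for m in messages if m["role"] != "system"]
    return ("\n".join(system_parts) or None), chat

system, chat = split_system([
    {"role": "system", "content": "Be terse."},
    {"role": "user", "content": "Hello!"},
])
print(system)  # Be terse.
```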

Response Caching

Built-in caching to avoid redundant LLM calls:

client = unillm.Unify("gpt-4o@openai", cache=True)

# Cache modes
client.generate(..., cache="read")       # Read from cache only if available
client.generate(..., cache="write")      # Write to cache only
client.generate(..., cache="both")       # Read and write
client.generate(..., cache="read-only")  # Must be in cache, else error
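Read as a decision table, the modes imply the following behavior. This is our interpretation of the mode names, not the library's code; "strict" marks the mode that errors on a cache miss:

```python
# Assumed read/write behavior per cache mode (interpretation only).
CACHE_MODES = {
    "read":      {"read": True,  "write": False, "strict": False},
    "write":     {"read": False, "write": True,  "strict": False},
    "both":      {"read": True,  "write": True,  "strict": False},
    "read-only": {"read": True,  "write": False, "strict": True},
}

assert CACHE_MODES["both"]["read"] and CACHE_MODES["both"]["write"]
```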

Cache Event Capture

Track cache hit/miss status for observability:

from unillm import capture_cache_events

with capture_cache_events() as events:
    client.generate(messages=[...])

print(events[0]["cache_status"])  # "hit" or "miss"

Streaming

client = unillm.Unify("gpt-4o@openai", stream=True)
for chunk in client.generate(messages=[...]):
    print(chunk, end="")

Stateful Conversations

client = unillm.Unify("gpt-4o@openai", stateful=True)
client.generate(user_message="What is 2+2?")
client.generate(user_message="And what is that times 3?")  # Maintains history

Tool Calling

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name"}
            },
            "required": ["location"]
        }
    }
}]
response = client.generate(messages=[...], tools=tools)

Response Formats

from pydantic import BaseModel

class Answer(BaseModel):
    value: int
    explanation: str

response = client.generate(
    messages=[{"role": "user", "content": "What is 2+2?"}],
    response_format=Answer
)

Observability

Console & File Logging

Terminal and file logging are independently controlled:

# Terminal (console) output (default: true)
export UNILLM_TERMINAL_LOG=true

# File-based traces (independent of terminal)
export UNILLM_LOG_DIR=/path/to/logs

When UNILLM_LOG_DIR is set, structured log files are written:

  • During call (cache enabled): {base}.cache_pending.txt
  • During call (cache disabled): {base}.pending.txt
  • After completion: {base}.cache_hit.txt or {base}.cache_miss.txt (cache enabled), or {base}.txt (cache disabled)

Pending files remain as evidence if an LLM call hangs or crashes.
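The naming scheme above can be summarized as a small lookup (illustrative only; {base} is whatever per-call prefix unillm chooses):

```python
# Map (cache setting, call status) to the documented log-file suffix.
# Illustration of the scheme above, not unillm's code.
def log_suffix(cache_enabled: bool, status: str) -> str:
    if status == "pending":
        return ".cache_pending.txt" if cache_enabled else ".pending.txt"
    if cache_enabled:
        return f".cache_{status}.txt"  # status is "hit" or "miss"
    return ".txt"

print(log_suffix(True, "hit"))  # .cache_hit.txt
```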

OpenTelemetry Tracing

# Enable OTel tracing
export UNILLM_OTEL=true

# OTLP endpoint (optional)
export UNILLM_OTEL_ENDPOINT=http://localhost:4317

# File-based span export (optional)
export UNILLM_OTEL_LOG_DIR=/path/to/traces

LLM calls create OTel spans that can be correlated with parent application spans and propagated to child services.

Privacy Note

When UNILLM_LOG_DIR is set, full request and response payloads are written to disk. This includes user messages, tool arguments, and model responses — which may contain PII or sensitive data. Be mindful of this when enabling file logging in production environments.

Supported Providers

  • OpenAI
  • Anthropic
  • Vertex AI (Google)
  • Bedrock (AWS)
  • DeepSeek
  • Groq
  • Mistral
  • Replicate
  • Together AI
  • xAI

Project Structure

unillm/
├── __init__.py              # Public API exports
├── settings.py              # Configuration via env vars
├── helpers.py               # Utility functions
├── costs.py                 # Provider cost computation
├── cost_tracker.py          # Per-call cost event capture
├── tokens.py                # Token counting and context window utilities
├── cache_events.py          # Cache hit/miss event capture
├── llm_events.py            # LLM event hooks for observability
├── limit_hooks.py           # Spending limit check callbacks
├── logger.py                # File logging and OTel tracing
├── clients/                 # LLM client implementations
│   ├── base.py              # Base client class
│   ├── uni_llm.py           # Unify (sync) and AsyncUnify clients
│   ├── provider_preprocessing.py   # Provider-specific request normalization
│   ├── provider_postprocessing.py  # Response validation and retries
│   └── shared_session.py    # Shared aiohttp session management
├── caching/                 # Response caching system
│   ├── base_cache.py        # Abstract cache backend
│   ├── local_cache.py       # File-based NDJSON cache
│   └── local_separate_cache.py  # Split read/write cache (for CI)
├── endpoints/               # Provider-specific model mappings
│   ├── openai.py
│   ├── anthropic.py
│   └── ...
└── types/                   # Type definitions
    ├── cache.py
    ├── prompt.py
    └── prompt_caching.py

Local Development

This project uses uv for dependency management. By default, local development installs the published unifyai package for the persistence/logging dependency.

Setup

git clone https://github.com/unifyai/unillm.git
cd unillm
uv sync

If you're iterating on a sibling checkout of unify as well, override the installed dependency with an editable install after uv sync:

uv pip install -e ../unify

Running Tests

Tests require at minimum an OPENAI_API_KEY or ANTHROPIC_API_KEY set in your environment (or a .env file). With a populated .cache.ndjson, cached LLM responses are replayed — so tests run fast and deterministically without making real LLM calls.

uv run pytest tests/ -v

UNIFY_KEY is optional. If set, credit deduction runs against the Unify API; if unset, credit deduction is skipped with a warning and tests continue normally.

Running Tests in CI

Tests are opt-in to reduce GitHub Actions costs; they run only when explicitly requested:

  • Commit message: Include [run-tests] in your commit message
  • PR title: Include [run-tests] in your pull request title
  • Manual trigger: Use the "Run workflow" button in GitHub Actions

Examples:

# Run tests on this commit
git commit -m "Fix caching logic [run-tests]"

# No tests (default)
git commit -m "Update README"

Note: The black formatting check always runs on every push.

Some CI steps (local Orchestra deployment, GCP authentication) are internal infrastructure for the Unify team and are automatically skipped on external forks.

Pre-commit Hooks

pre-commit install

Once installed, the hooks (Black, isort, autoflake) run automatically on git commit. If a commit fails because files were auto-formatted, stage the changes and re-run the commit.

License

MIT — see LICENSE for details.
