Quick Start | Documentation | Colab Notebook | Discord
Llama Stack defines and standardizes the core building blocks that simplify AI application development. It provides a unified set of APIs with implementations from leading service providers. Get started instantly:
```shell
curl -LsSf https://github.com/llamastack/llama-stack/raw/main/scripts/install.sh | bash
```

- Unified API layer for Inference, RAG, Agents, Tools, Safety, Evals.
- Plugin architecture supporting local development, on-premises, cloud, and mobile environments.
- Prepackaged verified distributions for a one-stop solution in any environment.
- Multiple developer interfaces — CLI and SDKs for Python, TypeScript, iOS, and Android.
- Standalone applications as examples for production-grade AI apps with Llama Stack.
Here is a list of the various API providers and available distributions. See the full list for details, including External Providers.
| API Provider | Environments | Agents | Inference | VectorIO | Safety | Eval | DatasetIO |
|---|---|---|---|---|---|---|---|
| SambaNova | Hosted | | ✅ | | ✅ | | |
| Cerebras | Hosted | | ✅ | | | | |
| Fireworks | Hosted | ✅ | ✅ | ✅ | | | |
| AWS Bedrock | Hosted | | ✅ | | ✅ | | |
| Together | Hosted | ✅ | ✅ | | ✅ | | |
| Groq | Hosted | | ✅ | | | | |
| Ollama | Single Node | | ✅ | | | | |
| TGI | Hosted/Single Node | | ✅ | | | | |
| NVIDIA NIM | Hosted/Single Node | | ✅ | | ✅ | | |
| ChromaDB | Hosted/Single Node | | | ✅ | | | |
| Milvus | Hosted/Single Node | | | ✅ | | | |
| Qdrant | Hosted/Single Node | | | ✅ | | | |
| Weaviate | Hosted/Single Node | | | ✅ | | | |
| SQLite-vec | Single Node | | | ✅ | | | |
| PG Vector | Single Node | | | ✅ | | | |
| PyTorch ExecuTorch | On-device iOS | ✅ | ✅ | | | | |
| vLLM | Single Node | | ✅ | | | | |
| OpenAI | Hosted | | ✅ | | | | |
| Anthropic | Hosted | | ✅ | | | | |
| Gemini | Hosted | | ✅ | | | | |
| WatsonX | Hosted | | ✅ | | | | |
| HuggingFace | Single Node | | | | | | ✅ |
| NVIDIA NEMO | Hosted | | ✅ | ✅ | | ✅ | ✅ |
| NVIDIA | Hosted | | | | | ✅ | ✅ |
| Infinispan | Single Node | | | ✅ | | | |
A distribution (or "distro") is a pre-configured provider bundle for a specific deployment scenario — start with local Ollama and transition to production without changing your application code.
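Starting a distro typically amounts to running its Docker image and pointing it at your providers. A minimal sketch for the starter distribution — the port, volume mount, and environment variable here are illustrative assumptions; consult the distribution's guide for the exact flags your setup needs:

```shell
# Run the starter distribution locally (port, volume, and OLLAMA_URL
# are illustrative -- see the distribution guide for your configuration).
docker run -it \
  -p 8321:8321 \
  -v ~/.llama:/root/.llama \
  llamastack/distribution-starter \
  --port 8321 \
  --env OLLAMA_URL=http://host.docker.internal:11434
```

Because every distro exposes the same APIs, swapping this image for a production one later should not require client-side code changes.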
| Distribution | Llama Stack Docker | Start This Distribution |
|---|---|---|
| Starter Distribution | llamastack/distribution-starter | Guide |
| Starter Distribution GPU | llamastack/distribution-starter-gpu | Guide |
| PostgreSQL | llamastack/distribution-postgres-demo | N/A |
| Dell | llamastack/distribution-dell | Guide |
For the full list including Docker images see the Distributions Overview.
Full docs at llamastack.github.io/docs. Example apps at llama-stack-apps.
- Quick Start — start a Llama Stack server
- Getting Started Notebook — text and vision inference walkthrough
- Zero-to-Hero Guide — key components with code samples
- Server CLI Reference | Client CLI Reference
- Contributing | Adding a new API Provider | Release Process
Client SDKs — connect to a Llama Stack server in your preferred language:
| Language | Client SDK | Package |
|---|---|---|
| Python | llama-stack-client-python | |
| Swift | llama-stack-client-swift | |
| TypeScript | llama-stack-client-typescript | |
| Kotlin | llama-stack-client-kotlin |
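As a sketch of what client usage looks like with the Python SDK — the base URL and model ID below are assumptions, not defaults; substitute whatever your running server reports:

```python
# Minimal sketch using the llama-stack-client Python package.
# Assumes a Llama Stack server is already running at base_url and that
# the model ID below is registered there -- adjust both to your setup.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# Discover which models the server exposes.
for model in client.models.list():
    print(model.identifier)

# Run a chat completion against one of them.
response = client.inference.chat_completion(
    model_id="meta-llama/Llama-3.2-3B-Instruct",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.completion_message.content)
```

The other SDKs follow the same shape: construct a client against the server URL, then call the same unified APIs.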
Note: We are considering a transition from Stainless to OpenAPI Generator for SDK generation (#4609). The `client-sdks/openapi/` directory contains the new tooling for local SDK generation.
We hold regular community calls every Thursday at 09:00 AM PST — see the Community Event on Discord for details.
Thanks to all our amazing contributors!