Run Ollama on a Google Colab GPU and access it from your local machine via a Cloudflare Tunnel — no account or configuration required.
This is useful when models run too slowly on your local machine, or when you need GPU-accelerated inference for synthetic data generation in large batches.
Run directly in a Colab cell with uvx — no install step needed:
!uvx collab-ollama

Or with a specific model:
!uvx collab-ollama -m gemma:2b

If you prefer a traditional install:
!pip install collab-ollama
!collab-ollama

By default, phi3:mini is pulled and served. Use the -m / --model flag to choose a different model:
!uvx collab-ollama --model llama3:8b
!uvx collab-ollama -m gemma:2b

Once setup is complete, you'll see output like:
Setup is complete!
Base URL : https://xxxx-xxxxx-xxxxx-xxxxx.trycloudflare.com/v1/
API Key : No key required — leave it blank or use any string
Model : gemma:2b
Use the printed Base URL and Model with any OpenAI-compatible client. No API key is needed — leave it blank or pass any arbitrary string.
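Before wiring up a client, you can sanity-check the tunnel from your local machine with a plain HTTP request. Here is a minimal sketch, assuming the requests package is available and using the placeholder URL from the output above:

```python
# Sanity check: ask the tunnel for the list of models Ollama is serving.
# Replace the placeholder with the Base URL printed by collab-ollama.
import requests

base_url = "https://xxxx-xxxxx-xxxxx-xxxxx.trycloudflare.com/v1/"
models = requests.get(base_url + "models", timeout=30).json()
print(models)
```

If the tunnel is up, this should return a JSON payload that lists the pulled model.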
On your local machine, set OLLAMA_HOST to the base URL (without /v1/) and use the Ollama CLI as usual. Inference runs on the Colab GPU, but the experience feels local. Make sure you have the Ollama CLI installed locally.
export OLLAMA_HOST='https://xxxx-xxxxx-xxxxx-xxxxx.trycloudflare.com'
ollama run gemma:2b --verbose

You can pull and run any model that fits in the Colab GPU memory:
ollama pull llama3:8b
ollama run llama3:8b
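If you'd rather script against the native Ollama API than use the CLI, the official ollama Python package accepts a custom host. Here is a minimal sketch, assuming the package is installed locally (pip install ollama) and using the placeholder tunnel URL:

```python
# Point the official ollama Python client at the Cloudflare tunnel
# instead of localhost. The URL below is a placeholder.
from ollama import Client

client = Client(host="https://xxxx-xxxxx-xxxxx-xxxxx.trycloudflare.com")

response = client.chat(
    model="gemma:2b",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response["message"]["content"])
```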
Ollama exposes an OpenAI-compatible API. Install the SDK and use the Base URL directly:

pip install openai

from openai import OpenAI
client = OpenAI(
base_url="https://xxxx-xxxxx-xxxxx-xxxxx.trycloudflare.com/v1/",
api_key="ollama", # any string works, or leave blank
)
response = client.chat.completions.create(
model="gemma:2b",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hello!"},
],
)
print(response.choices[0].message.content)
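For the batch synthetic data use case mentioned at the top, generation is just a loop over prompts with the same client. Here is a rough sketch that reuses the client defined above; the prompts and output filename are placeholders:

```python
# Generate a small batch of synthetic examples and write them as JSONL.
# Prompts and output path are illustrative placeholders.
import json

prompts = [
    "Write a short question about photosynthesis, then answer it.",
    "Write a short question about the French Revolution, then answer it.",
]

with open("synthetic_data.jsonl", "w") as f:
    for prompt in prompts:
        response = client.chat.completions.create(
            model="gemma:2b",
            messages=[{"role": "user", "content": prompt}],
        )
        record = {"prompt": prompt, "completion": response.choices[0].message.content}
        f.write(json.dumps(record) + "\n")
```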
Install the SDK:

npm install openai

import OpenAI from "openai";
const client = new OpenAI({
baseURL: "https://xxxx-xxxxx-xxxxx-xxxxx.trycloudflare.com/v1/",
apiKey: "ollama", // any string works, or leave blank
});
const response = await client.chat.completions.create({
model: "gemma:2b",
messages: [
{ role: "system", content: "You are a helpful assistant." },
{ role: "user", content: "Hello!" },
],
});
console.log(response.choices[0].message.content);- Installs Ollama if not already present.
- Installs Cloudflared if not already present.
- Starts ollama serve with OLLAMA_ORIGINS=* for broad CORS support.
- Pulls the specified model (default phi3:mini).
- Opens a Cloudflare quick tunnel to localhost:11434 and prints the Base URL, API Key info, and Model name.
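For reference, those steps are roughly what you would run by hand. The sketch below is only an illustration of that sequence, not the package's actual implementation, and it assumes ollama and cloudflared are already on the PATH:

```python
# Rough manual equivalent of what collab-ollama automates (illustrative only).
import os
import subprocess

env = {**os.environ, "OLLAMA_ORIGINS": "*"}  # allow requests from any origin

# Start the Ollama server in the background with broad CORS support.
server = subprocess.Popen(["ollama", "serve"], env=env)

# Pull the model to serve (phi3:mini is the default).
subprocess.run(["ollama", "pull", "phi3:mini"], check=True)

# Open a Cloudflare quick tunnel to the local Ollama port; the public
# URL appears in cloudflared's log output.
tunnel = subprocess.Popen(["cloudflared", "tunnel", "--url", "http://localhost:11434"])
```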
- Colab: A Google Colab notebook with a GPU runtime.
- Local machine: Ollama CLI (for CLI usage), Python with openai, or Node.js with openai.