Local and arbitrary model support #9619
Replies: 41 comments 42 replies
-
|
Answers to the Questions part: I am not sure I have a good answer for number 1 and 3 but 2 it's extremely important to me that local model is truly local that is likely the point of why I am using a local model to start with for that task. I haven't been using warp for a long time because it wasn't open source so I do not have many thoughts on what parts of the UI I want at this time for local models, the little I have done with like Claude cli and the information that the UI provides for that is really nice and to have something like that for local models would be cool but it might depend on what harness people are using idk. |
Beta Was this translation helpful? Give feedback.
-
|
No server transit for any requests |
Beta Was this translation helpful? Give feedback.
-
|
Warp looks really cool, but the fact it only worked with cloud models always was a deal breaker for me. I would love to have full local AI support for not only for coding, but for the terminal agent when reacting to commands. |
Beta Was this translation helpful? Give feedback.
-
|
Thanks for opening up some discussion! I think for your options, #1 is most appealing, but yes, more work. I'd like to think it could also help your architecture long term by relying less on scaling server side components along with client components. #2 is a bit hacky but could be quick. I connect to my Hermes agent using OpenWebUI because it is nicer than the raw terminal client - I'd connect to it through Warp if it were an option, but that doesn't really support Warp being a standalone tool. #3 could be most practical because it would be fully local, not require a second tool, and for most local models, not being feature-complete would be OK. The smaller context windows mean fewer turns, fewer tools available, etc. But my use case in Warp would mostly be "uh help me remember how this command is used" not "build out an entire ansible deployment for a lab". #4 is probably not worth doing - if someone wants to use a local model, its because they want it local. I'd rank them - 1, 3, 2, 4 For the questions:
|
Beta Was this translation helpful? Give feedback.
-
|
My vote is strongly for option 1. If Warp supports local or arbitrary models, I think it should mean truly local execution with no server transit. I understand that porting the harness to the client and open sourcing it is the most work, but it seems like the right long term architecture for privacy, offline use, trust, and extensibility. Thanks! |
Beta Was this translation helpful? Give feedback.
-
|
hey got a warp fork of my own ,trying to get the ollama support work repo:https://github.com/crazygamerZ783/warp-ollama |
Beta Was this translation helpful? Give feedback.
-
|
Honestly, I want to use DeepSeek V4 Flash inside Warp — it’s cheap, and it allows me to interact with the terminal using natural language. |
Beta Was this translation helpful? Give feedback.
-
|
Local models in Ollama or similar should be configurable as sources within Warp. Once available, you should be able to select a model during a session, either manually or by directing Warp to use it automatically based on the task or preference. |
Beta Was this translation helpful? Give feedback.
-
|
6666 |
Beta Was this translation helpful? Give feedback.
-
|
There are already a handful of "local warp server" implementations on your PR list , and on the wild forking from this repo. People just want to be able to use a software that they really like (warp) without going through something that they don't need (your servers). we might end up with some opensource spin-off leading this if you don't just release a minimalist opensource server that simply allows people to use warp with a openai-compatible upstream. In the future you can add something feature-rich and supporting a bunch of stuff... but for now people just want to use warp and remote models without touching someone else servers. |
Beta Was this translation helpful? Give feedback.
-
|
Model selection should be allowed to be
|
Beta Was this translation helpful? Give feedback.
-
|
All this feedback makes sense. We will have a proposed solution here shortly. |
Beta Was this translation helpful? Give feedback.
-
|
I am mostly interested as I want to use one source of models (openrouter/GLM Coding Plan) for it. |
Beta Was this translation helpful? Give feedback.
-
|
Warp looks really cool, but the fact it only worked with cloud models always was a deal breaker for me. I would love to have full local AI support for not only for coding, but for the terminal agent when reacting to commands. |
Beta Was this translation helpful? Give feedback.
-
|
There are already a handful of "local warp server" implementations on your PR list , and on the wild forking from this repo. People just want to be able to use a software that they really like (warp) without going through something that they don't need (your servers). we might end up with some opensource spin-off leading this if you don't just release a minimalist opensource server that simply allows people to use warp with a openai-compatible upstream. In the future you can add something feature-rich and supporting a bunch of stuff... but for now people just want to use warp and remote models without touching someone else servers. |
Beta Was this translation helpful? Give feedback.
-
|
Came here as heard Warp now supporting Windows. Installed and then immediately uninstalled after realising the product is effectively useless without sign up and sending data to yet another provider. With most workstation level laptops now coming with a dedicated NPU or GPU with a few gig of VRAM (even shared RAM is OK for lower end qwen models) people may as well make use of to make terminal life easier. Personally just want an AI shell for system management and basic automation:
Not interested in coding capabilities as use other tools for that. 1 and 3 are the better options. Given there are already forks in the wild why not encourage them to PR (if they haven't already) |
Beta Was this translation helpful? Give feedback.
-
|
It's frustrating how big companies constantly try to seize control of your computer and data. LLMs and their harnesses should act as assistants to the terminal, not as supervisors. If Warp continues with its closed mindset, open alternatives like OpenWarp or other terminal+LLM apps will thrive and take its place. |
Beta Was this translation helpful? Give feedback.
-
|
If I can contribute in any way whatsoever whether it's bug bounty for your project or contributing code. Please let me know I'd be highly interested. I have a personal vendetta against warp. I was just thinking along the lines of instead of starting. My own repository in starting this whole task, out from scratch, I would join the community. This is something that I did not do with deep seek |
Beta Was this translation helpful? Give feedback.
-
|
now that's music to the ears,Also add ability to fetch model, for /models endpoint, as well as context windows, and whether model has vision or not. |
Beta Was this translation helpful? Give feedback.
-
|
Quick update here: we’ve shipped two related pieces of this work. BYOK is now available on the Free plan for individual users, and Warp now supports custom inference endpoints compatible with the OpenAI Chat Completions API. That means you can use your own OpenAI, Anthropic, or Google API key, or connect Warp to an OpenAI-compatible endpoint such as OpenRouter, LiteLLM, z.ai, a gateway, or a similar setup. Docs:
Fully client-side local model support is still a separate direction. We’re planning a lightweight local client harness so Warp can connect directly to local models without routing through Warp’s servers, and we’re also planning support for Agent Client Protocol so developers can bring other harnesses into Warp’s terminal UI. If you try the new flow and hit a specific issue with a provider, endpoint, or model, please open a focused GitHub issue with the details so we can track it directly. |
Beta Was this translation helpful? Give feedback.
-
|
I was about to test this, with the understanding that our internal LiteLLM proxy works, but as it turns out it needs to be internet accessible - which is understandable if the traffic bounces through your services. I will wait for the version that does not require publicly accessible endpoint. though,I got to say, this page https://docs.warp.dev/agent-platform/inference/custom-inference-endpoint/ states |
Beta Was this translation helpful? Give feedback.
-
|
If you want to test your local LLM before Warp adds support, I used the free tier of Cloudflare Zero Trust Connectors (it's a tunnel that you can run in a Docker container) to make my local LM Studio available publicly (with authentication of course). You'll need a domain though. Cloudflare turns on "Block AI training bots" by default for your domain, I had to disable that, otherwise Warp was getting 403s. |
Beta Was this translation helpful? Give feedback.
-
|
This discussion in linked in #8759, but I don't see any mention of support for using ChatGPT Pro/Plus subscriptions. It would be really great to have this on the roadmap. Not everyone can afford paying per-token with API keys. Echoing @Patrik88:
|
Beta Was this translation helpful? Give feedback.
-
|
For folks who need fully self-hosted local/remote model routing today (not just client-side harness migration), one option outside Warp ecosystem: WinkTerm — open-source AI terminal where AI and user share the same PTY. Bring your own API key, type Docker deploy, SSH/SFTP, HTTP Agent API, MIT: https://github.com/Cznorth/winkterm Different product (web self-hosted vs native terminal), but relevant if local-model support timeline matters for your workflow. |
Beta Was this translation helpful? Give feedback.
-
|
Looking forward to use, any new status on fully client-side local model support or lightweight local client harness so Warp can connect directly to local models without routing trough external (as we say Kirche ums Dorf bringen - Bringing the church around the village) ? |
Beta Was this translation helpful? Give feedback.
-
|
Cool to see such discussion. Looking forward to have this feature in warp! I wish I saw it earlier because I've spent quiet a lot of time implementing local agent in https://github.com/man-brain/warp/tree/local-agent. It works, I use it day to day and rebase regularly but I don't really want to maintain a fork.
It's the most important thing.
I finished with implementing option 3 but before that I tried two other approaches: my own implementation of Warp API (it was very tempting not to edit the warp code at all) and replacing the client to Warp API. It didn't work since the API is not a thin proxy, it has a lot of logic including agent loop and custom tools. |
Beta Was this translation helpful? Give feedback.
-
|
Late to the party, but here's my $0.02: Regarding the 4 suggested options: The options we are considering here (not mutually exclusive):
Other questions:
Depends very much on how much that impacts the level of effort to configure and use.
For those who are using local models, this is probably the single most important issue.
The AI augmentation of the CLI experience is, for me, a significant benefit to this product that differentiates it from other terminal options. Getting AI-assisted auto-complete while typing commands and being able to just click a button to execute subsequent steps the agent anticipants are a real productivity boost. Since I have a local model running, not having to funnel that functionality through an external API that comes with pretty steep costs for something I don't actually need from an external source is quite a selling point. |
Beta Was this translation helpful? Give feedback.
-
|
Is it possible to use local models yet? |
Beta Was this translation helpful? Give feedback.
-
|
I appreciate the work on Custom Inference, but the current architecture makes local setups difficult to use. From the explanation in this thread, requests are routed through Warp's backend rather than being sent directly from the client. In my case, this prevents Warp from connecting to a local load balancer on a private network (http://192.168.1.x/). I also don't think using tools like ngrok is a practical solution. One of the main reasons for running local models is to keep everything local, including network traffic. A direct client-side connection mode that can communicate with localhost and private IPs without going through Warp's infrastructure would make this feature far more useful for users running self-hosted models. |
Beta Was this translation helpful? Give feedback.
-
|
Hey everyone! Just wanted to share a quick update on this. I was looking into custom/local endpoints and ended up setting up support for custom OpenAI-compatible base URLs in the BYOK settings. If anyone is interested in how it works or wants to check out the implementation, I put it together in a PR here: [PR] This basically lets you route standard OpenAI models to a custom URL (like a local proxy, enterprise gateway, or Copilot). Hopefully, it's a helpful step for anyone looking for a similar setup! |
Beta Was this translation helpful? Give feedback.
Uh oh!
There was an error while loading. Please reload this page.
-
We are trying to figure out the best way to implement local model support and I wanted to start a discussion on our different potential approaches to see what resonates most with the community.
The reason local model support is not trivial for us to implement is that our harness is split between our client (rust, open-source) and server (golang, not currently open). Moving the harness to be entirely on the client is a fair amount of work.
The options we are considering here (not mutually exclusive):
Questions on my mind:
Beta Was this translation helpful? Give feedback.
All reactions