Prerequisites
Feature Description
Inference presets for the webui that store and override the inference parameters when pressing "Send", e.g.:
- System prompt (it is annoying to keep a collection of them in a folder and copy-paste the correct one into the WebUI every time)
- TopP, MinP, TopK, *-penalty (it's possible to create presets with different values for these in router mode, but there is no sense in having router mode unload a model just to reload it with different default inference settings)
Akin to #22412 (which discusses more than just what I think is a first-draft-worthy request)
Motivation
llama-server is just about everything I want in a WebUI at this point (thanks to the new built-in tools and MCP support!), and the UI/UX is generally better for me in every aspect, from loading to managing only a single resource (llama-server and a single .ini config file) rather than handling model loading in one application, presets/UI in another, and agentic use in yet another.
However, once the use case for the WebUI grows beyond a single system prompt, the extra work the user (me) has to do to manage it is out of proportion, and makes me want to either not bother or look for alternative applications.
Possible Implementation
A general reference/idea can be seen in how LM Studio handles presets.
After drafting a few iterations myself with Claude, GPT5.3-Codex and GLM5, I came to the conclusion that the least intrusive version would not directly modify the system prompt saved in the WebUI, or in the chat itself, but would instead create new conversation branches whenever the selected preset changes:
- An easy-to-use picker/manager for presets (each assigned a UUID or similar)
- Attach the UUID to the chat in the UI and, on mismatch, create a branching conversation from the first message onwards
- A hoverable extra item on each message showing the preset it was sent with (e.g. name, a short list of the inference parameters, the first few words of the system prompt)
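The mismatch-then-branch step above could be sketched roughly as follows. All names here are hypothetical; this only illustrates the intended behavior, not the webui's real data model:

```typescript
// Hypothetical sketch: fork a conversation when its stored preset id
// no longer matches the currently selected preset.
interface Message {
  role: "system" | "user" | "assistant";
  content: string;
}

interface Conversation {
  presetId: string; // uuid of the preset this branch was created with
  messages: Message[];
}

// Called on "Send": if the selected preset matches, keep the current
// branch; on mismatch, create a new branch that carries the history
// from the first message onwards but is tagged with the new preset.
function resolveConversation(
  current: Conversation,
  selectedPresetId: string
): Conversation {
  if (current.presetId === selectedPresetId) return current;
  return {
    presetId: selectedPresetId,
    messages: [...current.messages],
  };
}
```

The advantage of branching over in-place mutation is that every existing message keeps an unambiguous link to the preset it was actually generated with, which is what the hover item would display.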
I did play with the idea of simply hijacking the prompt that gets sent internally to the backend (i.e. replacing the parameters on the fly), but that creates a mismatch with what is visible in the UI, and makes it harder to understand which prompt was used with which settings at what point.