Prerequisites
Feature Description
Inference presets for the webui that store and override the inference parameters when pressing "Send", e.g.:
- System prompt (it is annoying to keep a collection of them in a folder and copy-paste the correct one into the WebUI every time)
- TopP, MinP, TopK, *-penalty (it's possible to create presets with different values for these in router mode, but there is no sense in having router mode unload a model just to reload it with different default inference settings)
Akin to #22412 (which discusses more than just what I think is a first-draft-worthy request)
Motivation
llama-server is just about everything I want in a WebUI at this point (thanks to the new built-in tools and MCP support!), and the UI/UX is generally better for me in every aspect, from loading to managing only a single resource (llama-server and a single .ini config file) rather than handling model loading in one application, presets/UI in another, and agentic use in yet another.
However, once the use case for the WebUI grows beyond a single system prompt, the extra work the user (me) has to do to manage it is out of proportion, and makes me want to either not bother or look for alternative applications.
Possible Implementation
A general reference/idea can be seen in how LM Studio handles presets.
After drafting a few iterations myself with Claude, GPT5.3-Codex and GLM5, I came to the conclusion that the least intrusive version would not directly modify the system prompt saved in the WebUI, or in the chat itself, but would instead create new conversation branches whenever the selected preset changes:
- An easy-to-use picker/manager for presets (each assigned a UUID or similar)
- Attach the UUID to the chat in the UI and, on mismatch, create a branching conversation from the first message onwards
- A hoverable extra item on each message showing the preset it was sent with (e.g. name, a short list of the inference parameters, the first few words of the system prompt)
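The mismatch-then-branch step above could be sketched roughly as follows. All names here are hypothetical; this only illustrates the intended behavior, not the webui's real data model:

```typescript
// Hypothetical sketch: fork a conversation when its stored preset id
// no longer matches the currently selected preset.
interface Message {
  role: "system" | "user" | "assistant";
  content: string;
}

interface Conversation {
  presetId: string; // uuid of the preset this branch was created with
  messages: Message[];
}

// Called on "Send": if the selected preset matches, keep the current
// branch; on mismatch, create a new branch that carries the history
// from the first message onwards but is tagged with the new preset.
function resolveConversation(
  current: Conversation,
  selectedPresetId: string
): Conversation {
  if (current.presetId === selectedPresetId) return current;
  return {
    presetId: selectedPresetId,
    messages: [...current.messages],
  };
}
```

The advantage of branching over in-place mutation is that every existing message keeps an unambiguous link to the preset it was actually generated with, which is what the hover item would display.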
I did play with the idea of simply hijacking the prompt that gets sent internally to the backend (i.e. replacing the parameters on the fly), but that creates a mismatch with what is visible in the UI, and makes it harder to understand which prompt was used with which settings at what point.