The biggest user friction we have right now is making sure a good model with the right settings (i.e., recipe) is loaded for agentic workloads like OpenClaw and Claude Code.
We can't automatically choose a recipe for users because we don't know what will cause them to OOM. Knowing this requires a memory calculator that accounts for the model size and for how the model's architecture changes the RAM requirement with critical settings like ctx_size.
One way to solve this is brute force: load all the top tool-calling models with varying context sizes, measure memory pressure, put the data in a table, and do a lookup on that table at runtime.
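Roughly, the runtime side of that could look like the sketch below. This is a hypothetical illustration, not a real implementation: the model names, measured values, and the headroom factor are all placeholders, and the table would be populated offline by actually loading each (model, ctx_size) combo and recording peak memory.

```python
# Hypothetical lookup-table approach: measurements gathered offline as
# (model, ctx_size) -> peak RAM in GiB, then a runtime lookup picking the
# largest ctx_size that fits. All values below are placeholders.
PEAK_RAM_GIB = {
    ("qwen2.5-coder-7b-q4", 8192): 6.1,
    ("qwen2.5-coder-7b-q4", 32768): 8.4,
    ("qwen2.5-coder-7b-q4", 131072): 17.9,
}

def best_ctx_size(model: str, free_ram_gib: float, headroom: float = 0.8) -> int | None:
    """Largest measured ctx_size whose peak RAM fits within free_ram_gib * headroom."""
    fitting = [
        ctx for (m, ctx), gib in PEAK_RAM_GIB.items()
        if m == model and gib <= free_ram_gib * headroom
    ]
    return max(fitting, default=None)

print(best_ctx_size("qwen2.5-coder-7b-q4", free_ram_gib=16.0))  # -> 32768
```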
Alternatively, we could try to come up with a formula for RAM usage. This might be tricky, though, because it's highly dependent on model architecture and implementation details, and can change across llama.cpp releases.
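For a sense of what such a formula might look like, here's a first-order sketch assuming a standard transformer with grouped-query attention and an fp16 KV cache. The parameter names, the flat overhead fraction, and the example numbers are my assumptions for illustration; real usage also includes compute buffers and allocator overhead that this ignores, which is exactly the part that shifts across llama.cpp versions.

```python
# Hedged first-order estimate: weights + KV cache + a flat overhead fudge.
# Assumes fp16 KV cache and GQA; ignores compute buffers, which vary by
# llama.cpp version. Architecture parameters would come from the model's
# GGUF metadata rather than being hard-coded.
def estimate_ram_gib(weights_gib: float, n_layers: int, n_kv_heads: int,
                     head_dim: int, ctx_size: int,
                     kv_bytes_per_elem: int = 2,   # fp16
                     overhead_frac: float = 0.10) -> float:
    """Rough peak RAM in GiB: weights + KV cache (K and V per layer) + fudge."""
    kv_bytes = 2 * n_layers * ctx_size * n_kv_heads * head_dim * kv_bytes_per_elem
    return (weights_gib + kv_bytes / 2**30) * (1 + overhead_frac)

# e.g. a Llama-3-8B-like config (illustrative numbers) at 32k context:
print(estimate_ram_gib(weights_gib=4.6, n_layers=32, n_kv_heads=8,
                       head_dim=128, ctx_size=32768))  # ~9.5 GiB
```

Even if the formula can't be exact, a conservative estimate like this might be enough to rule out recipes that clearly won't fit.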
Ideas welcome!
cc @sawansri