When koboldcpp is first loaded, the initial session starts with a clean KV cache (e.g. embd_inp=17 n_past=0 Context Size = 0).
However, with the FastForward + ContextShift combo, most of the KV cache is reused between sessions (e.g. embd_inp=1 n_past=16 Context Size = 16). This is easy to verify: send any prompt (e.g. "Write a poem."), hit "New Session", then send any other prompt (e.g. "Write a joke.").
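To make those numbers easier to read, here is a toy sketch of prefix reuse. It is not koboldcpp's actual code; the function name `split_prompt` and the token IDs are made up for illustration. The assumption is simply that tokens matching the start of the cached sequence are counted as n_past and only the rest is evaluated as embd_inp.

```python
def split_prompt(cached_tokens, new_tokens):
    """Toy prefix reuse: return (n_past, tokens_still_to_evaluate).

    Hypothetical helper, not part of koboldcpp.
    """
    n_past = 0
    for old, new in zip(cached_tokens, new_tokens):
        if old != new:
            break
        n_past += 1
    # Assumption: keep at least one token to evaluate so the model
    # still has something to produce logits from.
    if n_past == len(new_tokens) and n_past > 0:
        n_past -= 1
    return n_past, new_tokens[n_past:]


# First session: nothing cached, so all 17 prompt tokens are processed.
first_prompt = list(range(17))
n_past, embd_inp = split_prompt([], first_prompt)
print(len(embd_inp), n_past)

# After "New Session": the cache still holds the old tokens, and the new
# prompt shares its first 16 tokens (e.g. the system prompt) with them.
cache = first_prompt + [100, 101]      # old prompt plus some generated tokens
second_prompt = list(range(16)) + [99]
n_past, embd_inp = split_prompt(cache, second_prompt)
print(len(embd_inp), n_past)
```

In this toy model the cache is only as trustworthy as the cached tokens it matches against, which is why a stale or damaged cache would keep being picked up for as long as the prompt prefix stays the same.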
In many cases this causes no problems and speeds things up a bit (e.g. no reprocessing of the system prompt). However, it can and DOES result in MAJOR bugs under certain conditions.
For example, if you ask Gemma 4 26b to write a long output (e.g. a story) that is larger than 1k, then send any short follow-up prompt (e.g. "Add 1 + 1.") before starting a "New Session", the reused KV cache is corrupt 100% of the time.
Even if you keep hitting "New Session", it remains corrupt until you change the system prompt (e.g. add a space), which forces the KV cache to be fully reprocessed (e.g. embd_inp=17 n_past=0 Context Size = 0).
I won't bother you with this again. This is my third and last attempt at explaining this bug. My brother already fixed it for me by modifying the "New Session" button so that it also toggles a space at the end of the system prompt, which forces a clean KV cache to be created between sessions. He also mentioned that there is already an API "clear_state" option that would probably work, but it isn't being used between sessions.
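For anyone who wants to try the same workaround, the toggle itself is tiny. The sketch below is only an illustration in Python, not the actual change to the "New Session" button; the function name `toggle_trailing_space` is hypothetical.

```python
def toggle_trailing_space(system_prompt: str) -> str:
    """Add a trailing space if there is none, otherwise remove it.

    Hypothetical helper: the idea is that each new session sends a
    system prompt that differs from the previous one.
    """
    if system_prompt.endswith(" "):
        return system_prompt[:-1]
    return system_prompt + " "


prompt = "You are a helpful assistant."
for session in range(3):
    # Called once per "New Session" before the first request is sent.
    prompt = toggle_trailing_space(prompt)
    print(repr(prompt))
```

Toggling rather than always appending keeps the system prompt from growing by one space per session.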
