Thanks for the PR and the detailed writeup. This is a real problem worth solving.

I'm a bit hesitant about hardcoding the removal of the version data for this one command, though. The thing is, we already have infrastructure to handle this: the executor truncates JSON arrays to 100 items and caches anything over 25KB with a preview + pagination tools. So in theory, that 1.15 MB response should never actually hit the LLM's context. If it is, that's the bug I'd want to fix.

The other concern is that by dropping the version data unconditionally, we change the output for every caller. And if someone explicitly passes `--json` expecting the full API response, they'd silently get an incomplete one.

I think the better fix here would be to either:

- figure out why the truncation/caching path isn't kicking in for this response, and fix that, or
- filter the verbose version data generically at the output layer, rather than special-casing this one command.
Would you be up for looking into one of those approaches instead? Happy to help dig into the caching path if that would be useful.
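For concreteness, here's roughly the behavior I'm describing, as a minimal sketch. None of these names match the executor's actual code; the 100-item and 25KB limits are the only details taken from above.

```go
// Hypothetical sketch of the existing guard: truncate long JSON arrays,
// and divert oversized payloads into a cache instead of returning them
// to the model verbatim. All names are illustrative.
package executor

import (
	"encoding/json"
	"fmt"
)

const (
	maxArrayItems = 100       // arrays longer than this get truncated
	maxInlineSize = 25 * 1024 // payloads over 25KB get cached with a preview
)

// guardOutput decides whether a command's JSON output can be sent to the
// LLM directly, or must be truncated/cached first.
func guardOutput(raw []byte, cache map[string][]byte) (string, error) {
	// Truncate top-level arrays to maxArrayItems.
	var items []json.RawMessage
	if err := json.Unmarshal(raw, &items); err == nil && len(items) > maxArrayItems {
		truncated, err := json.Marshal(items[:maxArrayItems])
		if err != nil {
			return "", err
		}
		raw = truncated
	}

	// Cache anything still over the inline limit and return a preview
	// plus a handle the pagination tools can use to fetch the rest.
	if len(raw) > maxInlineSize {
		key := fmt.Sprintf("result-%d", len(cache))
		cache[key] = raw
		return fmt.Sprintf("%s... [truncated: full %d-byte result cached as %q; page through it with the pagination tools]",
			raw[:1024], len(raw), key), nil
	}
	return string(raw), nil
}
```

If the service list response is bypassing a path like that, that's where I'd start digging.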
The `fastly service list --json` command returns a massive amount of unnecessary data, especially when your account has nearly 100 services. More specifically, every service includes an array with details on every version it's ever had.

We have nearly 100 services, some of which have 700+ versions. Our `fastly service list --json` output currently comes to over 1.15 MB of text, or roughly 448,310 tokens, and that number can only grow. This immediately overloads the LLM context for most models and causes the call to fail.
I've found this fix to work well: the LLM gets a simple list of services and can then query only what it needs, e.g. `fastly service-version list --service-id <ID> --json` for a single service's version history.
Perhaps you could instead intercept the JSON to remove the excess data before passing it to the LLM, but this seemed like the most straightforward approach.
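If it helps, here's a minimal sketch of that interception idea. The `Versions` key is my guess at the field name in the service list output, and the function name is made up:

```go
// Hypothetical sketch: strip the per-service version details from
// `fastly service list --json` output before it reaches the LLM.
package interceptor

import "encoding/json"

// stripVersions removes the version history from each service object,
// leaving the rest of the listing intact.
func stripVersions(raw []byte) ([]byte, error) {
	var services []map[string]json.RawMessage
	if err := json.Unmarshal(raw, &services); err != nil {
		return nil, err
	}
	for _, svc := range services {
		delete(svc, "Versions") // assumed key for the version array
	}
	return json.Marshal(services)
}
```

Something like that would keep the trimming out of the individual command handlers, at the cost of another layer between the CLI and the model.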