Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
| allowing them to perform targeted mitigations such as key deletion, redistribute slots or scaling. | ||
| Some examples where a specific key can contribute to resource consumption include: | ||
| 1. Extremely large hash tables can generate large network spikes when commands like `HGETALL` are executed. |
This isn't a problem; it can easily be attributed to the command log.
I agree, and I state later that COMMAND LOG is a valid solution for some cases (like this one).
Still, I think some users would love a more generic way to analyze the keys that are the root cause of the different issues they hit, instead of cross-analyzing different statistics.
| Some examples where a specific key can contribute to resource consumption include: | ||
| 1. Extremely large hash tables can generate large network spikes when commands like `HGETALL` are executed. | ||
| 2. Very large sets can cause extended server unresponsiveness when executing commands such as `SDIFFSTORE`. |
Same; this can easily be identified from the command log.
Agreed, same response as before: the command log is a fine alternative. For root-causing issues, the command log might be enough for most cases. I do think that in some cases users also want to understand the potential issues they might run into by doing some "database analysis" to identify the largest keys they use, without having to work this out from their application side. This is not RCA, so maybe I should add this to the motivation section?
| ### 3. Integrability | ||
| - Output MUST be suitable for aggregation into cluster-wide or database-wide views. |
We don't have database-wide views today, so why does this need to be database-wide?
Yeah. I went back and forth on how we should correctly expose these statistics. TBH, I think that in most cases applications would like a complete dataset analysis (not only per specific database).
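To make the aggregation requirement concrete, here is a minimal sketch of how a client might merge per-node (or per-database) top-N results into one global view. The report shape (a list of `(key, size)` pairs per node), the node names, and the `merge_top_n` helper are all hypothetical; the RFC does not define an output format at this level of detail.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical shape: each node (or database) reports its own top-N
# as a list of (key, size) pairs. This is an assumption, not the RFC's format.
NodeReport = List[Tuple[str, int]]


def merge_top_n(reports: Dict[str, NodeReport], n: int) -> List[Tuple[str, str, int]]:
    """Merge per-node top-N lists into one global top-N of (node, key, size)."""
    flattened = (
        (node, key, size)
        for node, entries in reports.items()
        for key, size in entries
    )
    # Largest size first; ties are broken by node, then key, so the
    # result is deterministic.
    return heapq.nsmallest(n, flattened, key=lambda e: (-e[2], e[0], e[1]))


reports = {
    "node-a": [("user:1", 900), ("cart:7", 400)],
    "node-b": [("feed:3", 1200), ("user:9", 100)],
}
global_top = merge_top_n(reports, 3)
```

As long as every node reports at least N entries and each key lives on exactly one node, the global top-N is always contained in the union of the per-node lists, so this kind of merge stays exact for size-based metrics.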
| Returns Top-N keys by size characteristics. | ||
| ``` | ||
| TOPKEYS <CARD | MEMORY> TOP <N> |
You mention DB awareness, but these requests are not per-DB.
Right. They require the caller to select the DB first (or we can add the DB as an optional argument).
ValkeyTopKeys.md
Outdated
| ### Hot Keys | ||
| `hotkeys-max-n <integer>` |
Unclear why this needs to be a config, could just be part of the API.
TBH we can. But I think this will kinda force the implementation to be less frugal in resources like memory and CPU.
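As a rough illustration of the resource argument, the sketch below keeps only a fixed number of candidates chosen up front, so memory stays proportional to that limit no matter how many keys are observed. `BoundedTopN` and its methods are hypothetical names, and it assumes each key is offered once with its final count; it is not the tracking algorithm from the RFC or the related PR.

```python
import heapq


class BoundedTopN:
    """Keeps at most max_n (count, key) candidates, so memory is O(max_n)."""

    def __init__(self, max_n):
        self.max_n = max_n
        self._heap = []  # min-heap: the smallest retained count sits at index 0

    def offer(self, key, count):
        """Consider a key; assumes each key is offered once with its final count."""
        if len(self._heap) < self.max_n:
            heapq.heappush(self._heap, (count, key))
        elif count > self._heap[0][0]:
            # Evict the smallest retained candidate in favor of the new one.
            heapq.heapreplace(self._heap, (count, key))

    def top(self, n):
        """Return up to n (key, count) pairs, largest first, capped at max_n."""
        return [(key, count) for count, key in heapq.nlargest(n, self._heap)]


tracker = BoundedTopN(max_n=2)
for key, count in [("a", 5), ("b", 9), ("c", 7)]:
    tracker.offer(key, count)
result = tracker.top(3)  # asks for 3, but only max_n=2 candidates were kept
```

If N were supplied per request instead, a tracker like this would have to retain enough candidates for the largest N any caller might ask for, which is the extra memory and CPU cost mentioned above.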
| 4. **Key Memory Usage** | ||
| - Refer to the amount of memory consumed by a key. This should be identical to the output of the command `MEMORY USAGE <key>` |
If we have memory usage, why do we need cardinality? It feels like memory is strictly more useful.
Agreed. I think we can settle on only one, but then we need to decide whether memory is worth the investment of implementing the tracking. With valkey-cli, I think users always use the bigkeys analysis and not memkeys, probably because memkeys is so much more expensive in CPU and time.
ValkeyTopKeys.md
Outdated
| `hotkeys-read-access-threshold <integer>` - default 3000 | ||
| `hotkeys-write-access-threshold <integer>` - default 2000 | ||
| Threshold configuration. Only keys exceeding these QPS thresholds appear in HOTKEYS output. Prevents low-activity keys from cluttering results. |
I don't understand these configs, or more broadly how hot keys will work. Is it the current hot keys (all keys currently being accessed more than 3000 times), or is it keys that were hot at some point (sort of like slowlog: a given key was accessed more than X times in the past)?
Sure. I can explain more but I did not want to go into the hotkey algorithm here, as it is being discussed in an already existing PR. I think maybe I will remove these configs from the proposal in the RFC and we can discuss specific configs as part of the detailed PR.
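For readers following the question above, the two possible readings of the thresholds can be sketched side by side. This is only an illustration under assumed semantics (fixed counting windows, a strict "greater than" comparison); the `HotKeySemantics` class is hypothetical and is not the algorithm from the existing hot-keys PR.

```python
from collections import defaultdict


class HotKeySemantics:
    """Contrasts 'hot right now' with 'was hot at some point' (slowlog-like)."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.current = defaultdict(int)  # accesses seen in the open window
        self.ever_hot = {}               # key -> highest per-window count above threshold

    def record(self, key):
        self.current[key] += 1

    def close_window(self):
        """End the window: remember keys that crossed the threshold, then reset."""
        for key, count in self.current.items():
            if count > self.threshold:
                self.ever_hot[key] = max(count, self.ever_hot.get(key, 0))
        self.current = defaultdict(int)

    def currently_hot(self):
        return sorted(k for k, c in self.current.items() if c > self.threshold)

    def historically_hot(self):
        return sorted(self.ever_hot)


tracker = HotKeySemantics(threshold=2)
for _ in range(3):
    tracker.record("a")
hot_now_before = tracker.currently_hot()       # "a" exceeds the threshold in this window
tracker.close_window()
tracker.record("b")
hot_now_after = tracker.currently_hot()        # nothing is hot in the new window
hot_in_the_past = tracker.historically_hot()   # "a" is remembered, like a slowlog entry
```

Under the first reading a key drops out of the output as soon as its traffic subsides; under the second it stays visible until the history is reset.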
Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
@madolson Thank you for taking the time to review!