
Valkey support for Top Keys analysis #34

Open

ranshid wants to merge 2 commits into valkey-io:main from ranshid:valkey-bottleneck-dection



ranshid (Member) commented Feb 3, 2026

No description provided.

Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
ranshid marked this pull request as ready for review on February 6, 2026 at 05:56.
allowing them to perform targeted mitigations such as key deletion, slot redistribution, or scaling.

Some examples where a specific key can contribute to resource consumption include:
1. Extremely large hash tables can generate large network spikes when commands like `HGETALL` are executed.
Member

This isn't a problem; it can easily be attributed using the command log.

Member Author

I agree, and I state later that COMMAND LOG is a valid solution for some cases (like this one).
Still, I think some users would welcome a more generic way to identify the keys that are the root cause of different issues, instead of cross-analyzing different statistics.


2. Very large sets can cause extended server unresponsiveness when executing commands such as `SDIFFSTORE`.
Member

Same here; this can easily be identified using the command log.

Member Author

Agreed, same response as before: the command log is a fine alternative, and for root-causing issues it might be enough in most cases. However, I think that users sometimes also want to anticipate potential issues by doing some "database analysis" to identify their largest keys, without having to work this out from the application side. This is not RCA; maybe I should add it to the motivation section?


### 3. Integrability

- Output MUST be suitable for aggregation into cluster-wide or database-wide views.
Member

We don't have database-wide views today, so why does this need to be database-wide?

Member Author

Yeah, I went back and forth on how we should expose these statistics. To be honest, I think that in most cases applications would want a complete dataset analysis (not only per specific database).
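To make the aggregation requirement more concrete, here is a minimal Python sketch of how per-node (or per-database) Top-N replies could be folded into one dataset-wide view on the client side. The function `merge_top_n` and the reply shape (a list of `(key, value)` pairs per node) are hypothetical. The sketch assumes each key's size is reported by exactly one node, which holds for size metrics in a cluster when replies are collected only from primaries (each slot has one owner); under that assumption, the global top N is always contained in the union of the per-node top-N lists.

```python
import heapq
from typing import Dict, List, Tuple

# One reply entry: (key name, metric value), e.g. cardinality or bytes.
Entry = Tuple[str, int]


def merge_top_n(per_node: Dict[str, List[Entry]], n: int) -> List[Tuple[str, str, int]]:
    """Merge per-node Top-N lists into a global Top-N (hypothetical helper).

    Assumes every key is reported by a single node, so every key in the
    global Top-N also appears in its own node's Top-N. Taking the N largest
    values from the concatenated replies is then the exact global answer.
    """
    candidates = (
        (node, key, value)
        for node, entries in per_node.items()
        for key, value in entries
    )
    # Order only by the metric value; ties keep their input order.
    return heapq.nlargest(n, candidates, key=lambda item: item[2])


if __name__ == "__main__":
    replies = {
        "shard-a": [("user:1", 900), ("cart:7", 120)],
        "shard-b": [("feed:3", 450), ("tag:9", 300)],
    }
    for node, key, value in merge_top_n(replies, 3):
        print(node, key, value)
```

Keeping the node (or database index) in each result also disambiguates identical key names that live in different databases.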


Returns Top-N keys by size characteristics.
```
TOPKEYS <CARD | MEMORY> TOP <N>
```
Member

You mention DB awareness, but these requests are not per-DB.

Member Author

Right. They require selecting the DB first (or we could add the DB as an optional argument).
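Following the optional-argument idea, the intended semantics can be modeled with a small Python sketch over an in-memory keyspace. The `DB <index>` argument, the function `topkeys`, and the `memory_of` callback are hypothetical; CARD is modeled as the element count, and MEMORY is delegated to a caller-supplied estimator rather than to the real `MEMORY USAGE` logic.

```python
import heapq
from typing import Callable, Dict, List, Optional, Sized, Tuple

Keyspace = Dict[str, Sized]


def topkeys(
    databases: Dict[int, Keyspace],
    metric: str,
    n: int,
    selected_db: int,
    db: Optional[int] = None,
    memory_of: Optional[Callable[[str, Sized], int]] = None,
) -> List[Tuple[str, int]]:
    """Model of a hypothetical `TOPKEYS <CARD | MEMORY> TOP <N> [DB <index>]`.

    Without the optional DB argument, the connection's SELECTed database is
    scanned, matching the "select the DB first" behaviour.
    """
    if metric not in ("CARD", "MEMORY"):
        raise ValueError("metric must be CARD or MEMORY")
    if metric == "MEMORY" and memory_of is None:
        raise ValueError("MEMORY requires a memory_of estimator")

    keyspace = databases.get(selected_db if db is None else db, {})

    def score(item: Tuple[str, Sized]) -> int:
        key, value = item
        if metric == "CARD":
            return len(value)  # element count of the collection
        return memory_of(key, value)  # caller-supplied byte estimate

    best = heapq.nlargest(n, keyspace.items(), key=score)
    return [(key, score((key, value))) for key, value in best]


if __name__ == "__main__":
    dbs = {0: {"s:small": {1, 2}, "h:big": {"a": 1, "b": 2, "c": 3}}, 1: {"l:only": [1]}}
    print(topkeys(dbs, "CARD", 1, selected_db=0))
    print(topkeys(dbs, "CARD", 5, selected_db=0, db=1))
```

Defaulting to the SELECTed database keeps the command consistent with most keyspace commands, while an explicit DB argument would let a single connection inspect every database without issuing SELECT in between.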

ValkeyTopKeys.md Outdated

### Hot Keys

`hotkeys-max-n <integer>`
Member

It's unclear why this needs to be a config; it could just be part of the API.

Member Author

To be honest, we could. But I think that would force the implementation to be less frugal with resources such as memory and CPU.
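One way to read the resource argument: if N is fixed by `hotkeys-max-n`, the tracker can keep a bounded structure of exactly that size; if N were only supplied per request, the server would have to retain enough candidates for the largest N any client might ask for. Below is a minimal Python sketch of such a bounded tracker, assuming each key is offered at most once (for example during a keyspace scan). The class `BoundedTopN` is hypothetical and does not handle repeated updates of the same key, which a hot key counter would need.

```python
import heapq
from typing import List, Tuple


class BoundedTopN:
    """Keep only the N largest (score, key) pairs seen so far.

    Memory stays O(max_n) regardless of keyspace size and each offer costs
    O(log max_n). Assumes each key is offered at most once.
    """

    def __init__(self, max_n: int) -> None:
        if max_n <= 0:
            raise ValueError("max_n must be positive")
        self.max_n = max_n
        # Min-heap: the smallest retained score sits at index 0.
        self._heap: List[Tuple[int, str]] = []

    def offer(self, key: str, score: int) -> None:
        if len(self._heap) < self.max_n:
            heapq.heappush(self._heap, (score, key))
        elif score > self._heap[0][0]:
            # Evict the current minimum and insert the new entry in one step.
            heapq.heapreplace(self._heap, (score, key))

    def top(self, n: int) -> List[Tuple[str, int]]:
        """Return up to n entries, largest first; n is effectively capped at max_n."""
        ordered = sorted(self._heap, reverse=True)
        return [(key, score) for score, key in ordered[:n]]


if __name__ == "__main__":
    tracker = BoundedTopN(max_n=2)
    for key, size in [("a", 5), ("b", 1), ("c", 9), ("d", 3)]:
        tracker.offer(key, size)
    print(tracker.top(2))
```

A request for more than `max_n` entries simply returns what was retained, which is the trade-off of fixing the bound up front.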

Comment on lines +113 to +114
4. **Key Memory Usage**
- Refers to the amount of memory consumed by a key. This should be identical to the output of the command `MEMORY USAGE <key>`.
Member

If we have memory usage, why do we need cardinality? It feels like memory is strictly more useful.

Member Author

Agreed. I think we can settle on only one, but then we need to decide whether memory tracking is worth the implementation investment. With valkey-cli, I think users almost always run the `--bigkeys` analysis rather than `--memkeys`, probably because `--memkeys` is so much more expensive in CPU and time.
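The cost gap described above can be illustrated with a small Python sketch. Cardinality is a stored count (as with `HLEN`, `SCARD`, `ZCARD`, and `LLEN`, which are O(1)), whereas a memory figure has to be computed from the elements. `MEMORY USAGE` accepts a `SAMPLES <count>` option (default 5; 0 samples all nested values) and estimates the total from the sampled elements. The functions below mimic that idea under simplified, hypothetical assumptions (a caller-supplied `element_size` and a fixed `overhead`) and are not Valkey's actual memory accounting.

```python
from typing import Callable, Sequence


def cardinality(elements: Sequence[bytes]) -> int:
    # Collections track their element count, so this is a constant-time read.
    return len(elements)


def estimate_memory(
    elements: Sequence[bytes],
    element_size: Callable[[bytes], int],
    samples: int = 5,
    overhead: int = 0,
) -> int:
    """Estimate a collection's size by sampling its first `samples` elements.

    samples=0 (or samples >= len) walks every element, which is exact for
    this model but costs O(len) per key.
    """
    total = len(elements)
    if total == 0:
        return overhead
    if samples == 0 or samples >= total:
        return overhead + sum(element_size(e) for e in elements)
    sampled = sum(element_size(e) for e in elements[:samples])
    # Extrapolate the average sampled element size to the whole collection.
    return overhead + sampled * total // samples


if __name__ == "__main__":
    members = [b"aa", b"bbbb", b"c", b"dd"]
    print(cardinality(members), estimate_memory(members, len, samples=2))
```

Even with sampling, the memory path touches element data for every key, while the cardinality path reads a single counter, which matches the CPU and time difference seen between the two valkey-cli modes.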

ValkeyTopKeys.md Outdated

`hotkeys-read-access-threshold <integer>` - default 3000
`hotkeys-write-access-threshold <integer>` - default 2000
Threshold configuration. Only keys exceeding these QPS thresholds appear in the HOTKEYS output, which prevents low-activity keys from cluttering the results.
Member

I don't understand these configs, or more broadly how hot keys will work. Does it report the current hot keys (all current keys accessed more than 3000 times), or keys that were hot at some point (sort of like the slowlog: a given key that was accessed more than X times in the past)?

Member Author

Sure. I can explain more, but I did not want to go into the hot key algorithm here, as it is already being discussed in an existing PR. I think I will remove these configs from the RFC proposal, and we can discuss specific configs as part of the detailed PR.
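To illustrate one possible answer to the question above, here is a minimal Python sketch of the "current hot keys" interpretation: reads and writes are counted per fixed window, and when a window closes, only keys whose rate exceeds `hotkeys-read-access-threshold` or `hotkeys-write-access-threshold` (defaults 3000 and 2000 in the proposal) are reported. The class `WindowedHotKeys` is hypothetical and uses exact per-key counters for clarity; a server-side implementation would presumably need bounded, approximate counting instead, which is the subject of the separate hot key PR.

```python
from collections import Counter
from typing import Dict, List, Tuple

READ_THRESHOLD_QPS = 3000   # proposed hotkeys-read-access-threshold default
WRITE_THRESHOLD_QPS = 2000  # proposed hotkeys-write-access-threshold default

Report = Dict[str, List[Tuple[str, float]]]


class WindowedHotKeys:
    """Report keys whose access rate in the last closed window exceeded a threshold.

    Counters are reset at every window boundary, so a key that was hot earlier
    but has cooled down is not reported ("current hot keys" semantics).
    """

    def __init__(self, window_seconds: float,
                 read_threshold: float = READ_THRESHOLD_QPS,
                 write_threshold: float = WRITE_THRESHOLD_QPS) -> None:
        self.window_seconds = window_seconds
        self.read_threshold = read_threshold
        self.write_threshold = write_threshold
        self._reads: Counter = Counter()
        self._writes: Counter = Counter()

    def record(self, key: str, is_write: bool) -> None:
        (self._writes if is_write else self._reads)[key] += 1

    def close_window(self) -> Report:
        def above(counts: Counter, threshold: float) -> List[Tuple[str, float]]:
            rates = [(key, count / self.window_seconds) for key, count in counts.items()]
            hot = [(key, qps) for key, qps in rates if qps > threshold]
            return sorted(hot, key=lambda item: item[1], reverse=True)

        report = {
            "read": above(self._reads, self.read_threshold),
            "write": above(self._writes, self.write_threshold),
        }
        # Start the next window from zero.
        self._reads.clear()
        self._writes.clear()
        return report
```

The "hot at some point" reading from the question would instead append each window's report to a bounded history (similar to the slowlog) rather than discarding it.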

Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
ranshid (Member Author) commented Feb 17, 2026

@madolson Thank you for taking the time to review!
I know this is very raw and needs several iterations to focus the discussion on the major decisions, but I wanted to start somewhere.
I made some changes following your comments; we can iterate further.


2 participants