Use dictGetSomeKeys instead of dictGetRandomKey for gossip node population by hpatro · Pull Request #3258 · valkey-io/valkey

hpatro · 2026-02-26T05:55:49Z

Observation around dictGetSomeKeys(N) performance is that it is around 10x faster than randomly picking N elements over N*3 attempts. Further, this avoids chances of not picking N elements due to duplicate element selection with individual random pick. Overall, it should help picking gossip node(s) for cluster bus message faster.

Signed-off-by: Harkrishn Patro <bunty.hari@gmail.com>

dvkashapov

Nice!

src/cluster_legacy.c

zuiderkwast · 2026-02-27T22:56:41Z

dictGetSomeKeys is less randomly distributed.

It picks one random slot in the dict and then iterates forward and takes every element until the the right number of elements are returned.

It means that that a node is picked together with its neighbours in the dict. If a node picks 200 out of 2000 nodes, then the next time, if picks one of these again, it will also include much of the same 200 nodes again.

Additionally, if the nodes are configured to use the same hash seed, then the clustering will look the same on all nodes in the cluster.

I have a strong feeling this can affect the information spreading negatively. I think we need some theoretical at least hand-wavy reasoning about it.

hpatro · 2026-02-28T02:12:01Z

It would start from a random index though for each pass, so I think it wouldn't be the same set of data each time.

Spinning up a large cluster isn't practical for these type of changes. Let me see if I can unit test the randomness of data across runs.

zuiderkwast · 2026-03-02T09:47:02Z

It would start from a random index though for each pass, so I think it wouldn't be the same set of data each time.

It's not the same set every time. But it picks a random range instead of nodes at random. Perhaps gossiping about a range of nodes is fine.

Let's also think about the chaining. Some nodes are more likely to be picked than others depending on chained buckets. Especially in small clusters where we want to pick only three nodes. For example in a 3+3 cluster with nodes A-F, the dict will have size 8. Each node is hashed into any slot, independently of other nodes. It may very well end up like this:

Dict slots: | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
Nodes:        F           D       A   B
                                  E
                                  C

When we pick 3 nodes from these using dictGetSomeKeys, we start from a random slot and pick the next three nodes:

Slot   Next 3 nodes
----   ------------
0      F, D, A
1      F, D, A
2      D, A, E
3      D, A, E
4      D, A, E
5      A, E, C
6      A, E, C
7      B, F, D

Node B is only picked if we start at slot 7. The other nodes are picked more often. Some are picked in 5 start-slots. So, clearly this method isn't very fair in a small cluster.

Maybe we can use this method only in larger clusters? In a large cluster, the bucket-chains might be longer so this effect can be observed there too?

Spinning up a large cluster isn't practical for these type of changes. Let me see if I can unit test the randomness of data across runs.

That'd be good!

You can look at the hashtable unit tests for inspiration. When I developed the random selection there, I experimented with different sample sizes and tried to measure the randomness.

hpatro · 2026-03-04T18:30:14Z

@zuiderkwast Thanks for the pointers. Will take a look.

…ior for fairness Signed-off-by: Harkrishn Patro <bunty.hari@gmail.com>

hpatro · 2026-03-06T23:27:34Z

@zuiderkwast Used AI (codex) to generate some unit tests to test the fairness with chaining and evenly spread scenario.

With chaining scenario, the fairness is worse with the dictRandomKey unstable approach.
With even distribution, the fairness is uniform.

I've added the unit tests to the PR (will delete once we decide to proceed ahead). Have a look at them and the results below.

- `test_dictGossipSamplingSimulationUnstableVsSomeKeys` (chaining/adversarial)
- `test_dictGossipSamplingSimulationUnstableVsSomeKeysNoChainSpread` (no chaining/even spread)

Both run `wanted=3`, `100000` rounds, fixed RNG seed.

`spread` below means: `max(total_picks_per_node) - min(total_picks_per_node)`.

Chaining / adversarial layout (6 nodes, 8 buckets, one 3-node chain) (Scenario explained above)

- **unstable loop** (`dictGetRandomKey` + dedupe + `wanted*3` retries):
  - spread: **39258**
  - rounds with `<3` picks: **605/100000**
  - strong skew (some nodes ~0.69 inclusion/round, others ~0.30)
- **dictGetSomeKeys**:
  - spread: **26464**
  - rounds with `<3` picks: **0/100000**
  - still skewed, but less than unstable in this layout
  
unstable simulation results:
  per-round inclusion: A=0.3038 B=0.6951 C=0.3025 D=0.6941 E=0.3057 F=0.6927
  total picks: A=30382 B=69508 C=30250 D=69408 E=30571 F=69275
  spread max-min: 39258
  rounds with fewer than 3 picks: 605/100000
dictGetSomeKeys simulation results:
  per-round inclusion: A=0.5132 B=0.3745 C=0.5117 D=0.6391 E=0.5125 F=0.4490
  total picks: A=51318 B=37445 C=51174 D=63909 E=51255 F=44899
  spread max-min: 26464
  rounds with fewer than 3 picks: 0/100000

No chaining / evenly spread layout (8 nodes, 8 buckets)

- expected picks per node: **37500**
- **unstable loop**:
  - spread: **249**
  - max deviation from expected: **133**
  - rounds with `<3` picks: **12/100000**
- **dictGetSomeKeys**:
  - spread: **304**
  - max deviation from expected: **179**
  - rounds with `<3` picks: **0/100000**
 
 unstable (no-chain spread) simulation results:
  per-round inclusion: A=0.3746 B=0.3738 C=0.3758 D=0.3753 E=0.3745 F=0.3763 G=0.3750 H=0.3745
  total picks: A=37458 B=37384 C=37581 D=37529 E=37448 F=37633 G=37503 H=37452
  spread max-min: 249
  rounds with fewer than 3 picks: 12/100000

dictGetSomeKeys (no-chain spread) simulation results:
  per-round inclusion: A=0.3753 B=0.3739 C=0.3758 D=0.3768 E=0.3761 F=0.3740 G=0.3738 H=0.3744
  total picks: A=37527 B=37393 C=37581 D=37679 E=37613 F=37395 G=37375 H=37437
  spread max-min: 304
  rounds with fewer than 3 picks: 0/100000

no-chain expected per node=37500
no-chain max deviation: unstable=133 dictGetSomeKeys=179
no-chain spread: unstable=249 dictGetSomeKeys=304

zuiderkwast · 2026-03-06T23:32:51Z

With chaining scenario, the fairness is worse with the dictRandomKey unstable approach.

With even distribution, the fairness is uniform.

Wow, nice! Good result.

I guess I was thinking about dictFairRandomKey.

OK, then, go for it! And remove the unit tests. We already moved to googletest in unstable so you will get nasty conflicts if you keep them.

Signed-off-by: Harkrishn Patro <bunty.hari@gmail.com>

sarthakaggarwal97

Noice!

codecov · 2026-03-07T05:46:17Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 75.02%. Comparing base (8c397ea) to head (e6436dc).
⚠️ Report is 8 commits behind head on unstable.

Additional details and impacted files

@@             Coverage Diff              @@
##           unstable    #3258      +/-   ##
============================================
+ Coverage     74.98%   75.02%   +0.04%     
============================================
  Files           129      129              
  Lines         71551    71552       +1     
============================================
+ Hits          53654    53684      +30     
+ Misses        17897    17868      -29

Files with missing lines	Coverage Δ
src/cluster_legacy.c	`87.96% <100.00%> (-0.11%)`	⬇️

... and 27 files with indirect coverage changes

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

hpatro · 2026-03-09T21:58:47Z

@zuiderkwast Have cleaned up the tests. If you're fine with the changes, PTAL and approve it.

…ation (#3258) Observation around `dictGetSomeKeys(N)` performance is that it is around 10x faster than randomly picking `N` elements over N*3 attempts. Further, this avoids chances of not picking N elements due to duplicate element selection with individual random pick. Overall, it should help picking gossip node(s) for cluster bus message faster. --------- Signed-off-by: Harkrishn Patro <bunty.hari@gmail.com>

hpatro added 3 commits February 25, 2026 21:31

Use dictGetSomeKeys instead of random key

4d9babd

Signed-off-by: Harkrishn Patro <bunty.hari@gmail.com>

Add benchmark test

01836e7

Signed-off-by: Harkrishn Patro <bunty.hari@gmail.com>

Remove unnecessary comment

8563eb1

Signed-off-by: Harkrishn Patro <bunty.hari@gmail.com>

dvkashapov approved these changes Feb 26, 2026

View reviewed changes

github-actions bot assigned hpatro Feb 26, 2026

github-advanced-security bot found potential problems Feb 26, 2026

View reviewed changes

src/cluster_legacy.c Dismissed Show dismissed Hide dismissed

sarthakaggarwal97 reviewed Feb 26, 2026

View reviewed changes

src/cluster_legacy.c Show resolved Hide resolved

hpatro requested review from enjoy-binbin and zuiderkwast February 27, 2026 16:54

sarthakaggarwal97 approved these changes Feb 27, 2026

View reviewed changes

Generate unit tests to compare dictRandomKey vs dictGetSomeKeys behav…

ba61003

…ior for fairness Signed-off-by: Harkrishn Patro <bunty.hari@gmail.com>

hpatro added 2 commits March 6, 2026 20:39

Remove tests

5741b82

Signed-off-by: Harkrishn Patro <bunty.hari@gmail.com>

Merge branch 'unstable' into pick_n_gossip_elements

e6436dc

sarthakaggarwal97 approved these changes Mar 7, 2026

View reviewed changes

zuiderkwast approved these changes Mar 10, 2026

View reviewed changes

hpatro merged commit 6fee1aa into valkey-io:unstable Mar 10, 2026
58 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use dictGetSomeKeys instead of dictGetRandomKey for gossip node population#3258

Use dictGetSomeKeys instead of dictGetRandomKey for gossip node population#3258
hpatro merged 6 commits intovalkey-io:unstablefrom
hpatro:pick_n_gossip_elements

hpatro commented Feb 26, 2026 •

edited

Loading

Uh oh!

dvkashapov left a comment

Uh oh!

Uh oh!

Uh oh!

zuiderkwast commented Feb 27, 2026

Uh oh!

hpatro commented Feb 28, 2026

Uh oh!

zuiderkwast commented Mar 2, 2026 •

edited

Loading

Uh oh!

hpatro commented Mar 4, 2026

Uh oh!

hpatro commented Mar 6, 2026

Uh oh!

zuiderkwast commented Mar 6, 2026

Uh oh!

sarthakaggarwal97 left a comment

Uh oh!

codecov bot commented Mar 7, 2026 •

edited

Loading

Uh oh!

hpatro commented Mar 9, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Conversation

hpatro commented Feb 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dvkashapov left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

zuiderkwast commented Feb 27, 2026

Uh oh!

hpatro commented Feb 28, 2026

Uh oh!

zuiderkwast commented Mar 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

hpatro commented Mar 4, 2026

Uh oh!

hpatro commented Mar 6, 2026

Chaining / adversarial layout (6 nodes, 8 buckets, one 3-node chain) (Scenario explained above)

No chaining / evenly spread layout (8 nodes, 8 buckets)

Uh oh!

zuiderkwast commented Mar 6, 2026

Uh oh!

sarthakaggarwal97 left a comment

Choose a reason for hiding this comment

Uh oh!

codecov bot commented Mar 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

hpatro commented Mar 9, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

hpatro commented Feb 26, 2026 •

edited

Loading

zuiderkwast commented Mar 2, 2026 •

edited

Loading

codecov bot commented Mar 7, 2026 •

edited

Loading