Shard the radix into small trees #881

Open
VoletiRam wants to merge 3 commits into valkey-io:main from VoletiRam:rax_shard

@VoletiRam
Collaborator

Shard the radix tree into small trees. Previously, each schema had one big tree each for prefixes and suffixes, and any write had to acquire a global lock on the entire tree. This change breaks each tree into a configurable number of small trees and locks at the shard level. The first byte of the word is hashed to select the shard, which allows parallel writes to different shards.
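To make the description concrete, here is a minimal sketch of first-byte sharding with one lock per shard. It is not the code in this PR: `ShardedWordIndex`, `ShardFor`, and the modulo mapping are hypothetical names and choices, and a `std::set` stands in for each small radix tree.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <set>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for one small radix tree plus its own lock.
struct Shard {
  std::mutex mutex;
  std::set<std::string> words;  // assumption: std::set replaces the radix tree
};

class ShardedWordIndex {
 public:
  explicit ShardedWordIndex(std::size_t num_shards) {
    if (num_shards == 0) num_shards = 1;  // always keep at least one shard
    for (std::size_t i = 0; i < num_shards; ++i) {
      shards_.push_back(std::make_unique<Shard>());
    }
  }

  // Assumed mapping: first byte modulo the shard count.
  std::size_t ShardFor(std::string_view word) const {
    if (word.empty()) return 0;
    return static_cast<unsigned char>(word.front()) % shards_.size();
  }

  // Only the owning shard is locked, so writers whose words map to
  // different shards do not block each other.
  void Insert(std::string_view word) {
    Shard& shard = *shards_[ShardFor(word)];
    std::lock_guard<std::mutex> lock(shard.mutex);
    shard.words.emplace(word);
  }

  bool Contains(std::string_view word) const {
    Shard& shard = *shards_[ShardFor(word)];
    std::lock_guard<std::mutex> lock(shard.mutex);
    return shard.words.count(std::string(word)) > 0;
  }

  std::size_t Size() const {
    std::size_t total = 0;
    for (const auto& shard : shards_) {
      std::lock_guard<std::mutex> lock(shard->mutex);
      total += shard->words.size();
    }
    return total;
  }

 private:
  std::vector<std::unique_ptr<Shard>> shards_;
};

// Small usage example: "apple" and "banana" land in different shards.
inline void ShardedWordIndexDemo() {
  ShardedWordIndex index(4);
  index.Insert("apple");
  index.Insert("banana");
  assert(index.ShardFor("apple") != index.ShardFor("banana"));
  assert(index.Contains("apple"));
  assert(index.Size() == 2);
}
```

With this layout, two threads inserting "apple" and "banana" take different mutexes, while two words sharing a first byte still serialize on the same shard.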

Signed-off-by: Ram Prasad Voleti <ramvolet@amazon.com>
Comment on lines +128 to +129
std::vector<std::unique_ptr<Shard>> shards_;
size_t num_shards_;
Member

Sharding in the per-key TextIndex isn't helpful and will dramatically hurt our per-key space performance numbers (which are already awful). How hard would it be to make TextIndex a template, pass in the number of shards at compile time, and use a hard-coded array here? That way the per-key index has close to zero space overhead.
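One possible shape for this suggestion, sketched under assumptions: the shard count becomes a template parameter backing a `std::array`, and the per-key index is a separate type with no mutex at all. `ShardedTextIndex` and `PerKeyTextIndex` are hypothetical names, not the project's actual classes, and a `std::set` again stands in for the radix tree.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <set>
#include <string>
#include <string_view>

// Shard count fixed at compile time; no heap-allocated vector of shards.
template <std::size_t NumShards>
class ShardedTextIndex {
  static_assert(NumShards > 0, "need at least one shard");

 public:
  // Assumed mapping: first byte modulo the compile-time shard count.
  static constexpr std::size_t ShardFor(std::string_view word) {
    return word.empty()
               ? 0
               : static_cast<unsigned char>(word.front()) % NumShards;
  }

  void Insert(std::string_view word) {
    Shard& shard = shards_[ShardFor(word)];
    std::lock_guard<std::mutex> lock(shard.mutex);
    shard.words.emplace(word);
  }

  bool Contains(std::string_view word) {
    Shard& shard = shards_[ShardFor(word)];
    std::lock_guard<std::mutex> lock(shard.mutex);
    return shard.words.count(std::string(word)) > 0;
  }

 private:
  struct Shard {
    std::mutex mutex;
    std::set<std::string> words;  // assumption: stands in for a radix tree
  };
  std::array<Shard, NumShards> shards_;
};

// Per-key index kept as its own type: one container, no mutex.
class PerKeyTextIndex {
 public:
  void Insert(std::string_view word) { words_.emplace(word); }
  bool Contains(std::string_view word) const {
    return words_.count(std::string(word)) > 0;
  }

 private:
  std::set<std::string> words_;
};

// The unsharded per-key type is strictly smaller than even a one-shard
// sharded index, because it carries no mutex.
static_assert(sizeof(PerKeyTextIndex) < sizeof(ShardedTextIndex<1>),
              "per-key index should not pay for a mutex");
```

The exact saving depends on `sizeof(std::mutex)` for the platform, so the sketch only asserts the ordering and not a byte count.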

@BCathcart
Collaborator

My question here is similar to yesterday's: what's the benefit of this over having one tree and still locking on the first byte, essentially locking the paths branching from the root node? We'd need a shared lock on the root node, but the number of times the root node is updated is probably very small relative to the life of the main text index. My initial thought here is that this adds complexity for minimal benefit.
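For comparison, the single-tree alternative raised here could look roughly like the sketch below: one root guarded by a reader/writer lock, plus one mutex per first-byte branch. This is only an illustration of the idea. `RootLockedTrie` and `Branch` are hypothetical, a `std::set` of suffixes stands in for each subtree, and branches are assumed never to be removed, so a branch pointer stays valid after the root lock is released.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <set>
#include <shared_mutex>
#include <string>
#include <string_view>

class RootLockedTrie {
 public:
  void Insert(std::string_view word) {
    if (word.empty()) return;
    const auto first = static_cast<unsigned char>(word.front());
    Branch* branch = nullptr;
    {
      // Common case: the branch exists, so a shared root lock is enough.
      std::shared_lock<std::shared_mutex> read(root_mutex_);
      branch = children_[first].get();
    }
    if (branch == nullptr) {
      // Rare case: the root gains a child, so take the root lock exclusively.
      std::unique_lock<std::shared_mutex> write(root_mutex_);
      if (!children_[first]) {
        children_[first] = std::make_unique<Branch>();
        ++root_updates_;
      }
      branch = children_[first].get();
    }
    // Writers under different first bytes lock different branches.
    std::lock_guard<std::mutex> lock(branch->mutex);
    branch->suffixes.emplace(word.substr(1));
  }

  bool Contains(std::string_view word) const {
    if (word.empty()) return false;
    const auto first = static_cast<unsigned char>(word.front());
    Branch* branch = nullptr;
    {
      std::shared_lock<std::shared_mutex> read(root_mutex_);
      branch = children_[first].get();
    }
    if (branch == nullptr) return false;
    std::lock_guard<std::mutex> lock(branch->mutex);
    return branch->suffixes.count(std::string(word.substr(1))) > 0;
  }

  // Number of times the root itself had to be modified.
  std::size_t RootUpdates() const {
    std::shared_lock<std::shared_mutex> read(root_mutex_);
    return root_updates_;
  }

 private:
  struct Branch {
    std::mutex mutex;
    std::set<std::string> suffixes;  // assumption: stands in for a subtree
  };

  mutable std::shared_mutex root_mutex_;
  std::array<std::unique_ptr<Branch>, 256> children_;
  std::size_t root_updates_ = 0;
};
```

In this sketch the root is written at most 256 times over the life of the index, which matches the observation that root updates are rare; the cost is that every operation still touches the shared root lock.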

Ram Prasad Voleti added 2 commits March 11, 2026 03:34
Use template-based arrays instead of vectors for shards. Keep the per-key
index separate to avoid the 40-byte mutex added for each shard

Signed-off-by: Ram Prasad Voleti <ramvolet@amazon.com>
Signed-off-by: Ram Prasad Voleti <ramvolet@amazon.com>
3 participants