Conversation
Shard the radix tree into small trees. Previously, we had one big tree each for prefix and suffix at the schema level, and any write had to acquire a global lock on the entire tree. Break it into a configurable number of small trees and lock at the shard level. Use the first byte of the word to hash to a tree. This allows parallel writes to different shards.

Signed-off-by: Ram Prasad Voleti <ramvolet@amazon.com>
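A minimal sketch of the sharding scheme described above, under stated assumptions: the names (`ShardedTextIndex`, `Shard`, `ShardFor`) are hypothetical, and a `std::map` stands in for the real radix tree. The point is the locking structure: the first byte of the word picks a shard, and only that shard's mutex is taken for a write.

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>
#include <set>
#include <string>
#include <vector>

// Hypothetical shard: std::map stands in for the real radix tree.
struct Shard {
    std::mutex mu;                                   // shard-level lock
    std::map<std::string, std::set<uint64_t>> tree;  // word -> posting keys
};

class ShardedTextIndex {
public:
    explicit ShardedTextIndex(size_t num_shards) {
        for (size_t i = 0; i < num_shards; ++i)
            shards_.push_back(std::make_unique<Shard>());
    }

    void Insert(const std::string& word, uint64_t key) {
        Shard& s = ShardFor(word);
        std::lock_guard<std::mutex> lock(s.mu);  // only this shard is locked
        s.tree[word].insert(key);
    }

    bool Contains(const std::string& word, uint64_t key) {
        Shard& s = ShardFor(word);
        std::lock_guard<std::mutex> lock(s.mu);
        auto it = s.tree.find(word);
        return it != s.tree.end() && it->second.count(key) > 0;
    }

private:
    // First byte of the word selects the shard, so writes to words with
    // different leading bytes can proceed in parallel.
    Shard& ShardFor(const std::string& word) {
        size_t idx =
            word.empty() ? 0 : static_cast<uint8_t>(word[0]) % shards_.size();
        return *shards_[idx];
    }

    std::vector<std::unique_ptr<Shard>> shards_;
};
```

Words sharing a leading byte still contend on the same shard, so the achievable write parallelism is bounded by the distribution of leading bytes across the workload.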
src/indexes/text/text_index.h
Outdated
std::vector<std::unique_ptr<Shard>> shards_;
size_t num_shards_;
Sharding in the per-key TextIndex isn't helpful and will dramatically hurt our per-key space performance numbers (which are already awful). How hard would it be to make TextIndex a template and pass the # of shards into it at compile time, then use a hard-coded array here? This means that for the per-key index we have close to zero space overhead.
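A sketch of what this suggestion might look like, assuming hypothetical names and a `std::map` standing in for the radix tree: the shard count becomes a template parameter, so the shards live in a fixed `std::array` with no vector header or heap indirection, and `TextIndex<1>` degenerates to an unsharded per-key index.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <map>
#include <mutex>
#include <string>

// Hypothetical sketch: shard count fixed at compile time via a template
// parameter, shards stored in a hard-coded std::array.
template <size_t NumShards>
class TextIndex {
public:
    void Insert(const std::string& word, uint64_t key) {
        Shard& s = shards_[ShardIdx(word)];
        std::lock_guard<std::mutex> lock(s.mu);
        s.tree[word] = key;  // std::map stands in for the real radix tree
    }

    bool Contains(const std::string& word, uint64_t key) {
        Shard& s = shards_[ShardIdx(word)];
        std::lock_guard<std::mutex> lock(s.mu);
        auto it = s.tree.find(word);
        return it != s.tree.end() && it->second == key;
    }

private:
    struct Shard {
        std::mutex mu;
        std::map<std::string, uint64_t> tree;
    };

    // First byte of the word selects the shard.
    static size_t ShardIdx(const std::string& word) {
        return word.empty() ? 0 : static_cast<uint8_t>(word[0]) % NumShards;
    }

    std::array<Shard, NumShards> shards_;  // fixed size, no vector overhead
};
```

Note that each shard still carries a mutex (typically 40 bytes on common Linux/glibc builds), which is why the thread that followed keeps the per-key index as a separate, unsharded type rather than instantiating this template with `NumShards = 1`.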
My question here is similar to yesterday's: what's the benefit of this over having one tree and still locking on the first byte? Essentially, locking the paths branching from the root node. We'd need to share a lock on the root node, but the number of times the root node is updated is probably very small relative to the life of the main text index. My initial thought is that this adds complexity for minimal benefit.
Use template-based arrays instead of vectors for shards. Keep the per-key index separate to avoid the 40-byte mutex added per shard.

Signed-off-by: Ram Prasad Voleti <ramvolet@amazon.com>