DO NOT MERGE — POC: entity-level routing with composite tablet keys and TabletIndex #9590
mlwelles wants to merge 21 commits into `sharding-poc`
Add IsLabeled() methods to Tablet, Member, and SchemaUpdate protobuf types, replacing raw `Label != ""` checks with nil-safe, semantic helpers. This centralizes the label-check logic and makes the intent clearer at each call site.
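A minimal Go sketch of such a nil-safe helper. `Tablet` here is a hypothetical stand-in for the generated `pb.Tablet` with only the fields the example needs; the actual method in the POC may differ in detail.

```go
package main

import "fmt"

// Tablet is a hypothetical stand-in for the generated pb.Tablet type;
// only the fields used below are included.
type Tablet struct {
	Predicate string
	Label     string
}

// IsLabeled reports whether the tablet carries a label. Calling it on a
// nil *Tablet is safe and returns false, so call sites need no nil check.
func (t *Tablet) IsLabeled() bool {
	return t != nil && t.Label != ""
}

func main() {
	var missing *Tablet // e.g. a lookup that found nothing
	fmt.Println(missing.IsLabeled())                                       // false
	fmt.Println((&Tablet{Predicate: "name"}).IsLabeled())                  // false
	fmt.Println((&Tablet{Predicate: "name", Label: "secret"}).IsLabeled()) // true
}
```

Because Go allows method calls on nil pointer receivers, folding the nil check into the method removes the `t != nil && t.Label != ""` pattern from every call site.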
Design document for extending label-based predicate routing to support entity-level routing via `dgraph.label`. Key decisions:
- Sub-tablet keys: `predicate@label` (backward compatible for unlabeled predicates)
- Entity label stored on group 1 as a reserved predicate
- Two-phase mutation routing (extract labels, then route)
- Query fan-out to all authorized sub-tablets
- Synchronous reclassification following the predicate-move pattern
- Priority: entity label > predicate `@label` > unlabeled
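The priority rule in the last decision can be sketched as a single function. `resolveEffectiveLabel` is a hypothetical name used only for illustration; in the POC this decision is split between Alpha and Zero, so this is not the actual code path.

```go
package main

import "fmt"

// resolveEffectiveLabel picks the label that decides where an edge lives,
// following the priority entity label > predicate @label > unlabeled.
// Hypothetical helper for illustration only.
func resolveEffectiveLabel(entityLabel, predicateLabel string) string {
	if entityLabel != "" {
		return entityLabel // the entity's dgraph.label wins
	}
	if predicateLabel != "" {
		return predicateLabel // fall back to the schema-level @label
	}
	return "" // unlabeled: ordinary predicate-level routing
}

func main() {
	fmt.Printf("%q\n", resolveEffectiveLabel("secret", "top_secret")) // "secret"
	fmt.Printf("%q\n", resolveEffectiveLabel("", "top_secret"))       // "top_secret"
	fmt.Printf("%q\n", resolveEffectiveLabel("", ""))                 // ""
}
```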
13-task implementation plan for entity-level sub-tablet routing, covering composite tablet keys, Zero state machine changes, two-phase mutation routing, query fan-out, entity label cache, and integration tests.
Switch handleTablet from using bare tablet.Predicate as the map key to using pb.TabletKey(predicate, label) composite keys. This enables multiple groups to serve the same predicate with different labels (sub-tablets). Duplicate detection now uses servingSubTablet to check for (predicate, label) pair conflicts instead of predicate-only conflicts.
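A minimal sketch of duplicate detection keyed on (predicate, label) rather than predicate alone. `tabletMap` and `claim` are hypothetical names, not Zero's `handleTablet`/`servingSubTablet`; the point is only that a conflict arises when a different group already holds the same composite key.

```go
package main

import "fmt"

// tabletKey follows the documented key format: "pred@label", or the bare
// predicate when the label is empty.
func tabletKey(pred, label string) string {
	if label == "" {
		return pred
	}
	return pred + "@" + label
}

// tabletMap is a hypothetical stand-in for Zero's tablet state:
// composite key -> ID of the group serving that (predicate, label) pair.
type tabletMap map[string]uint32

// claim records that group serves (pred, label). It fails only when a
// different group already serves the same pair; other labels of the same
// predicate do not conflict.
func (m tabletMap) claim(pred, label string, group uint32) error {
	key := tabletKey(pred, label)
	if owner, ok := m[key]; ok && owner != group {
		return fmt.Errorf("tablet %q already served by group %d", key, owner)
	}
	m[key] = group
	return nil
}

func main() {
	m := tabletMap{}
	fmt.Println(m.claim("Document.name", "", 1))       // <nil>
	fmt.Println(m.claim("Document.name", "secret", 2)) // <nil>: different label, no conflict
	fmt.Println(m.claim("Document.name", "secret", 3)) // error: pair already served by group 2
}
```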
Add TestEntityLevelRouting that verifies entity-level sub-tablet routing works end-to-end: setting dgraph.label on a UID pins all its predicates to the labeled group via composite tablet keys (predicate@label) in Zero state, and queries fan out across all sub-tablet groups to return complete results.
… routing
The TestEntityLevelRouting integration test was failing non-deterministically (returning 0-1 of 3 expected documents). Root cause: mergeResults appended UIDs from fan-out goroutines in non-deterministic order, but downstream algo.IndexOf uses binary search, which assumes sorted UIDs. The JSON encoder in preTraverse then silently skipped UIDs that binary search couldn't find. Key fixes:
- Sort merged UidMatrix entries in mergeResults after appending fan-out results
- Two-pass tablet caching in applyState (other groups first, own group last) so bare-predicate aliases correctly map to the own group's sub-tablet
- Store tablets under composite keys (pred@label) in BelongsToReadOnly, sendTablet, and Inform to preserve AllSubTablets canonical entry discovery
- Use BelongsToReadOnly in checkTablet (proposal.go) instead of Tablet+label to avoid label-resolution mismatches on entity-level sub-tablets
- Fix Zero's ServingTablet/ShouldServe/Inform to use composite keys and a sub-tablet fallback search for entity-level routing
- Fix commit validation (oracle.go) to check all sub-tablets via ServingTablets
- Simplify resolveLabel to entity-only (predicate `@label` is handled by Zero)
- Update test assertions for composite tablet keys, add retry logic, and sort query results in Go instead of using DQL orderasc (avoids sort triplication)

All 27 label integration tests pass.
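The root cause above can be reproduced in a few lines. This is a simplified sketch, not the POC's `mergeResults`: `mergeUIDs` handles a single UID list (no value matrices or counts), and `indexOf` is a hypothetical binary search in the spirit of `algo.IndexOf`, built on `sort.Search`.

```go
package main

import (
	"fmt"
	"sort"
)

// mergeUIDs concatenates per-group UID lists from a fan-out and sorts the
// result. Goroutines finish in arbitrary order, so without the sort the
// merged list is not guaranteed to be ascending.
func mergeUIDs(perGroup [][]uint64) []uint64 {
	var merged []uint64
	for _, uids := range perGroup {
		merged = append(merged, uids...)
	}
	sort.Slice(merged, func(i, j int) bool { return merged[i] < merged[j] })
	return merged
}

// indexOf is a binary search that is only correct on ascending input.
// It returns -1 when uid is not found.
func indexOf(uids []uint64, uid uint64) int {
	i := sort.Search(len(uids), func(i int) bool { return uids[i] >= uid })
	if i < len(uids) && uids[i] == uid {
		return i
	}
	return -1
}

func main() {
	// Results from three groups, in completion order rather than UID order.
	fromGroups := [][]uint64{{0x30}, {0x10}, {0x20}}

	unsorted := []uint64{0x30, 0x10, 0x20} // plain concatenation of fromGroups
	fmt.Println(indexOf(unsorted, 0x10))   // -1: binary search misses a UID that is present

	merged := mergeUIDs(fromGroups)
	fmt.Println(merged)                // [16 32 48]
	fmt.Println(indexOf(merged, 0x10)) // 0
}
```

The miss is silent, which matches the symptom: no error, just fewer documents in the response.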
…O(1) lookups
Introduce a TabletIndex type (map[pred]map[label]*Tablet) that replaces three O(n) patterns with O(1) nested lookups:
- ServingTablet fallback loop that scanned all tablets via ParseTabletKey
- AllSubTablets scan that iterated entire flat maps to find same-predicate tablets
- applyState two-pass aliasing that required processing groups in a specific order

The index is always rebuilt from the authoritative flat proto maps (in regenerateChecksum on Zero, applyState on Alpha), preserving the wire format as the single source of truth. Key methods: Get, Set, GetAny (prefers unlabeled), GetForGroup (prefers caller's group), AllForPredicate (O(1) label map), BuildFromFlat (bridge from proto). Also renames "sub-tablet" terminology to "label tablet" throughout.
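A minimal sketch of a nested index with the methods listed above. It assumes a hypothetical `Tablet` struct that carries its own `Predicate` and `Label`, so `BuildFromFlat` does not need to parse keys; it is an illustration of the data structure, not the code in `protos/pb/labeled.go`.

```go
package main

import "fmt"

// Tablet is a hypothetical stand-in for pb.Tablet with only the fields used here.
type Tablet struct {
	GroupId   uint32
	Predicate string
	Label     string // "" means unlabeled
}

// TabletIndex nests tablets by predicate, then by label.
type TabletIndex map[string]map[string]*Tablet

// BuildFromFlat rebuilds the index from the flat keyed map (the wire format).
func BuildFromFlat(flat map[string]*Tablet) TabletIndex {
	idx := TabletIndex{}
	for _, t := range flat {
		idx.Set(t)
	}
	return idx
}

// Set inserts or replaces the tablet for (t.Predicate, t.Label).
func (idx TabletIndex) Set(t *Tablet) {
	byLabel, ok := idx[t.Predicate]
	if !ok {
		byLabel = map[string]*Tablet{}
		idx[t.Predicate] = byLabel
	}
	byLabel[t.Label] = t
}

// Get looks up one (predicate, label) pair; indexing a nil inner map yields nil.
func (idx TabletIndex) Get(pred, label string) *Tablet {
	return idx[pred][label]
}

// AllForPredicate returns every label tablet of pred (nil if none).
func (idx TabletIndex) AllForPredicate(pred string) map[string]*Tablet {
	return idx[pred]
}

// GetAny prefers the unlabeled tablet, else returns an arbitrary labeled one.
func (idx TabletIndex) GetAny(pred string) *Tablet {
	byLabel := idx[pred]
	if t, ok := byLabel[""]; ok {
		return t
	}
	for _, t := range byLabel {
		return t
	}
	return nil
}

// GetForGroup prefers the tablet served by gid, falling back to GetAny.
// It scans only pred's labels, which is a small set.
func (idx TabletIndex) GetForGroup(pred string, gid uint32) *Tablet {
	for _, t := range idx[pred] {
		if t.GroupId == gid {
			return t
		}
	}
	return idx.GetAny(pred)
}

func main() {
	idx := BuildFromFlat(map[string]*Tablet{
		"Document.name":        {GroupId: 1, Predicate: "Document.name"},
		"Document.name@secret": {GroupId: 2, Predicate: "Document.name", Label: "secret"},
		"Document.name@top":    {GroupId: 3, Predicate: "Document.name", Label: "top"},
	})
	fmt.Println(idx.Get("Document.name", "secret").GroupId)  // 2
	fmt.Println(idx.GetAny("Document.name").GroupId)         // 1 (unlabeled preferred)
	fmt.Println(idx.GetForGroup("Document.name", 3).GroupId) // 3
	fmt.Println(len(idx.AllForPredicate("Document.name")))   // 3
}
```

Keeping the flat map as the source of truth and rebuilding the index from it means the index never has to be serialized or kept in sync incrementally.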
Clean up tablet-related function names and comments to use simpler, more consistent naming. Remove unused public method ServingLabelTablet.
I appreciate the progress made on sharding predicates among alphas. However, I feel that both pinning predicates and entity-level routing are merely workarounds for issues that Dgraph should resolve independently. Ideally, Dgraph would automatically detect when a predicate has grown too large on an alpha and then move a portion of that predicate to another alpha, without requiring intervention from a database administrator.
@xqqp Note that this branch is a proof of concept. You should not take anything happening in this branch as future direction.
What you describe here is, in essence, the current functionality of group rebalancing in Dgraph and has been for years.
I'm aware of that. Just thought I'd give some feedback.
In which release was that implemented? If I'm reading this correctly, at least in December 2024 predicates couldn't be split: https://discuss.dgraph.io/t/does-dgraph-have-a-scalability-problems-with-graphs-having-single-heavy-predicate/19647/2
Right, sharding is still per predicate. When I said "in essence", I was referring to the rebalancing based on disk consumption per predicate.
Alright, then we are on the same page.
Closing to prevent confusion, as it's not intended as anything but a POC of feasibility.
What `sharding-poc` (#9574) provides
The base branch implements schema-level label routing: a predicate declared with `@label(secret)` is pinned to the Alpha group running with `--label=secret`. One predicate → one group. It also adds program-based auth via `dgraph.programs` facets.

What this branch adds
Entity-level routing: the same predicate can now live on multiple groups simultaneously, one per label. Instead of the schema pinning an entire predicate to a group, individual entities are pinned via a `dgraph.label` value on their UID:

Zero's tablet state now contains composite keys for each (predicate, label) pair:
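As an illustration of the key format only (not the code from `protos/pb/labeled.go`), the composite keys can be built and parsed with two small helpers. This sketch assumes predicate names never contain `@`, and it uses `strings.Cut`, which requires Go 1.18 or later.

```go
package main

import (
	"fmt"
	"strings"
)

// TabletKey builds the composite key "pred@label", or returns the bare
// predicate when unlabeled, so existing unlabeled keys keep their old form.
func TabletKey(pred, label string) string {
	if label == "" {
		return pred
	}
	return pred + "@" + label
}

// ParseTabletKey splits a key back into (pred, label). Assumption: the
// predicate never contains '@', so splitting at the first '@' is unambiguous.
func ParseTabletKey(key string) (pred, label string) {
	pred, label, _ = strings.Cut(key, "@") // label is "" when there is no '@'
	return pred, label
}

func main() {
	for _, label := range []string{"", "secret", "top_secret"} {
		key := TabletKey("Document.name", label)
		pred, got := ParseTabletKey(key)
		fmt.Printf("%-26s -> (%q, %q)\n", key, pred, got)
	}
}
```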
A query like `has(Document.name)` fans out to all three groups and merges the results.

Implementation breakdown
Composite tablet keys (`protos/pb/labeled.go`)
- `TabletKey(pred, label)` → `"pred@label"` (or bare `"pred"` when unlabeled)
- `ParseTabletKey(key)` → splits back into `(pred, label)`
- Composite keys are stored in the existing flat `map<string, Tablet>`

`dgraph.label` as a reserved predicate (`schema/schema.go`, `x/keys.go`)
- Declared in `initialSchemaInternal` with string type + exact index
- Added to `starAllPredicateMap` so `*` queries include it

Entity label cache (`worker/entity_label_cache.go`, new file)
- `UID → label` map on Alpha (1M entries, ~16MB)
- Cleared on `DropAll` (`worker/draft.go`)

Two-phase mutation routing (`worker/mutation.go`)
- Phase 1: extract `dgraph.label` edges to build an `entity → label` batch map
- Phase 2: route via `resolveLabel(uid, batchLabels)`, which checks batch → cache → group-1 read
- Reserved predicates (`dgraph.type`, `dgraph.label`, ACL) still use predicate-level routing

Zero state machine (`dgraph/cmd/zero/raft.go`, `zero.go`)
- `handleTablet` stores and looks up tablets using composite keys
- `servingTablet(pred, label)`: multiple groups CAN serve the same predicate with different labels
- `ShouldServe` updated: when a group requests a predicate it already serves (under a different label), return the matching label tablet

Query fan-out (`worker/task.go`, `worker/sort.go`)
- `AllTablets(pred)` returns all label variants of a predicate from the tablet cache
- `ProcessTaskOverNetwork` / `SortOverNetwork`: if multiple label tablets exist, dispatch to all groups in parallel and merge
- `mergeResults` / `mergeSortResults`: concatenate UID matrices, value matrices, and counts; sort merged UIDs for downstream binary search

Oracle commit validation (`dgraph/cmd/zero/oracle.go`)
- `ServingTablets(pred)` returns all groups serving a predicate

Proposal validation (`worker/proposal.go`)
- `checkTablet` switched from `Tablet(pred, label)` to `BelongsToReadOnly(pred, 0)`, which uses `GetForGroup`. This avoids depending on `schema.State().GetLabel()`, which returns a single label and may not match the receiving group's label tablet

`TabletIndex` data structure (`protos/pb/labeled.go`)
- `map[pred]map[label]*Tablet` for O(1) lookups
- Replaces the `ServingTablet` fallback loop scanning all tablets via `ParseTabletKey`
- Replaces `AllTablets` walking the entire flat map filtering by predicate prefix
- Replaces the `applyState` two-pass group ordering used to guarantee the bare-key → own-group mapping

New files
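The two-phase routing and the batch → cache → group-1 lookup order can be sketched as follows. Everything here is hypothetical: `labelCache` simply resets when full (the real cache's eviction policy is not described above), `readFunc` stands in for the group-1 read, and `edge` is a toy mutation edge rather than Dgraph's `pb.DirectedEdge`.

```go
package main

import "fmt"

// labelCache is a hypothetical bounded UID -> label map. When it reaches
// limit entries it is simply reset; a stand-in, not the POC's eviction policy.
type labelCache struct {
	limit int
	m     map[uint64]string
}

func newLabelCache(limit int) *labelCache {
	return &labelCache{limit: limit, m: make(map[uint64]string)}
}

func (c *labelCache) get(uid uint64) (string, bool) {
	label, ok := c.m[uid]
	return label, ok
}

func (c *labelCache) set(uid uint64, label string) {
	if len(c.m) >= c.limit {
		c.m = make(map[uint64]string)
	}
	c.m[uid] = label
}

// readFunc stands in for reading an entity's dgraph.label from group 1.
type readFunc func(uid uint64) (string, error)

// resolveLabel checks, in order: labels set earlier in the same batch, the
// local cache, then a group-1 read whose result is cached.
// Assumption: writes to dgraph.label also update or invalidate the cache.
func resolveLabel(uid uint64, batch map[uint64]string, cache *labelCache, read readFunc) (string, error) {
	if label, ok := batch[uid]; ok {
		return label, nil
	}
	if label, ok := cache.get(uid); ok {
		return label, nil
	}
	label, err := read(uid)
	if err != nil {
		return "", err
	}
	cache.set(uid, label) // "" (unlabeled) is cached too, sparing repeat reads
	return label, nil
}

type edge struct {
	uid   uint64
	attr  string
	value string
}

func main() {
	edges := []edge{
		{0x1, "dgraph.label", "secret"},
		{0x1, "Document.name", "a"},
		{0x2, "Document.name", "b"},
	}

	// Phase 1: collect entity labels set within this batch.
	batch := map[uint64]string{}
	for _, e := range edges {
		if e.attr == "dgraph.label" {
			batch[e.uid] = e.value
		}
	}

	cache := newLabelCache(1 << 20)
	remoteReads := 0
	read := func(uid uint64) (string, error) {
		remoteReads++
		return "", nil // pretend group 1 has no label for this UID
	}

	// Phase 2: resolve the label for every non-reserved edge.
	for _, e := range edges {
		if e.attr == "dgraph.label" {
			continue // reserved predicates keep predicate-level routing
		}
		label, err := resolveLabel(e.uid, batch, cache, read)
		if err != nil {
			panic(err)
		}
		fmt.Printf("uid %#x %s -> label %q\n", e.uid, e.attr, label)
	}
	fmt.Println("remote reads:", remoteReads) // remote reads: 1
}
```

Checking the batch first is what makes a single mutation that both sets `dgraph.label` and writes other predicates route consistently, since group 1 has not yet stored the new label at that point.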
- `protos/pb/tablet_index_test.go`
- `protos/pb/labeled_test.go`
- `worker/entity_label_cache.go`
- `worker/entity_label_cache_test.go`

Test results
`systest/label` integration tests pass, including:
- `TestEntityLevelRouting`: 3 entities, 3 groups, fan-out query returns all
- `TestLabelAuthNoLevel`: regression test for fallback-free `ServingTablet`
- `TestLabeledPredicateCannotBeMoved`: labeled predicate stays pinned

Diff stats