DO NOT MERGE — POC: entity-level routing with composite tablet keys and TabletIndex #9590

Closed
mlwelles wants to merge 21 commits into sharding-poc from entity-extended-sharding-poc

Conversation

@mlwelles
Contributor

@mlwelles mlwelles commented Feb 5, 2026

⚠️ POC ONLY — NOT FOR PRODUCTION MERGE
Stacked on #9574 (sharding-poc). Exists to validate entity-level routing before any production work.

What sharding-poc (#9574) provides

The base branch implements schema-level label routing: a predicate declared with @label(secret) is pinned to the Alpha group running with --label=secret. One predicate → one group. It also adds program-based auth via dgraph.programs facets.

What this branch adds

Entity-level routing — the same predicate can now live on multiple groups simultaneously, one per label. Instead of the schema pinning an entire predicate to a group, individual entities are pinned via a dgraph.label value on their UID:

# No @label directives in schema — routing is per-entity, not per-predicate:
Document.name: string @index(term) .
Document.text: string @index(term) .

_:doc1 <dgraph.label> "secret" .
_:doc1 <Document.name> "Secret.pdf" .       # → routed to "secret" group
_:doc1 <Document.text> "Classified" .       # → routed to "secret" group

_:doc2 <Document.name> "Public.pdf" .       # → routed to unlabeled group

Zero's tablet state now contains composite keys for each (predicate, label) pair:

0-Document.name             → group 1 (unlabeled)
0-Document.name@secret      → group 2
0-Document.name@top_secret  → group 3

A query like has(Document.name) fans out to all three groups and merges the results.

Implementation breakdown

Composite tablet keys (protos/pb/labeled.go)

  • TabletKey(pred, label) → "pred@label" (or bare "pred" when unlabeled)
  • ParseTabletKey(key) → splits back into (pred, label)
  • These are the encoding/decoding for the flat proto map<string, Tablet>
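A minimal sketch of the encode/decode pair described above, not the code in protos/pb/labeled.go. It assumes labels never contain "@", so the last "@" in a key is treated as the separator; the function bodies are illustrative.

```go
package main

import (
	"fmt"
	"strings"
)

// TabletKey encodes a (predicate, label) pair as a flat map key.
// Unlabeled tablets keep the bare predicate, so existing keys stay valid.
func TabletKey(pred, label string) string {
	if label == "" {
		return pred
	}
	return pred + "@" + label
}

// ParseTabletKey splits a key back into (predicate, label).
// Assumption: labels never contain '@', so the last '@' is the separator.
func ParseTabletKey(key string) (pred, label string) {
	i := strings.LastIndexByte(key, '@')
	if i < 0 {
		return key, ""
	}
	return key[:i], key[i+1:]
}

func main() {
	key := TabletKey("0-Document.name", "secret")
	pred, label := ParseTabletKey(key)
	fmt.Println(key, pred, label)
}
```

Keeping the bare predicate for the unlabeled case means a cluster with no labels produces the same keys as before.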

dgraph.label as a reserved predicate (schema/schema.go, x/keys.go)

  • Registered in initialSchemaInternal with string type + exact index
  • Added to starAllPredicateMap so * queries include it

Entity label cache (worker/entity_label_cache.go — new file)

  • Concurrency-safe UID → label map on Alpha (1M entries, ~16MB)
  • Avoids querying group 1 for every mutation to resolve an entity's label
  • Cleared on DropAll (worker/draft.go)
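The shape of such a cache can be sketched as follows. This is an illustration only: the type name, the method set, and the eviction policy (drop one arbitrary entry when full) are assumptions, not taken from worker/entity_label_cache.go.

```go
package main

import (
	"fmt"
	"sync"
)

// EntityLabelCache is a hypothetical bounded, concurrency-safe UID -> label map.
type EntityLabelCache struct {
	mu      sync.RWMutex
	max     int
	entries map[uint64]string
}

func NewEntityLabelCache(max int) *EntityLabelCache {
	return &EntityLabelCache{max: max, entries: make(map[uint64]string)}
}

// Get returns the cached label and whether the UID was present.
// An empty label with ok == true means "known to be unlabeled".
func (c *EntityLabelCache) Get(uid uint64) (string, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	label, ok := c.entries[uid]
	return label, ok
}

// Set stores a label. Assumption: when the cache is full, one arbitrary
// entry is evicted; the real cache may use a different policy.
func (c *EntityLabelCache) Set(uid uint64, label string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if _, exists := c.entries[uid]; !exists && len(c.entries) >= c.max {
		for victim := range c.entries {
			delete(c.entries, victim)
			break
		}
	}
	c.entries[uid] = label
}

// Clear drops every entry, as a DropAll would require.
func (c *EntityLabelCache) Clear() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.entries = make(map[uint64]string)
}

// Len reports the number of cached entries.
func (c *EntityLabelCache) Len() int {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return len(c.entries)
}

func main() {
	c := NewEntityLabelCache(2)
	c.Set(1, "secret")
	c.Set(2, "")
	c.Set(3, "top_secret") // evicts one of the earlier entries
	fmt.Println(c.Len())
}
```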

Two-phase mutation routing (worker/mutation.go)

  • Phase 1: Scan mutation edges for dgraph.label to build entity → label batch map
  • Phase 2: Route each edge using resolveLabel(uid, batchLabels) — checks batch → cache → group-1 read
  • Reserved predicates (dgraph.type, dgraph.label, ACL) still use predicate-level routing
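The two phases can be sketched as below. The Edge type, the routeEdges helper, and the lookup callback (standing in for the group-1 read) are hypothetical; only the resolution order (batch, then cache, then group-1 read) and the reserved-predicate exception come from the description above. ACL predicates are left out of the reserved set for brevity.

```go
package main

import "fmt"

const labelPredicate = "dgraph.label"

// reserved lists predicates that keep predicate-level routing.
// Assumption: ACL predicates are omitted here for brevity.
var reserved = map[string]bool{"dgraph.type": true, labelPredicate: true}

// Edge is a simplified stand-in for a mutation edge.
type Edge struct {
	UID   uint64
	Attr  string
	Value string
}

// collectBatchLabels is phase 1: find dgraph.label edges in the batch.
func collectBatchLabels(edges []Edge) map[uint64]string {
	batch := make(map[uint64]string)
	for _, e := range edges {
		if e.Attr == labelPredicate {
			batch[e.UID] = e.Value
		}
	}
	return batch
}

// resolveLabel checks the batch, then the cache, then falls back to lookup,
// a hypothetical callback standing in for the read from group 1.
func resolveLabel(uid uint64, batch, cache map[uint64]string, lookup func(uint64) string) string {
	if l, ok := batch[uid]; ok {
		return l
	}
	if l, ok := cache[uid]; ok {
		return l
	}
	l := lookup(uid)
	cache[uid] = l
	return l
}

// routeEdges is phase 2: group edges by the tablet key they should go to.
func routeEdges(edges []Edge, cache map[uint64]string, lookup func(uint64) string) map[string][]Edge {
	batch := collectBatchLabels(edges)
	routes := make(map[string][]Edge)
	for _, e := range edges {
		key := e.Attr
		if !reserved[e.Attr] {
			if label := resolveLabel(e.UID, batch, cache, lookup); label != "" {
				key = e.Attr + "@" + label
			}
		}
		routes[key] = append(routes[key], e)
	}
	return routes
}

func main() {
	edges := []Edge{
		{UID: 1, Attr: labelPredicate, Value: "secret"},
		{UID: 1, Attr: "Document.name", Value: "Secret.pdf"},
		{UID: 2, Attr: "Document.name", Value: "Public.pdf"},
	}
	noLabel := func(uint64) string { return "" }
	for key, es := range routeEdges(edges, map[uint64]string{}, noLabel) {
		fmt.Println(key, len(es))
	}
}
```

Scanning the whole batch first matters because a dgraph.label edge may appear after the edges it governs within the same mutation.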

Zero state machine (dgraph/cmd/zero/raft.go, zero.go)

  • handleTablet stores/looks up tablets using composite keys
  • Duplicate detection uses servingTablet(pred, label) — multiple groups CAN serve the same predicate with different labels
  • ShouldServe updated: when a group requests a predicate it already serves (under a different label), return the matching label tablet
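The duplicate-detection rule can be illustrated with a small sketch. The ZeroState type and the simplified Tablet struct are hypothetical stand-ins for Zero's state; the point is only that the conflict check is keyed on the (predicate, label) pair rather than on the predicate alone.

```go
package main

import "fmt"

// Tablet is a simplified stand-in for the protobuf Tablet message.
type Tablet struct {
	Predicate string
	Label     string
	GroupID   uint32
}

// ZeroState is a hypothetical holder for the flat, composite-keyed tablet map.
type ZeroState struct {
	tablets map[string]*Tablet
}

func tabletKey(pred, label string) string {
	if label == "" {
		return pred
	}
	return pred + "@" + label
}

// servingTablet returns the tablet for an exact (predicate, label) pair, or nil.
func (z *ZeroState) servingTablet(pred, label string) *Tablet {
	return z.tablets[tabletKey(pred, label)]
}

// handleTablet rejects a proposal only when the same (predicate, label) pair
// is already served by a different group.
func (z *ZeroState) handleTablet(t *Tablet) error {
	if prev := z.servingTablet(t.Predicate, t.Label); prev != nil && prev.GroupID != t.GroupID {
		return fmt.Errorf("tablet %q is already served by group %d",
			tabletKey(t.Predicate, t.Label), prev.GroupID)
	}
	z.tablets[tabletKey(t.Predicate, t.Label)] = t
	return nil
}

func main() {
	z := &ZeroState{tablets: make(map[string]*Tablet)}
	fmt.Println(z.handleTablet(&Tablet{Predicate: "Document.name", GroupID: 1}))
	fmt.Println(z.handleTablet(&Tablet{Predicate: "Document.name", Label: "secret", GroupID: 2}))
	fmt.Println(z.handleTablet(&Tablet{Predicate: "Document.name", Label: "secret", GroupID: 3}))
}
```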

Query fan-out (worker/task.go, worker/sort.go)

  • AllTablets(pred) returns all label variants of a predicate from the tablet cache
  • ProcessTaskOverNetwork / SortOverNetwork: if multiple label tablets exist, dispatch to all groups in parallel and merge
  • mergeResults / mergeSortResults: concatenate UID matrices, value matrices, counts; sort merged UIDs for downstream binary search
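A rough sketch of the fan-out and merge step, reduced to a single UID list. The fetchFn type stands in for a network call to one group and is hypothetical, as are the helper names. The sketch assumes each entity lives in exactly one group, so no de-duplication is attempted. The final sort is what keeps a downstream binary search correct.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// fetchFn stands in for a network call to one group (hypothetical).
type fetchFn func() []uint64

// fanOut queries every group in parallel and merges the results.
func fanOut(fetchers []fetchFn) []uint64 {
	results := make([][]uint64, len(fetchers))
	var wg sync.WaitGroup
	for i, f := range fetchers {
		wg.Add(1)
		go func(i int, f fetchFn) {
			defer wg.Done()
			results[i] = f() // each goroutine writes its own slot
		}(i, f)
	}
	wg.Wait()
	return mergeUIDs(results)
}

// mergeUIDs concatenates per-group UID lists and sorts the result.
func mergeUIDs(parts [][]uint64) []uint64 {
	var merged []uint64
	for _, p := range parts {
		merged = append(merged, p...)
	}
	sort.Slice(merged, func(i, j int) bool { return merged[i] < merged[j] })
	return merged
}

// indexOf is a binary search that is only valid on sorted input.
func indexOf(uids []uint64, uid uint64) int {
	i := sort.Search(len(uids), func(i int) bool { return uids[i] >= uid })
	if i < len(uids) && uids[i] == uid {
		return i
	}
	return -1
}

func main() {
	merged := fanOut([]fetchFn{
		func() []uint64 { return []uint64{9, 2} },
		func() []uint64 { return []uint64{5} },
		func() []uint64 { return []uint64{7, 1, 3} },
	})
	fmt.Println(merged, indexOf(merged, 7))
}
```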

Oracle commit validation (dgraph/cmd/zero/oracle.go)

  • ServingTablets(pred) returns all groups serving a predicate
  • Commit validation checks if the committing group is any of them (not just the single tablet owner)
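As an illustration of the relaxed check, assuming a simplified Tablet struct and hypothetical helper names (servingGroups, canCommit) rather than the actual oracle.go code:

```go
package main

import "fmt"

// Tablet is a simplified stand-in for the protobuf Tablet message.
type Tablet struct {
	Predicate string
	Label     string
	GroupID   uint32
}

// servingGroups collects every group that serves any label tablet of pred.
func servingGroups(tablets []*Tablet, pred string) map[uint32]bool {
	groups := make(map[uint32]bool)
	for _, t := range tablets {
		if t.Predicate == pred {
			groups[t.GroupID] = true
		}
	}
	return groups
}

// canCommit accepts a commit if the group serves the predicate under any label.
func canCommit(tablets []*Tablet, pred string, gid uint32) bool {
	return servingGroups(tablets, pred)[gid]
}

func main() {
	tablets := []*Tablet{
		{Predicate: "Document.name", GroupID: 1},
		{Predicate: "Document.name", Label: "secret", GroupID: 2},
	}
	fmt.Println(canCommit(tablets, "Document.name", 2), canCommit(tablets, "Document.name", 3))
}
```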

Proposal validation (worker/proposal.go)

  • checkTablet switched from Tablet(pred, label) to BelongsToReadOnly(pred, 0), which uses GetForGroup. This avoids depending on schema.State().GetLabel(), which returns a single label and may not match the receiving group's label tablet

TabletIndex data structure (protos/pb/labeled.go)

  • Nested map[pred]map[label]*Tablet for O(1) lookups
  • Replaces three O(n) flat-map patterns introduced by the composite key design:
    • ServingTablet fallback loop scanning all tablets via ParseTabletKey
    • AllTablets walking the entire flat map filtering by predicate prefix
    • applyState two-pass group ordering to guarantee bare-key → own-group mapping
  • Read cache rebuilt from authoritative flat proto maps after every write
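A sketch of the nested index, using a simplified Tablet struct. The method names follow the ones listed in this PR (Get, Set, GetAny, GetForGroup, AllForPredicate, BuildFromFlat), but the bodies are assumptions. In particular, the fallback in GetForGroup when the caller's group serves no tablet, and building from each tablet's own Predicate and Label fields rather than parsing keys, are guesses.

```go
package main

import "fmt"

// Tablet is a simplified stand-in for the protobuf Tablet message.
type Tablet struct {
	Predicate string
	Label     string
	GroupID   uint32
}

// TabletIndex maps predicate -> label -> tablet.
type TabletIndex map[string]map[string]*Tablet

// Set stores a tablet under its (predicate, label) pair.
func (idx TabletIndex) Set(t *Tablet) {
	byLabel, ok := idx[t.Predicate]
	if !ok {
		byLabel = make(map[string]*Tablet)
		idx[t.Predicate] = byLabel
	}
	byLabel[t.Label] = t
}

// Get returns the exact (predicate, label) tablet, or nil.
func (idx TabletIndex) Get(pred, label string) *Tablet {
	return idx[pred][label] // reading a nil inner map is safe in Go
}

// AllForPredicate returns the label map for a predicate in one lookup.
func (idx TabletIndex) AllForPredicate(pred string) map[string]*Tablet {
	return idx[pred]
}

// GetAny prefers the unlabeled tablet, else returns any label tablet.
func (idx TabletIndex) GetAny(pred string) *Tablet {
	if t := idx.Get(pred, ""); t != nil {
		return t
	}
	for _, t := range idx[pred] {
		return t
	}
	return nil
}

// GetForGroup prefers a tablet served by the caller's group.
// Assumption: it falls back to GetAny when the group serves none.
func (idx TabletIndex) GetForGroup(pred string, gid uint32) *Tablet {
	for _, t := range idx[pred] {
		if t.GroupID == gid {
			return t
		}
	}
	return idx.GetAny(pred)
}

// BuildFromFlat rebuilds the index from the authoritative flat map.
func BuildFromFlat(flat map[string]*Tablet) TabletIndex {
	idx := make(TabletIndex)
	for _, t := range flat {
		idx.Set(t)
	}
	return idx
}

func main() {
	idx := BuildFromFlat(map[string]*Tablet{
		"Document.name":        {Predicate: "Document.name", GroupID: 1},
		"Document.name@secret": {Predicate: "Document.name", Label: "secret", GroupID: 2},
	})
	fmt.Println(idx.Get("Document.name", "secret").GroupID, len(idx.AllForPredicate("Document.name")))
}
```

In this sketch GetForGroup scans the labels of one predicate, so its cost grows with the number of labels for that predicate, not with the total number of tablets.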

New files

File                               Purpose
protos/pb/tablet_index_test.go     TabletIndex unit tests (7 tests)
protos/pb/labeled_test.go          TabletKey/ParseTabletKey unit tests
worker/entity_label_cache.go       UID → label cache
worker/entity_label_cache_test.go  Entity label cache tests

Test results

  • 7/7 TabletIndex unit tests pass
  • 27/27 systest/label integration tests pass, including:
    • TestEntityLevelRouting — 3 entities, 3 groups, fan-out query returns all
    • TestLabelAuthNoLevel — regression test for fallback-free ServingTablet
    • TestLabeledPredicateCannotBeMoved — labeled predicate stays pinned

Diff stats

20 files changed, 2569 insertions(+), 112 deletions(-)

Add IsLabeled() methods to Tablet, Member, and SchemaUpdate protobuf
types, replacing raw `Label != ""` checks with nil-safe, semantic
helpers. This centralizes the label-check logic and makes the intent
clearer at each call site.

Design document for extending label-based predicate routing to support
entity-level routing via dgraph.label. Key decisions:
- Sub-tablet keys: predicate@label (backward compat for unlabeled)
- Entity label stored on group 1 as reserved predicate
- Two-phase mutation routing (extract labels, then route)
- Query fan-out to all authorized sub-tablets
- Synchronous reclassification following predicate-move pattern
- Entity label > predicate @label > unlabeled priority

13-task implementation plan for entity-level sub-tablet routing,
covering composite tablet keys, Zero state machine changes,
two-phase mutation routing, query fan-out, entity label cache,
and integration tests.

Switch handleTablet from using bare tablet.Predicate as the map key
to using pb.TabletKey(predicate, label) composite keys. This enables
multiple groups to serve the same predicate with different labels
(sub-tablets). Duplicate detection now uses servingSubTablet to check
for (predicate, label) pair conflicts instead of predicate-only conflicts.

Add TestEntityLevelRouting that verifies entity-level sub-tablet routing
works end-to-end: setting dgraph.label on a UID pins all its predicates
to the labeled group via composite tablet keys (predicate@label) in Zero
state, and queries fan out across all sub-tablet groups to return
complete results.

… routing

The TestEntityLevelRouting integration test was failing non-deterministically
(returning 0-1 of 3 expected documents). Root cause: mergeResults appended
UIDs from fan-out goroutines in non-deterministic order, but downstream
algo.IndexOf uses binary search assuming sorted UIDs. The JSON encoder in
preTraverse then silently skipped UIDs that binary search couldn't find.

Key fixes:
- Sort merged UidMatrix entries in mergeResults after appending from fan-out
- Two-pass tablet caching in applyState (other groups first, own group last)
  so bare-predicate aliases correctly map to the own group's sub-tablet
- Store tablets under composite keys (pred@label) in BelongsToReadOnly,
  sendTablet, and Inform to preserve AllSubTablets canonical entry discovery
- Use BelongsToReadOnly in checkTablet (proposal.go) instead of Tablet+label
  to avoid label-resolution mismatch on entity-level sub-tablets
- Fix Zero's ServingTablet/ShouldServe/Inform to use composite keys and
  sub-tablet fallback search for entity-level routing
- Fix commit validation (oracle.go) to check all sub-tablets via ServingTablets
- Simplify resolveLabel to entity-only (predicate @label handled by Zero)
- Update test assertions for composite tablet keys, add retry logic, sort
  query results in Go instead of using DQL orderasc (avoids sort triplication)

All 27 label integration tests pass.

…O(1) lookups

Introduce a TabletIndex type (map[pred]map[label]*Tablet) that replaces
three O(n) patterns with O(1) nested lookups:

- ServingTablet fallback loop that scanned all tablets via ParseTabletKey
- AllSubTablets scan that iterated entire flat maps to find same-predicate tablets
- applyState two-pass aliasing that required processing groups in specific order

The index is always rebuilt from the authoritative flat proto maps (in
regenerateChecksum on Zero, applyState on Alpha), preserving the wire format
as the single source of truth.

Key methods: Get, Set, GetAny (prefers unlabeled), GetForGroup (prefers
caller's group), AllForPredicate (O(1) label map), BuildFromFlat (bridge
from proto).

Also renames "sub-tablet" terminology to "label tablet" throughout.

@mlwelles mlwelles changed the title from "DO NOT MERGE — POC: replace flat tablet maps with nested TabletIndex" to "DO NOT MERGE — POC: entity-level routing with composite tablet keys and TabletIndex" Feb 5, 2026

Clean up tablet-related function names and comments to use simpler,
more consistent naming. Remove unused public method ServingLabelTablet.

@xqqp
Contributor

xqqp commented Feb 6, 2026

I appreciate the progress made on sharding predicates among alphas. However, I feel that both pinning predicates and entity-level routing are merely workarounds for issues that Dgraph should resolve independently. Ideally, Dgraph would automatically detect when a predicate has grown too large on an alpha and then move a portion of that predicate to another alpha, without requiring intervention from a database administrator.

@matthewmcneely
Contributor

@xqqp Note this branch is a proof of concept. You should not take anything happening in this branch as future direction.

> Ideally, Dgraph would automatically detect when a predicate has grown too large on an alpha and then move a portion of that predicate to another alpha, without requiring intervention from a database administrator.

What you describe here is, in essence, the current functionality of group rebalancing in Dgraph and has been for years.

@xqqp
Contributor

xqqp commented Feb 6, 2026

@matthewmcneely

> Note this branch is a proof of concept. You should not take anything happening in this branch as future direction.

I'm aware of that. Just thought I'd give some feedback.

> What you describe here is, in essence, the current functionality of group rebalancing in Dgraph and has been for years.

In which release was that implemented? If I'm reading this correctly, at least as of December 2024 predicates couldn't be split: https://discuss.dgraph.io/t/does-dgraph-have-a-scalability-problems-with-graphs-having-single-heavy-predicate/19647/2

@matthewmcneely
Contributor

Right, sharding is still per predicate. When I said "in essence", I was referring to the rebalancing based on per-predicate disk consumption.

@xqqp
Contributor

xqqp commented Feb 6, 2026

Alright, then we are on the same page.

@mlwelles
Contributor Author

Closing to prevent confusion, as it's not intended as anything but a POC of feasibility.

@mlwelles mlwelles closed this Feb 12, 2026

3 participants