BE-456: HashQL: Entity type model enrichment and graph store query extensions#8524
Conversation
feat: checkpoint (II) feat: checkpoint (III) feat: snapshot vec feat: add dedicated filter feat: checkpoint feat: filter implementation feat: filter implementation (mostly) done chore: environment capture note chore: always postgres bigint feat: target clone feat: simplify lookup feat: move storage up feat: eval entity path chore: checkpoint chore: checkpoint chore: find entrypoint feat: eval context feat: eval cleanup chore: cleanup feat: track index feat: wire up filter feat: add error reporting chore: checkpoint feat: add traverse, and first postgres compiler outline feat: traverse bitmap feat: move traversal out feat: projections feat: projections fix: clippy feat: subquery projection for lateral feat: checkpoint feat: test plan feat: checkpoint feat: checkpoint – failing tests ;-; feat: checkpoint – failing tests ;-; feat: checkpoint — passing tests fix: import fix: entity type feat: checkpoint feat: attribute a cost to terminator placement switches fix: import feat: checkpoint feat: checkpoint chore: lint
|
The latest updates on your projects. Learn more about Vercel for GitHub.
2 Skipped Deployments
|
PR SummaryMedium Risk Overview Extends the graph Postgres SQL builder with new expression capabilities and typing: Enriches the HashQL Written by Cursor Bugbot for commit 61d002f. This will update automatically on new commits. Configure here. |
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Autofix Details
Bugbot Autofix prepared fixes for both issues found in the latest run.
- ✅ Fixed: Misleading enum variant name for
jsonb_aggfunction- Renamed
Function::JsonAggtoFunction::JsonbAggand updated all internal usages/tests to match the emittedjsonb_agg(...)SQL.
- Renamed
- ✅ Fixed: Multiple new symbols defined but never referenced
- Removed the eight unreferenced symbol definitions from
sym.rsto eliminate dead entries and duplicate provenance naming.
- Removed the eight unreferenced symbol definitions from
Or push these changes by commenting:
@cursor push 928f4a2c0b
Preview (928f4a2c0b)
diff --git a/libs/@local/graph/postgres-store/src/store/postgres/query/expression/conditional.rs b/libs/@local/graph/postgres-store/src/store/postgres/query/expression/conditional.rs
--- a/libs/@local/graph/postgres-store/src/store/postgres/query/expression/conditional.rs
+++ b/libs/@local/graph/postgres-store/src/store/postgres/query/expression/conditional.rs
@@ -17,7 +17,7 @@
pub enum Function {
Min(Box<Expression>),
Max(Box<Expression>),
- JsonAgg(Box<Expression>),
+ JsonbAgg(Box<Expression>),
JsonExtractText(Box<Expression>),
JsonExtractAsText(Box<Expression>, PathToken<'static>),
JsonExtractPath(Vec<Expression>),
@@ -55,7 +55,7 @@
expression.transpile(fmt)?;
fmt.write_char(')')
}
- Self::JsonAgg(expression) => {
+ Self::JsonbAgg(expression) => {
fmt.write_str("jsonb_agg(")?;
expression.transpile(fmt)?;
fmt.write_char(')')
@@ -821,9 +821,9 @@
}
#[test]
- fn transpile_json_agg() {
+ fn transpile_jsonb_agg() {
assert_eq!(
- Expression::Function(Function::JsonAgg(Box::new(Expression::Parameter(1))))
+ Expression::Function(Function::JsonbAgg(Box::new(Expression::Parameter(1))))
.transpile_to_string(),
"jsonb_agg($1)"
);
diff --git a/libs/@local/hashql/core/src/symbol/sym.rs b/libs/@local/hashql/core/src/symbol/sym.rs
--- a/libs/@local/hashql/core/src/symbol/sym.rs
+++ b/libs/@local/hashql/core/src/symbol/sym.rs
@@ -36,7 +36,6 @@
entity,
entity_edition_id,
entity_id,
- entity_type_ids,
entity_uuid,
eq,
Err,
@@ -55,8 +54,6 @@
left_entity_confidence,
left_entity_id,
left_entity_provenance,
- left_entity_uuid,
- left_entity_web_id,
link_data,
List,
lt,
@@ -76,10 +73,7 @@
or,
pow,
properties,
- property_metadata,
provenance,
- provenance_edition,
- provenance_inferred,
provided,
r#as: "as",
r#as_force: "as!",
@@ -101,8 +95,6 @@
right_entity_confidence,
right_entity_id,
right_entity_provenance,
- right_entity_uuid,
- right_entity_web_id,
Some,
special_form,
String,
libs/@local/graph/postgres-store/src/store/postgres/query/expression/conditional.rs
Show resolved
Hide resolved
libs/@local/graph/postgres-store/src/store/postgres/query/expression/conditional.rs
Fixed
Show fixed
Hide fixed
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## bm/be-429-hashql-per-island-entitypath-requirements-and-fetchisland #8524 +/- ##
=======================================================================================================
- Coverage 62.76% 61.59% -1.18%
=======================================================================================================
Files 1326 915 -411
Lines 135528 98846 -36682
Branches 5523 4244 -1279
=======================================================================================================
- Hits 85065 60880 -24185
+ Misses 49549 37349 -12200
+ Partials 914 617 -297 Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
🤖 Augment PR SummarySummary: This PR enriches the HashQL entity type model to better reflect the full graph database schema, and extends the Postgres query-builder/AST to support additional SQL constructs needed by the HashQL→Postgres compiler. Changes:
Technical Notes: Query builder aliasing is standardized on an 🤖 Was this summary useful? React with 👍 or 👎 |
libs/@local/graph/postgres-store/src/store/postgres/query/expression/conditional.rs
Show resolved
Hide resolved
Merging this PR will degrade performance by 19.34%
Performance Changes
Comparing Footnotes |
6ef6d13 to
ef9b858
Compare
ef9b858 to
a91293c
Compare
libs/@local/graph/postgres-store/src/store/postgres/query/expression/conditional.rs
Show resolved
Hide resolved
922e42b to
572197c
Compare
a91293c to
43775ba
Compare
43775ba to
3ba579f
Compare
572197c to
74ce977
Compare
74ce977 to
b8fe72b
Compare
3ba579f to
61d002f
Compare
|
Deployment failed with the following error: |
|
|
||
| // Library Features | ||
| iterator_try_collect, | ||
| assert_matches, |
Check warning
Code scanning / clippy
the feature assert_matches has been stable since 1.95.0 and no longer requires an attribute to enable Warning
| // Library Features | ||
| iterator_try_collect, | ||
| assert_matches, | ||
| allocator_api, |
Check warning
Code scanning / clippy
feature allocator_api is declared but not used Warning
| iterator_try_collect, | ||
| assert_matches, | ||
| allocator_api, | ||
| iter_array_chunks, |
Check warning
Code scanning / clippy
feature iter_array_chunks is declared but not used Warning
| assert_matches, | ||
| allocator_api, | ||
| iter_array_chunks, | ||
| maybe_uninit_fill |
Check warning
Code scanning / clippy
feature maybe_uninit_fill is declared but not used Warning
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 3 potential issues.
Bugbot Autofix prepared fixes for all 3 issues found in the latest run.
- ✅ Fixed: Snapshot order not actually enforced
- I added per-snapshot IDs with stack validation so commit/rollback now panic on out-of-order snapshot consumption.
- ✅ Fixed: UNNEST now allows invalid empty call
- I added an explicit non-empty assertion for
Function::Unnestand a regression test to prevent emitting invalidUNNEST()SQL.
- I added an explicit non-empty assertion for
- ✅ Fixed: Snapshot tokens not bound to vector
- I bound snapshots to their originating
IdSnapshotVecvia an owner ID check so foreign snapshots are rejected.
- I bound snapshots to their originating
Or push these changes by commenting:
@cursor push 3658d0ed67
Preview (3658d0ed67)
diff --git a/libs/@local/graph/postgres-store/src/store/postgres/query/expression/conditional.rs b/libs/@local/graph/postgres-store/src/store/postgres/query/expression/conditional.rs
--- a/libs/@local/graph/postgres-store/src/store/postgres/query/expression/conditional.rs
+++ b/libs/@local/graph/postgres-store/src/store/postgres/query/expression/conditional.rs
@@ -123,6 +123,10 @@
fmt.write_char(')')
}
Self::Unnest(expression) => {
+ assert!(
+ !expression.is_empty(),
+ "UNNEST requires at least one argument"
+ );
fmt.write_str("UNNEST(")?;
for (index, element) in expression.iter().enumerate() {
@@ -841,6 +845,12 @@
}
#[test]
+ #[should_panic(expected = "UNNEST requires at least one argument")]
+ fn transpile_unnest_empty_panics() {
+ let _ = Expression::Function(Function::Unnest(vec![])).transpile_to_string();
+ }
+
+ #[test]
fn transpile_field_access() {
assert_eq!(
Expression::FieldAccess {
diff --git a/libs/@local/hashql/core/src/id/snapshot_vec.rs b/libs/@local/hashql/core/src/id/snapshot_vec.rs
--- a/libs/@local/hashql/core/src/id/snapshot_vec.rs
+++ b/libs/@local/hashql/core/src/id/snapshot_vec.rs
@@ -37,10 +37,19 @@
hash::{Hash, Hasher},
mem,
ops::Deref,
+ sync::atomic::{AtomicU64, Ordering},
};
use super::{Id, IdVec, slice::IdSlice};
+static NEXT_SNAPSHOT_OWNER_ID: AtomicU64 = AtomicU64::new(1);
+
+fn next_snapshot_owner_id() -> u64 {
+ let owner_id = NEXT_SNAPSHOT_OWNER_ID.fetch_add(1, Ordering::Relaxed);
+ assert!(owner_id != 0, "snapshot owner id exhausted");
+ owner_id
+}
+
/// Defines how custom mutations are undone during rollback.
///
/// Built-in implementations:
@@ -77,34 +86,56 @@
struct Log<I, T, S: UndoStrategy<I, T>, A: Allocator = Global> {
tape: Vec<LogEntry<I, T, S, A>, A>,
- open: usize,
+ open_snapshots: Vec<u64, A>,
+ owner_id: u64,
+ next_snapshot_id: u64,
}
impl<I, T, S: UndoStrategy<I, T>> Log<I, T, S> {
const fn new() -> Self {
Self {
tape: Vec::new(),
- open: 0,
+ open_snapshots: Vec::new(),
+ owner_id: 0,
+ next_snapshot_id: 1,
}
}
}
impl<I, T, S: UndoStrategy<I, T>, A: Allocator> Log<I, T, S, A> {
- const fn new_in(alloc: A) -> Self {
+ fn new_in(alloc: A) -> Self
+ where
+ A: Clone,
+ {
Self {
- tape: Vec::new_in(alloc),
- open: 0,
+ tape: Vec::new_in(alloc.clone()),
+ open_snapshots: Vec::new_in(alloc),
+ owner_id: 0,
+ next_snapshot_id: 1,
}
}
const fn recording(&self) -> bool {
- self.open > 0
+ !self.open_snapshots.is_empty()
}
- const fn start(&mut self) -> Snapshot {
- self.open += 1;
+ fn start(&mut self) -> Snapshot {
+ if self.owner_id == 0 {
+ self.owner_id = next_snapshot_owner_id();
+ }
+
+ let tape_len = self.tape.len();
+ let snapshot_id = self.next_snapshot_id;
+ self.next_snapshot_id = self
+ .next_snapshot_id
+ .checked_add(1)
+ .expect("snapshot id exhausted");
+ self.open_snapshots.push(snapshot_id);
+
Snapshot {
- tape_len: self.tape.len(),
+ tape_len,
+ owner_id: self.owner_id,
+ snapshot_id,
}
}
@@ -114,7 +145,9 @@
I: Id,
{
self.assert(&snapshot);
- self.open -= 1;
+ self.open_snapshots
+ .pop()
+ .expect("snapshot presence verified by assert");
while self.tape.len() > snapshot.tape_len {
let entry = self.tape.pop().expect("tape length verified by loop guard");
@@ -144,29 +177,49 @@
#[expect(clippy::needless_pass_by_value)]
fn commit(&mut self, snapshot: Snapshot) {
self.assert(&snapshot);
- self.open -= 1;
+ self.open_snapshots
+ .pop()
+ .expect("snapshot presence verified by assert");
- if self.open == 0 {
+ if self.open_snapshots.is_empty() {
self.tape.clear();
}
}
fn assert(&self, snapshot: &Snapshot) {
assert!(
- self.tape.len() >= snapshot.tape_len,
+ !self.open_snapshots.is_empty(),
+ "no open snapshot to commit or rollback"
+ );
+ assert!(
+ snapshot.owner_id == self.owner_id,
+ "snapshot does not belong to this IdSnapshotVec"
+ );
+ assert!(
+ snapshot.tape_len
+ <= self.tape.len(),
"snapshot tape_len exceeds current tape length"
);
- assert!(self.open > 0, "no open snapshot to commit or rollback");
+ assert!(
+ snapshot.snapshot_id
+ == *self
+ .open_snapshots
+ .last()
+ .expect("snapshot presence verified by previous assert"),
+ "snapshot consumed out of stack order"
+ );
}
}
/// Opaque snapshot token for [`IdSnapshotVec`].
///
-/// Must be consumed by [`IdSnapshotVec::rollback_to`] or [`IdSnapshotVec::commit`] in stack
-/// (LIFO) order. Dropping a `Snapshot` without consuming it will not undo changes, but
-/// subsequent snapshot operations may panic.
+/// Must be consumed by [`IdSnapshotVec::rollback_to`] or [`IdSnapshotVec::commit`] on the same
+/// vector instance and in stack (LIFO) order. Dropping a `Snapshot` without consuming it will not
+/// undo changes, but subsequent snapshot operations may panic.
pub struct Snapshot {
tape_len: usize,
+ owner_id: u64,
+ snapshot_id: u64,
}
/// A snapshot-capable vector that uses typed IDs for indexing.
@@ -497,7 +550,7 @@
/// assert_eq!(vec.len(), 1);
/// ```
#[inline]
- pub const fn snapshot(&mut self) -> Snapshot {
+ pub fn snapshot(&mut self) -> Snapshot {
self.log.start()
}
@@ -782,10 +835,36 @@
let snap = vec.snapshot();
vec.commit(snap);
- vec.commit(Snapshot { tape_len: 0 });
+ vec.commit(Snapshot {
+ tape_len: 0,
+ owner_id: vec.log.owner_id,
+ snapshot_id: vec.log.next_snapshot_id,
+ });
}
#[test]
+ #[should_panic(expected = "snapshot consumed out of stack order")]
+ fn out_of_order_snapshot_consumption_panics() {
+ let mut vec = IdSnapshotVec::<TestId, i32>::new();
+ vec.push(1);
+ let outer = vec.snapshot();
+ let _inner = vec.snapshot();
+
+ vec.commit(outer);
+ }
+
+ #[test]
+ #[should_panic(expected = "snapshot does not belong to this IdSnapshotVec")]
+ fn foreign_snapshot_panics() {
+ let mut first = IdSnapshotVec::<TestId, i32>::new();
+ let mut second = IdSnapshotVec::<TestId, i32>::new();
+ let foreign = first.snapshot();
+ let _local = second.snapshot();
+
+ second.commit(foreign);
+ }
+
+ #[test]
fn empty_snapshot_roundtrip() {
let mut vec = IdSnapshotVec::<TestId, i32>::new();This Bugbot Autofix run was free. To enable autofix for future PRs, go to the Cursor dashboard.
| "snapshot tape_len exceeds current tape length" | ||
| ); | ||
| assert!(self.open > 0, "no open snapshot to commit or rollback"); | ||
| } |
There was a problem hiding this comment.
Snapshot order not actually enforced
High Severity
IdSnapshotVec::rollback_to and IdSnapshotVec::commit claim LIFO-only snapshots, but Log::assert only checks tape_len and open > 0. Out-of-order consumption can succeed, leaving open and tape inconsistent. This can keep recording() enabled without a valid snapshot and cause later incorrect rollbacks or delayed panics.
Additional Locations (2)
| /// subsequent snapshot operations may panic. | ||
| pub struct Snapshot { | ||
| tape_len: usize, | ||
| } |
There was a problem hiding this comment.
Snapshot tokens not bound to vector
Medium Severity
Snapshot is an unscoped token (tape_len only), and rollback_to/commit accept it without checking origin. A snapshot from one IdSnapshotVec can be consumed by another if its state passes assertions, which can close or roll back the wrong transaction state and leave the original vector with an unconsumed snapshot.
Additional Locations (2)
|
|
||
| element.transpile(fmt)?; | ||
| } | ||
|
|
There was a problem hiding this comment.
UNNEST now allows invalid empty call
Medium Severity
Function::Unnest now accepts Vec<Expression> and transpiles whatever length is provided. When the vector is empty, it emits UNNEST(), which is invalid SQL in PostgreSQL. The prior Box<Expression> API guaranteed at least one argument, so this introduces a new runtime query-failure path.




🌟 What is the purpose of this PR?
Enriches the HashQL entity type model to match the full database schema, extends the graph-store SQL query builder with new expression types needed by the postgres compiler, and adds a database migration for the
continuationcomposite type. This is the foundation layer for the postgres translation stack (BE-457, BE-306).🔍 What does this change?
hashql-core:
IdSnapshotVec, a typed-ID-indexed vector with transactional snapshot/rollback support (used by the postgres compiler for scoped state)TemporalInterval,EntityTemporalMetadata,Confidence, provenance types,EntityMetadata, andLinkDatafields to match the full database schema. Previously onlyEntityId,EntityRecordId, and partialLinkDatawere modeled.edition_idandontology_type_idssymbolsgraph-postgres-store query builder:
FieldAccess(composite field extraction),JsonAgg,Constant::Null,Constant::U128,UnaryOperator::IsNotFalseJsonB,Continuation,Numeric,Int,BigIntUnnestnow accepts multiple expressions (UNNEST(a, b, c))SelectStatementgainsOFFSETsupport.cast(),.not()and.grouped()now takeself&'static strtoIdentifierfor consistencymod postgresandmod tablemadepubfor use by the HashQL postgres compilerDatabase migration (v010__query):
continuationcomposite type:(filter boolean, block int, locals int[], values jsonb[]). This is the row type returned by compiled filter subqueries; the interpreter reads.filterto decide row inclusion and.block/.locals/.valuesto resume multi-block CFGs.hashql-eval:
graph/read/path.rsto handle the expanded entity field structurelib.rsanderror.rsfor the postgres module (filled in by BE-306)Test output updates:
.jsonc/.stdout/.stderrfixtures across eval (graph/read), HIR (lower/checking, specialization, graph-hoisting), and MIR (reify, inline) test suites, all cascading from the entity type changesPre-Merge Checklist 🚀
🚢 Has this modified a publishable library?
This PR:
📜 Does this require a change to the docs?
The changes in this PR:
🕸️ Does this require a change to the Turbo Graph?
The changes in this PR:
The entity type model still does not cover every field in the database schema (e.g.,
property_metadatainner structure is opaque). Fields are added as needed by the postgres compiler; remaining gaps will be filled incrementally.🛡 What tests cover this?
IdSnapshotVechas doc tests and inline unit tests covering push/set/rollback/nested snapshotsFieldAccess,JsonAgg,Null,U128,IsNotFalse,UNNESTwith multiple args, cast types, andOFFSET❓ How to test this?