fix: add EdgeType nodes to graph DB for GRAPH_COMPLETION search#2487
fix: add EdgeType nodes to graph DB for GRAPH_COMPLETION search#2487soichisumi wants to merge 2 commits intotopoteretes:mainfrom
Conversation
index_graph_edges() creates EdgeType datapoints and calls index_data_points(), but some graph adapters (e.g. FalkorDB community adapter) don't persist nodes via index_data_points. This means EdgeType entries exist without vector embeddings in the graph, causing GRAPH_COMPLETION triplet search to fail silently. Add an explicit graph_engine.add_nodes() call before indexing to ensure EdgeType nodes are persisted with their vector embeddings regardless of the adapter implementation.
Please make sure all the checkboxes are checked:
|
|
Hello @soichisumi, thank you for submitting a PR! We will respond as soon as possible. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
Walkthrough
Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cognee/tasks/storage/index_graph_edges.py`:
- Around line 81-83: index_graph_edges currently calls get_graph_engine() and
graph_engine.add_nodes(...) outside the function's existing try/except and may
initialize the engine twice when edges_data is None; move the new write path
into the same guarded try/except used for the rest of the function, obtain the
engine once (reuse the same engine variable created earlier instead of calling
get_graph_engine() again), call add_nodes on that engine inside the try block,
and on exception use the same logging and RuntimeError wrapping behavior as the
existing error handling.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 58fbd4c6-09a5-4a7e-886c-bcdea32fa3c5
📒 Files selected for processing (1)
cognee/tasks/storage/index_graph_edges.py
Address CodeRabbit feedback: - Move add_nodes() call inside the existing try/except block - Reuse graph_engine instance instead of calling get_graph_engine() twice - Update error message to be more specific
Update: v0.5.5 compatibilityI noticed that
This PR's diff is based on the v0.5.3 code and no longer applies cleanly to The root cause still exists in v0.5.5The v0.5.5 The fix remains the same: call |
|
Correction: I checked and this PR already applies cleanly to current |
lxobr
left a comment
There was a problem hiding this comment.
Hey @soichisumi , thanks for catching this and putting together a fix, the root cause diagnosis is spot on.
One concern though: adding add_nodes in index_graph_edges introduces an extra graph DB call on every run for all adapters, even those that don't have this problem. And there's no clean way to tighten that check without coupling index_graph_edges to adapter-specific knowledge. That is something we would like to avoid, as the abstraction is there so that callers shouldn't need to know how a given adapter stores data.
However, the fix should be straightforward on the FalkorDB adapter side in https://github.com/topoteretes/cognee-community. Would you be up for making that change there?
|
Thanks for the feedback! I looked into this further, and I believe this gap affects nearly all adapter configurations, not just FalkorDB. In add_data_points.py (L106-107), regular nodes go through two steps: But index_graph_edges (L75-76) only calls index_data_points for EdgeType — the add_nodes step is missing. Every other configuration — PGVector, ChromaDB, LanceDB, QDrant, DuckDB, and yes FalkorDB — is missing them, which breaks GRAPH_COMPLETION triplet search. This PR just applies the same two-step pattern to EdgeType that regular nodes already use. add_nodes is MERGE-based (upsert), so it's idempotent on Neptune. I think one fix in core is cleaner than patching each adapter individually — what do you think? ( If I'm misreading the codebase, happy to be corrected! ) |
|
Thanks for the follow up research @soichisumi! Will look into it and get back to you tomorrow. |
|
Thank you! We could also move this to cognee-community. |
Problem
index_graph_edges()createsEdgeTypedatapoints and callsindex_data_points(), but some graph adapters (specifically FalkorDB viacognee-community-hybrid-adapter-falkor) don't persist nodes throughindex_data_points(). This results in EdgeType entries existing without vector embeddings in the graph database.The consequence is that
GRAPH_COMPLETIONtriplet search fails silently because it cannot find EdgeType vectors to match against query embeddings.Fix
Add an explicit
graph_engine.add_nodes(edge_type_datapoints)call beforeindex_data_points()inindex_graph_edges(). This ensures EdgeType nodes are persisted with their vector embeddings in the graph database regardless of the adapter'sindex_data_pointsimplementation.Impact
GRAPH_COMPLETIONsearch returning empty results on FalkorDBindex_data_pointsalready persists nodes (e.g. Kuzu, LanceDB) — theadd_nodescall is idempotentadd_nodescall only runs when there are edge type datapoints to persistSummary by CodeRabbit