
fix: add EdgeType nodes to graph DB for GRAPH_COMPLETION search #2487

Open
soichisumi wants to merge 2 commits into topoteretes:main from soichisumi:fix/index-graph-edges-add-nodes

Conversation

@soichisumi
Contributor

@soichisumi soichisumi commented Mar 26, 2026

Problem

index_graph_edges() creates EdgeType datapoints and calls index_data_points(), but some graph adapters (specifically FalkorDB via cognee-community-hybrid-adapter-falkor) don't persist nodes through index_data_points(). This results in EdgeType entries existing without vector embeddings in the graph database.

The consequence is that GRAPH_COMPLETION triplet search fails silently because it cannot find EdgeType vectors to match against query embeddings.

Fix

Add an explicit graph_engine.add_nodes(edge_type_datapoints) call before index_data_points() in index_graph_edges(). This ensures EdgeType nodes are persisted with their vector embeddings in the graph database regardless of the adapter's index_data_points implementation.
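
The ordering described above can be sketched with in-memory stand-ins. This is only an illustration, not cognee's actual code: `EdgeType`, `FakeGraphEngine`, `FakeVectorEngine`, and the simplified `index_graph_edges` below are hypothetical, and only the method names `add_nodes` and `index_data_points` are taken from this PR (their real signatures may differ).

```python
import asyncio
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EdgeType:
    # Hypothetical stand-in for cognee's EdgeType datapoint.
    id: str
    relationship_name: str


@dataclass
class FakeGraphEngine:
    # In-memory stand-in for a graph adapter; not a real cognee class.
    nodes: dict = field(default_factory=dict)

    async def add_nodes(self, data_points):
        for dp in data_points:
            self.nodes[dp.id] = dp  # upsert keyed by id


@dataclass
class FakeVectorEngine:
    # Stand-in for a vector adapter that never writes to the graph.
    indexed: set = field(default_factory=set)

    async def index_data_points(self, data_points):
        self.indexed.update(dp.id for dp in data_points)


async def index_graph_edges(edge_types, graph_engine, vector_engine):
    # Assumed, simplified shape of the task: persist first, then index.
    if edge_types:
        await graph_engine.add_nodes(edge_types)
    await vector_engine.index_data_points(edge_types)


graph = FakeGraphEngine()
vectors = FakeVectorEngine()
edge_types = [EdgeType("et-1", "works_at"), EdgeType("et-2", "located_in")]
asyncio.run(index_graph_edges(edge_types, graph, vectors))
print(sorted(graph.nodes), sorted(vectors.indexed))
```

With the `add_nodes` call in place, both stores end up holding the same EdgeType ids even though the vector stand-in never touches the graph.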

Impact

  • Fixes GRAPH_COMPLETION search returning empty results on FalkorDB
  • No impact on adapters where index_data_points already persists nodes (e.g. Kuzu, LanceDB) — the add_nodes call is idempotent
  • The additional add_nodes call only runs when there are edge type datapoints to persist
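
The idempotency and empty-list claims in the list above can be illustrated with a toy store. `CountingGraphStore` and `persist_edge_types` are hypothetical names for this sketch; it assumes that `add_nodes` behaves as an upsert keyed by node id, which is what the PR describes but which depends on each adapter.

```python
class CountingGraphStore:
    """Hypothetical in-memory store whose add_nodes acts as an upsert."""

    def __init__(self):
        self.nodes = {}
        self.calls = 0

    def add_nodes(self, nodes):
        self.calls += 1
        for node_id, props in nodes:
            self.nodes[node_id] = props  # same id overwrites, never duplicates


def persist_edge_types(store, edge_type_nodes):
    # Skip the graph call entirely when there is nothing to persist.
    if not edge_type_nodes:
        return
    store.add_nodes(edge_type_nodes)


store = CountingGraphStore()
batch = [("et-1", {"relationship_name": "works_at"})]

persist_edge_types(store, batch)
snapshot = dict(store.nodes)

persist_edge_types(store, batch)  # repeated write leaves the state unchanged
persist_edge_types(store, [])     # empty batch makes no store call
```

After the three calls the store has been hit twice and still holds a single node, so repeating the write costs a round trip but does not change the stored data.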

Summary by CodeRabbit

  • Bug Fixes
    • Improved persistence of edge-type data to the graph database for more reliable storage and access across different adapters.
    • Ensured indexing proceeds only after edge data is persisted, reducing data loss or inconsistency.
    • Unified and clearer error reporting for failures during edge retrieval/transform/persist/index operations, aiding troubleshooting.

index_graph_edges() creates EdgeType datapoints and calls
index_data_points(), but some graph adapters (e.g. FalkorDB community
adapter) don't persist nodes via index_data_points. This means EdgeType
entries exist without vector embeddings in the graph, causing
GRAPH_COMPLETION triplet search to fail silently.

Add an explicit graph_engine.add_nodes() call before indexing to ensure
EdgeType nodes are persisted with their vector embeddings regardless of
the adapter implementation.
@pull-checklist

Please make sure all the checkboxes are checked:

  • I have tested these changes locally.
  • I have reviewed the code changes.
  • I have added end-to-end and unit tests (if applicable).
  • I have updated the documentation and README.md file (if necessary).
  • I have removed unnecessary code and debug statements.
  • PR title is clear and follows the convention.
  • I have tagged reviewers or team members for feedback.

@github-actions

Hello @soichisumi, thank you for submitting a PR! We will respond as soon as possible.

@coderabbitai
Contributor

coderabbitai bot commented Mar 26, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 42267c54-d8cf-45fe-908c-2dabccebe10c

📥 Commits

Reviewing files that changed from the base of the PR and between 2303443 and 1147549.

📒 Files selected for processing (1)
  • cognee/tasks/storage/index_graph_edges.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • cognee/tasks/storage/index_graph_edges.py

Walkthrough

index_graph_edges now creates EdgeType datapoints, lazily initializes graph_engine when needed, persists non-empty EdgeType datapoints via graph_engine.add_nodes(...) before indexing, and replaces the prior initialization-specific try/except with a single try/except that logs and raises RuntimeError("Graph edge indexing error") on any exception.
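
A rough sketch of the control flow this walkthrough describes: the engine is obtained lazily, the write and the indexing share one try/except, and any failure is logged and re-raised as `RuntimeError("Graph edge indexing error")`. The helper names (`StubGraphEngine`, `index_edge_types`, `get_engine`, `index_fn`) are assumptions made for illustration and do not mirror the real function body.

```python
import asyncio
import logging

logger = logging.getLogger("index_graph_edges_sketch")


class StubGraphEngine:
    # Hypothetical engine that can be told to fail on write.
    def __init__(self, fail=False):
        self.fail = fail
        self.nodes = []

    async def add_nodes(self, nodes):
        if self.fail:
            raise ConnectionError("graph DB unavailable")
        self.nodes.extend(nodes)


async def index_edge_types(edge_type_datapoints, get_engine, index_fn):
    graph_engine = None  # obtained only when there is something to persist
    try:
        if edge_type_datapoints:
            graph_engine = await get_engine()
            await graph_engine.add_nodes(edge_type_datapoints)
        await index_fn(edge_type_datapoints)
    except Exception as error:
        logger.error("Failed to index graph edges: %s", error)
        raise RuntimeError("Graph edge indexing error") from error


async def demo():
    indexed = []

    async def index_fn(data_points):
        indexed.extend(data_points)

    ok_engine = StubGraphEngine()
    bad_engine = StubGraphEngine(fail=True)

    async def get_ok():
        return ok_engine

    async def get_bad():
        return bad_engine

    await index_edge_types(["works_at"], get_ok, index_fn)
    try:
        await index_edge_types(["located_in"], get_bad, index_fn)
    except RuntimeError as error:
        return ok_engine.nodes, indexed, str(error)


result = asyncio.run(demo())
```

In the failing run the write raises before indexing starts, so nothing is indexed for data that was never persisted.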

Changes

| Cohort / File(s) | Summary |
|---|---|
| EdgeType persistence & error handling: cognee/tasks/storage/index_graph_edges.py | Creates EdgeType datapoints, lazily initializes graph_engine (sets to None, obtains when needed), calls graph_engine.add_nodes(edge_type_datapoints) if the list is non-empty before index_data_points(...), and replaces the prior init-specific try/except with a unified try/except that logs and raises RuntimeError("Graph edge indexing error") on errors. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • PR #251: Directly modifies the same index_graph_edges.py function to add graph database persistence of EdgeType datapoints.

Suggested labels

run-checks

Suggested reviewers

  • alekszievr
  • lxobr
  • dexters1
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description check | ⚠️ Warning | The description clearly explains the problem, fix, and impact, but the PR description template requires sections like Acceptance Criteria, Type of Change, Screenshots, and Pre-submission Checklist which are missing. | Complete the PR description by adding all required template sections: Acceptance Criteria, Type of Change checkbox (Bug fix), Screenshots, Pre-submission Checklist, and DCO Affirmation. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately describes the main change: adding EdgeType nodes to the graph database to fix GRAPH_COMPLETION search functionality. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cognee/tasks/storage/index_graph_edges.py`:
- Around line 81-83: index_graph_edges currently calls get_graph_engine() and
graph_engine.add_nodes(...) outside the function's existing try/except and may
initialize the engine twice when edges_data is None; move the new write path
into the same guarded try/except used for the rest of the function, obtain the
engine once (reuse the same engine variable created earlier instead of calling
get_graph_engine() again), call add_nodes on that engine inside the try block,
and on exception use the same logging and RuntimeError wrapping behavior as the
existing error handling.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 58fbd4c6-09a5-4a7e-886c-bcdea32fa3c5

📥 Commits

Reviewing files that changed from the base of the PR and between 5469622 and 2303443.

📒 Files selected for processing (1)
  • cognee/tasks/storage/index_graph_edges.py

Address CodeRabbit feedback:
- Move add_nodes() call inside the existing try/except block
- Reuse graph_engine instance instead of calling get_graph_engine() twice
- Update error message to be more specific
@soichisumi
Contributor Author

Update: v0.5.5 compatibility

I noticed that index_graph_edges.py was significantly rewritten in v0.5.5 (relevant changes):

  1. create_edge_type_datapoints() was extracted as a standalone helper function
  2. vector_engine parameter was added to index_graph_edges() signature
  3. index_data_points() now receives vector_engine=vector_engine kwarg

This PR's diff is based on the v0.5.3 code and no longer applies cleanly to main (v0.5.5+).

The root cause still exists in v0.5.5

The v0.5.5 index_graph_edges calls index_data_points(edge_type_datapoints, vector_engine=vector_engine), but for adapters where vector_engine.index_data_points() doesn't persist nodes to the graph DB (e.g., FalkorDB hybrid adapter), EdgeType nodes still won't have vector embeddings in the graph.

The fix remains the same: call graph_engine.add_nodes(edge_type_datapoints) before index_data_points(). I'll rebase this PR onto main to match the current code.

@soichisumi
Contributor Author

Correction: I checked and this PR already applies cleanly to current main (v0.5.5) — no rebase needed. The diff correctly targets the rewritten index_graph_edges.py with create_edge_type_datapoints() and vector_engine parameter. Status is MERGEABLE with no conflicts.

@dexters1 dexters1 requested review from hajdul88 and lxobr March 27, 2026 13:45
Collaborator

@lxobr lxobr left a comment


Hey @soichisumi, thanks for catching this and putting together a fix. The root cause diagnosis is spot on.

One concern though: adding add_nodes in index_graph_edges introduces an extra graph DB call on every run for all adapters, even those that don't have this problem. And there's no clean way to tighten that check without coupling index_graph_edges to adapter-specific knowledge. That is something we would like to avoid, as the abstraction is there so that callers shouldn't need to know how a given adapter stores data.

However, the fix should be straightforward on the FalkorDB adapter side in https://github.com/topoteretes/cognee-community. Would you be up for making that change there?
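
For comparison, the adapter-side alternative suggested here might look roughly like the following. This is a hypothetical sketch, not the actual FalkorDB adapter in cognee-community: `HybridAdapterSketch`, its dictionary-based storage, and the toy `embed` function are all invented for illustration.

```python
import asyncio


def embed(text):
    # Toy embedding used only for this sketch; real adapters call a model.
    return [float(len(text))]


class HybridAdapterSketch:
    """Hypothetical hybrid graph + vector adapter (not the real FalkorDB one)."""

    def __init__(self):
        self.graph_nodes = {}
        self.vector_index = {}

    async def add_nodes(self, data_points):
        for dp in data_points:
            self.graph_nodes[dp["id"]] = dp

    async def index_data_points(self, data_points):
        # Adapter-side fix: persist the nodes here, so the core task
        # does not need an extra add_nodes call.
        await self.add_nodes(data_points)
        for dp in data_points:
            self.vector_index[dp["id"]] = embed(dp["text"])


adapter = HybridAdapterSketch()
edge_types = [
    {"id": "et-1", "text": "works_at"},
    {"id": "et-2", "text": "located_in"},
]
asyncio.run(adapter.index_data_points(edge_types))
```

Under this design the caller stays unaware of how the adapter stores data, at the cost of each affected adapter carrying its own fix.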

@soichisumi
Contributor Author

Thanks for the feedback! I looked into this further, and I believe this gap affects nearly all adapter configurations, not just FalkorDB.

In add_data_points.py (L106-107), regular nodes go through two steps:

  await graph_engine.add_nodes(nodes)                            # persist to graph DB
  await index_data_points(nodes, vector_engine=vector_engine)    # embed in vector engine

But index_graph_edges (L75-76) only calls index_data_points for EdgeType — the add_nodes step is missing.
Since the core task calls the vector engine's index_data_points, only Neptune Analytics (whose hybrid create_data_points MERGEs into the graph as a side effect) ends up with EdgeType nodes in the graph DB.

Every other configuration (PGVector, ChromaDB, LanceDB, Qdrant, DuckDB, and FalkorDB) is missing them, which breaks GRAPH_COMPLETION triplet search.

This PR just applies the same two-step pattern to EdgeType that regular nodes already use. add_nodes is MERGE-based (upsert), so it's idempotent on Neptune. I think one fix in core is cleaner than patching each adapter individually. What do you think? (If I'm misreading the codebase, happy to be corrected!)
@lxobr
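
One way to check a given configuration for the gap described in this comment is to compare the EdgeType ids that were indexed against the ids actually present in the graph. The helper and the sample ids below are hypothetical; how the two id lists are fetched depends on the adapters in use and is not shown.

```python
def missing_in_graph(indexed_ids, graph_node_ids):
    """Return ids present in the vector index but absent from the graph."""
    return sorted(set(indexed_ids) - set(graph_node_ids))


# Illustrative ids only; real values would come from the two stores.
indexed_edge_types = ["et-works_at", "et-located_in"]
graph_nodes = ["entity-alice", "entity-acme"]

gaps = missing_in_graph(indexed_edge_types, graph_nodes)
print(gaps)
```

An empty result would suggest that the configuration already persists EdgeType nodes; a non-empty one points at the missing add_nodes step.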

@lxobr
Collaborator

lxobr commented Mar 30, 2026

Thanks for the follow up research @soichisumi! Will look into it and get back to you tomorrow.

@soichisumi
Contributor Author

Thank you!

We could also move this to cognee-community.
I was just checking because I thought it might be cleaner to implement it here.
