fix: use dict spread to avoid shallow copy mutation in index_data_points#2530

Open
nightcityblade wants to merge 1 commit into topoteretes:dev from nightcityblade:fix/issue-2529
@nightcityblade nightcityblade commented Mar 31, 2026

Description

When a DataPoint has multiple index_fields, the for loop in index_data_points() only processes the first field. model_copy() performs a shallow copy by default, so indexed_data_point.metadata and data_point.metadata reference the same dict. Assigning indexed_data_point.metadata["index_fields"] = [field_name] therefore mutates the original metadata and truncates its index_fields list to the single field just processed.

The fix replaces the in-place assignment with dict unpacking ({**data_point.metadata, "index_fields": [field_name]}), which builds a new dict for each copy so the original is never modified.
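
The difference between the two assignments can be reproduced without cognee or Pydantic. The sketch below is illustrative only: FakeDataPoint is a hypothetical stand-in for DataPoint, and copy.copy() is assumed to behave like a shallow model_copy() in that the copy shares the original's metadata dict.

```python
import copy


class FakeDataPoint:
    """Hypothetical stand-in for a DataPoint; not cognee's real class."""

    def __init__(self, metadata):
        self.metadata = metadata


def copy_with_mutation(data_point, field_name):
    # Shallow copy: the new object points at the same metadata dict.
    indexed = copy.copy(data_point)
    # Writes through to the shared dict, so the original changes too.
    indexed.metadata["index_fields"] = [field_name]
    return indexed


def copy_with_unpacking(data_point, field_name):
    indexed = copy.copy(data_point)
    # Builds a new dict and rebinds it on the copy only.
    indexed.metadata = {**data_point.metadata, "index_fields": [field_name]}
    return indexed


original = FakeDataPoint({"index_fields": ["name", "description"], "type": "Entity"})

safe = copy_with_unpacking(original, "name")
fields_after_unpacking = list(original.metadata["index_fields"])

buggy = copy_with_mutation(original, "name")
fields_after_mutation = list(original.metadata["index_fields"])

print(fields_after_unpacking)  # original list is intact
print(fields_after_mutation)   # original list has been overwritten
```

Note that {**d} copies only the top level of the dict. Nested values (lists or dicts stored under other keys) are still shared between the original and the copy, which is fine here because only the "index_fields" key is replaced.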

Acceptance Criteria

  • Multiple index_fields are all processed and embedded correctly
  • Original data_point.metadata["index_fields"] is not mutated during iteration
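
Both criteria could be checked with a small regression-style test. This is a sketch under assumptions, not the project's test suite: FakeDataPoint and split_by_index_field are hypothetical names, and copy.copy() stands in for a shallow model_copy().

```python
import copy


class FakeDataPoint:
    """Hypothetical stand-in for a DataPoint; not cognee's real class."""

    def __init__(self, metadata):
        self.metadata = metadata


def split_by_index_field(data_point):
    """Return one shallow copy per index field, each with its own metadata dict."""
    copies = []
    for field_name in data_point.metadata["index_fields"]:
        indexed = copy.copy(data_point)
        # New dict per copy, so the original metadata is left untouched.
        indexed.metadata = {**data_point.metadata, "index_fields": [field_name]}
        copies.append(indexed)
    return copies


point = FakeDataPoint({"index_fields": ["name", "description", "summary"]})
copies = split_by_index_field(point)
processed = [c.metadata["index_fields"][0] for c in copies]

# Criterion 1: every index field produced its own copy.
assert processed == ["name", "description", "summary"]
# Criterion 2: the original metadata was not mutated during iteration.
assert point.metadata["index_fields"] == ["name", "description", "summary"]
```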

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Code refactoring
  • Other (please specify):

Screenshots

N/A: single-line logic fix, verified via manual simulation.

Pre-submission Checklist

  • I have tested my changes thoroughly before submitting this PR
  • This PR contains minimal changes necessary to address the issue/feature
  • My code follows the project's coding standards and style guidelines
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if applicable)
  • All new and existing tests pass
  • I have searched existing PRs to ensure this change hasn't been submitted already
  • I have linked any relevant issues in the description
  • My commits have clear and descriptive messages

DCO Affirmation

I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.

Fixes #2529

Summary by CodeRabbit

  • Bug Fixes
    • Fixed metadata preservation during data point indexing to ensure all metadata fields are properly retained while correctly managing index field information.

Replace direct mutation of shared metadata dict with a new dict
via spread operator. This prevents the for-loop from being truncated
when iterating over multiple index_fields, as the original
data_point.metadata is no longer mutated by copies.

Fixes topoteretes#2529
@pull-checklist

Please make sure all the checkboxes are checked:

  • I have tested these changes locally.
  • I have reviewed the code changes.
  • I have added end-to-end and unit tests (if applicable).
  • I have updated the documentation and README.md file (if necessary).
  • I have removed unnecessary code and debug statements.
  • PR title is clear and follows the convention.
  • I have tagged reviewers or team members for feedback.

@github-actions

Hello @nightcityblade, thank you for submitting a PR! We will respond as soon as possible.

@coderabbitai
Contributor

coderabbitai bot commented Mar 31, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 94d32069-3fbb-4fb6-a761-3ff1e3ede4c0

📥 Commits

Reviewing files that changed from the base of the PR and between 4dff60e and 71c73f1.

📒 Files selected for processing (1)
  • cognee/tasks/storage/index_data_points.py

Walkthrough

Fixed a shallow copy metadata mutation bug in the index data points task. Replaced direct field assignment with dictionary unpacking to ensure each indexed data point receives a proper copy of metadata while maintaining only the relevant index field.

Changes

Cohort / File(s) | Summary
Metadata Copy Fix (cognee/tasks/storage/index_data_points.py) | Changed metadata assignment from mutating the shallow-copied dict to using dictionary unpacking ({**data_point.metadata, "index_fields": [field_name]}) to create an isolated metadata object per indexed copy, preventing truncation of the original index_fields list during iteration.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name | Status | Explanation
Title check | ✅ Passed | The title accurately describes the main change: using dict spread operator to avoid shallow copy mutation in index_data_points function.
Description check | ✅ Passed | The PR description comprehensively explains the bug, root cause, fix approach, acceptance criteria, and includes all required template sections with proper checkbox completion.
Linked Issues check | ✅ Passed | The code change directly addresses issue #2529 by implementing dict spread to create new metadata per copy, preventing mutation of the original index_fields list during iteration.
Out of Scope Changes check | ✅ Passed | The single-line change is tightly scoped to fixing the shallow copy mutation issue in index_data_points.py with no extraneous modifications.
Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

Warning

Review ran into problems

🔥 Problems

Timed out fetching pipeline failures after 30000ms


@dexters1 dexters1 self-requested a review March 31, 2026 15:25
Collaborator

@dexters1 dexters1 left a comment

The PR targets the wrong branch. Check CONTRIBUTING.md.

https://github.com/topoteretes/cognee/blob/main/CONTRIBUTING.md

@nightcityblade nightcityblade changed the base branch from main to dev April 1, 2026 03:01
@nightcityblade
Author

Thanks @dexters1! Updated the base branch to dev as per CONTRIBUTING.md. Sorry about that!

@nightcityblade
Author

Thanks for the feedback! The PR currently targets dev as the base branch, which aligns with CONTRIBUTING.md's guidance to branch from dev. Could you clarify which branch I should target instead? Happy to update it.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] index_data_points: shallow copy of metadata dict causes only first index_field to be embedded

2 participants