
Feature/cog 4596 expand eval framework with gtrl corpus builder #2534

Open
ArmagedonFlamer wants to merge 37 commits into dev from feature/cog-4596-expand-eval_framework-with-gtrl-corpus_builder

Conversation

@ArmagedonFlamer
Contributor

@ArmagedonFlamer ArmagedonFlamer commented Apr 1, 2026

Description

Acceptance Criteria

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Code refactoring
  • Other (please specify):

Screenshots

Pre-submission Checklist

  • I have tested my changes thoroughly before submitting this PR (See CONTRIBUTING.md)
  • This PR contains minimal changes necessary to address the issue/feature
  • My code follows the project's coding standards and style guidelines
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if applicable)
  • All new and existing tests pass
  • I have searched existing PRs to ensure this change hasn't been submitted already
  • I have linked any relevant issues in the description
  • My commits have clear and descriptive messages

DCO Affirmation

I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added a logistics simulation benchmark adapter with automated world generation including users, retailers, post offices, and carriers
    • Automatic evaluation of package delivery scenarios with carrier selection, estimated delivery times, and transport cost analysis
    • Golden answer generation for benchmark evaluation datasets
    • Narrative corpus generation from logistics simulation data
    • Support for filtering evaluation instances and deterministic sampling
  • Tests

    • Added test coverage for new logistics benchmark adapter and corpus building workflow

…ted-ingestion' into feature/cog-4370-engineering-update-retrievers-to-use-importance-weighted
@ArmagedonFlamer ArmagedonFlamer self-assigned this Apr 1, 2026
@pull-checklist

pull-checklist bot commented Apr 1, 2026

Please make sure all the checkboxes are checked:

  • I have tested these changes locally.
  • I have reviewed the code changes.
  • I have added end-to-end and unit tests (if applicable).
  • I have updated the documentation and README.md file (if necessary).
  • I have removed unnecessary code and debug statements.
  • PR title is clear and follows the convention.
  • I have tagged reviewers or team members for feedback.

@coderabbitai
Contributor

coderabbitai bot commented Apr 1, 2026

Walkthrough

This pull request introduces a comprehensive logistics simulation benchmark adapter for the evaluation framework. It adds a LogisticsSystemAdapter class that generates synthetic logistics worlds with entities (users, retailers, post offices, carriers, packages), evaluates delivery options through a rule engine, and provides narrativized corpus data for benchmarking.

Changes

Cohort / File(s) Summary
Adapter Framework Updates
cognee/eval_framework/benchmark_adapters/base_benchmark_adapter.py, cognee/eval_framework/benchmark_adapters/benchmark_adapters.py
Added async prepare_corpus() method to base adapter class and registered new LogisticsSystemAdapter in the BenchmarkAdapter enum.
Core Logistics Adapter
cognee/eval_framework/benchmark_adapters/logistics_system_adapter.py
Implements main adapter providing world creation, corpus loading with optional filtering/sampling, and golden answer generation for logistics benchmark.
Logistics Domain Entities
cognee/eval_framework/benchmark_adapters/logistics_system_utils/entities/enums.py, cognee/eval_framework/benchmark_adapters/logistics_system_utils/entities/package.py, cognee/eval_framework/benchmark_adapters/logistics_system_utils/entities/post_office.py, cognee/eval_framework/benchmark_adapters/logistics_system_utils/entities/retailer.py, cognee/eval_framework/benchmark_adapters/logistics_system_utils/entities/transportation.py, cognee/eval_framework/benchmark_adapters/logistics_system_utils/entities/user.py, cognee/eval_framework/benchmark_adapters/logistics_system_utils/entities/delivery_models.py
Defines enum types (UserTier, ShippingRange, Region, TransportMode, etc.) and dataclasses modeling logistics domain entities (Package, PostOffice, Retailer, User, Carrier, DeliveryPlan) with label conversions and validation helpers.
World Generation and Rule Engine
cognee/eval_framework/benchmark_adapters/logistics_system_utils/ontology.py, cognee/eval_framework/benchmark_adapters/logistics_system_utils/rule_engine.py, cognee/eval_framework/benchmark_adapters/logistics_system_utils/utils/utils.py, cognee/eval_framework/benchmark_adapters/logistics_system_utils/utils/world_creation_utils.py
Implements world creation with entities and packages, delivery plan evaluation engine with scoring/validation, world serialization/deserialization, and deterministic world seeding and rendering.
Narrativization Pipeline
cognee/eval_framework/benchmark_adapters/logistics_system_utils/corpus_generator/narrativize_corpus.py, cognee/eval_framework/benchmark_adapters/logistics_system_utils/prompts/narrativization.txt, cognee/eval_framework/benchmark_adapters/logistics_system_utils/prompts/package_narrativization.txt
Generates LLM-based narratives from logistics world entities with fallback deterministic generation, supporting structured output queries and per-entity/package narrative file persistence.
Evaluation Rules and Queries
cognee/eval_framework/benchmark_adapters/logistics_system_utils/rules/rules_common_language.txt, cognee/eval_framework/benchmark_adapters/logistics_system_utils/rules/rules_policy_language.txt, cognee/eval_framework/benchmark_adapters/logistics_system_utils/queries/query.txt, cognee/eval_framework/benchmark_adapters/logistics_system_utils/queries/query_data_sources.txt
Defines logistics evaluation rules covering carrier eligibility, delivery timing/pricing calculations, and query templates for generating QA pairs with supporting facts.
Corpus Builder Integration
cognee/eval_framework/corpus_builder/corpus_builder_executor.py
Updated to call prepare_corpus() on adapters before loading and to deduplicate corpus content.
Test Coverage
cognee/tests/unit/eval_framework/benchmark_adapters_test.py, cognee/tests/unit/eval_framework/corpus_builder_test.py, cognee/tests/unit/eval_framework/logistics_ontology_test.py
Added tests for adapter instantiation, world creation/loading, corpus building adapter preparation flow, and delivery plan validation logic.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

Suggested labels

core-team, run-checks

Suggested reviewers

  • lxobr
  • hajdul88
  • borisarzentar
🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

  • Description check — ⚠️ Warning: The PR description is entirely the uncompleted template with no human-written content, missing all required sections including a clear description of changes, acceptance criteria, type of change selection, and test evidence. Resolution: Provide a human-written description explaining the GTRL corpus builder feature, list acceptance criteria, select the appropriate change type, include test screenshots, and complete the pre-submission checklist.
  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 1.21%, which is insufficient; the required threshold is 80.00%. Resolution: Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (1 passed)

  • Title check — ✅ Passed: The title clearly identifies the main change: expanding the evaluation framework with a GTRL corpus builder, which aligns with the substantial additions of the LogisticsSystemAdapter and related components.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



@github-actions

github-actions bot commented Apr 1, 2026

Hello @ArmagedonFlamer, thank you for submitting a PR! We will respond as soon as possible.

@mergify

mergify bot commented Apr 1, 2026

⚠️ The sha of the head commit of this PR conflicts with #2447. Mergify cannot evaluate rules on this PR. Once #2447 is merged or closed, Mergify will resume processing this PR. ⚠️

@ArmagedonFlamer ArmagedonFlamer marked this pull request as ready for review April 2, 2026 15:16
@ArmagedonFlamer ArmagedonFlamer requested a review from lxobr April 2, 2026 15:17
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 11

🧹 Nitpick comments (6)
cognee/tests/unit/eval_framework/benchmark_adapters_test.py (1)

137-143: Avoid brittle order-coupling in answers_by_type.

Building the map from qa_pairs[:3] assumes a fixed ordering. Using all entries keeps the test robust to harmless ordering changes.

Suggested refactor
-    answers_by_type = {item["question_type"]: item for item in qa_pairs[:3]}
+    answers_by_type = {}
+    for item in qa_pairs:
+        answers_by_type.setdefault(item["question_type"], item)
+
+    assert "delivery_days" in answers_by_type
+    assert "transport_cost" in answers_by_type
+    assert "carrier" in answers_by_type
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `cognee/tests/unit/eval_framework/benchmark_adapters_test.py` around lines 137
- 143, The test creates answers_by_type from qa_pairs[:3], which couples the
test to a specific ordering; update the construction of answers_by_type to
iterate over the entire qa_pairs list (qa_pairs) and build the dict by mapping
item["question_type"] to item so the assertions (on answers_by_type, and checks
for "delivery_days", "transport_cost", "carrier" and their answer/golden_*
fields) don't depend on the first three elements; ensure duplicate question_type
handling is intentional (e.g., last-wins) or assert uniqueness if needed.
cognee/eval_framework/benchmark_adapters/base_benchmark_adapter.py (1)

11-12: Add a short docstring for the new public hook.

prepare_corpus is now part of the adapter contract; a one-line docstring clarifying override expectations will make implementations more consistent.

Suggested refactor
 class BaseBenchmarkAdapter(ABC):
     async def prepare_corpus(self) -> None:
+        """Optional async hook to prepare adapter assets before load_corpus()."""
         return None

As per coding guidelines: "undocumented function definitions and class definitions in the project's Python code are assumed incomplete."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `cognee/eval_framework/benchmark_adapters/base_benchmark_adapter.py` around
lines 11 - 12, Add a one-line docstring to the new public async hook
prepare_corpus(self) -> None clarifying that this is an optional asynchronous
hook adapters should override to prepare or load corpus data and that the base
implementation is a no-op (returns None); keep it short and placed immediately
under the prepare_corpus signature in base_benchmark_adapter.py so
implementations know to override it when needed.
cognee/eval_framework/benchmark_adapters/logistics_system_adapter.py (1)

82-89: Preserve the last world-generation failure.

Retrying every ValueError here and then replacing it with a generic incompatibility message hides the real cause when generation fails for some other reason. Catch only the expected retryable error, or at least chain the last exception into the final one.

Suggested change
     def _create_world_with_packages(self, max_attempts: int = 10) -> dict[str, object]:
+        last_error: ValueError | None = None
         for _ in range(max_attempts):
             world = create_world(
                 user_count=self.user_count,
                 retailer_count=self.retailer_count,
             )
             try:
                 return add_packages_to_world(world, package_count=self.package_count)
-            except ValueError:
+            except ValueError as error:
+                last_error = error
                 continue
 
         raise ValueError(
             "Could not generate a logistics world with compatible user and retailer pairs."
-        )
+        ) from last_error
As per coding guidelines, "Prefer explicit, structured error handling in Python code."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `cognee/eval_framework/benchmark_adapters/logistics_system_adapter.py` around
lines 82 - 89, The retry loop around add_packages_to_world currently swallows
all ValueError instances and raises a generic ValueError, losing the root cause;
modify the loop in logistics_system_adapter.py to either catch a more specific
retryable exception (not broad ValueError) or store the last caught exception
(e.g., last_exc) and re-raise the final ValueError chaining it with "from
last_exc" so the original error from add_packages_to_world/package_count is
preserved; update the except clause around
add_packages_to_world(package_count=self.package_count) to implement one of
these two options and ensure the final raise includes the original exception
context.
cognee/eval_framework/benchmark_adapters/logistics_system_utils/entities/package.py (3)

92-133: Add a docstring to from_labels and consider documenting error behavior.

The classmethod lacks documentation. Additionally, calls to ShippingRange.from_label(), PackageCategory.from_label(), etc., will presumably raise exceptions if passed invalid labels. While the current usage (via _seed_package) guarantees valid inputs from package_possible_values, documenting the expected behavior and valid inputs would help prevent misuse.

📝 Suggested docstring
     @classmethod
     def from_labels(
         cls,
         package_id: str,
         description: str,
         weight_kg: float,
         shipping_range: str,
         category: str,
         priority: str,
         retailer_id: str | None = None,
         retailer_name: str | None = None,
         user_id: str | None = None,
         user_name: str | None = None,
         insured: bool = False,
         current_state: str = PackageState.CREATED.label,
         last_known_location: str = "origin_warehouse",
         current_post_office_id: str | None = None,
         current_post_office_name: str | None = None,
         route_post_office_ids: tuple[str, ...] = (),
         route_post_office_names: tuple[str, ...] = (),
         status_history: tuple[str, ...] = (),
     ) -> "Package":
+        """Construct a Package from human-readable label strings.
+
+        Args:
+            shipping_range: Label for ShippingRange enum (e.g., "local", "regional").
+            category: Label for PackageCategory enum.
+            priority: Label for DeliveryPriority enum.
+            current_state: Label for PackageState enum.
+            **other args: Passed through directly to Package.__init__.
+
+        Returns:
+            A new Package instance with enums resolved from labels.
+
+        Raises:
+            ValueError: If any label does not match a valid enum member.
+        """
         return cls(
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`cognee/eval_framework/benchmark_adapters/logistics_system_utils/entities/package.py`
around lines 92 - 133, Add a docstring to the Package.from_labels classmethod
that describes its purpose, parameters, return type, and expected input labels;
explicitly list or reference the valid label sources (e.g.,
package_possible_values or the enums) and document that invalid labels passed to
ShippingRange.from_label, PackageCategory.from_label,
DeliveryPriority.from_label, or PackageState.from_label will raise exceptions
(propagated to the caller) so callers know error behavior. Reference the method
name from_labels and the helper conversion functions ShippingRange.from_label,
PackageCategory.from_label, DeliveryPriority.from_label, and
PackageState.from_label in the docstring.

13-30: Consider adding a docstring or inline comment for the constant.

The package_possible_values dictionary serves as a configuration for valid attribute values, but its purpose and usage context aren't immediately clear. A brief docstring or comment would help future maintainers understand this is used for random sampling during package generation.

As per coding guidelines, undocumented definitions are assumed incomplete.

📝 Suggested documentation
+# Valid attribute values for random package generation in logistics simulations.
+# Keys map to Package field names; values are sampled via random.choice().
 package_possible_values = {
     "weight_kg": [1.0, 5.0, 25.0, 150.0, 900.0],
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`cognee/eval_framework/benchmark_adapters/logistics_system_utils/entities/package.py`
around lines 13 - 30, Add a concise docstring or inline comment above the
package_possible_values constant explaining its purpose and usage (e.g., that
package_possible_values is a configuration mapping used for random
sampling/validation when generating Package entities in the logistics system),
describe what each key represents (weight_kg, shipping_range, category,
priority, insured, current_state, last_known_location), and note that
current_state excludes DELIVERED and RETURNED; place this comment immediately
above the package_possible_values definition in package.py so maintainers see it
when editing the Package generation logic.

33-52: Add a class-level docstring for Package.

Per the coding guidelines, undocumented class definitions are assumed incomplete. A docstring describing the purpose, key fields, and usage of this domain entity would improve maintainability and assist developers unfamiliar with the logistics simulation module.

📝 Suggested docstring
 @dataclass(slots=True)
 class Package:
+    """Represents a package in the logistics simulation.
+
+    Attributes:
+        package_id: Unique identifier for the package.
+        description: Human-readable description of package contents.
+        weight_kg: Package weight in kilograms.
+        shipping_range: Geographic scope of delivery (local, regional, etc.).
+        category: Classification affecting handling requirements.
+        priority: Delivery urgency level.
+        current_state: Current status in the delivery lifecycle.
+        route_post_office_ids: Ordered sequence of post offices in the delivery route.
+    """
     package_id: str
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`cognee/eval_framework/benchmark_adapters/logistics_system_utils/entities/package.py`
around lines 33 - 52, Add a clear class-level docstring to the Package dataclass
describing its purpose in the logistics simulation, the key fields (e.g.,
package_id, description, weight_kg, shipping_range, category, priority,
current_state, last_known_location, route_post_office_ids, status_history, and
optional retailer/user info), typical usage (how instances are created/updated
and how state/location fields change), and any important invariants (e.g.,
route_post_office_ids order corresponds to route_post_office_names). Place this
docstring immediately below the @dataclass(slots=True) class Package:
declaration so it documents the Package class for maintainers and tooling.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `cognee/eval_framework/benchmark_adapters/logistics_system_adapter.py`:
- Around line 184-186: The adapter currently builds question_text from
package_context and question_prompt but ignores the stored custom query
(self.query) because the append logic is commented out; re-enable and guard that
append so when self.query is set and non-empty it gets appended (e.g., in the
method that constructs question_text, use a conditional to append
f"{self.query.strip()}" to question_text with a newline). Update the logic
around question_text, package_context, question_prompt and self.query so passing
query= in __init__ has effect and preserve spacing/newlines when appending.
- Around line 133-134: The _load_shared_corpus_documents method currently only
returns the contents of DEFAULT_COMMON_RULES_PATH, so the new policy rules file
never gets added to the corpus; update _load_shared_corpus_documents to also
read and include the policy rules (e.g.,
DEFAULT_POLICY_RULES_PATH.read_text(encoding="utf-8").strip()) alongside
DEFAULT_COMMON_RULES_PATH so both rule texts are returned (as a list of strings)
and become retrievable context.
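The shared-rules fix above amounts to reading every rules file, not just the first; a minimal sketch, assuming a helper that takes the paths as parameters (the function name and call shape are illustrative, not the adapter's actual API):

```python
from pathlib import Path
from typing import Iterable


def load_shared_corpus_documents(paths: Iterable[Path]) -> list[str]:
    """Read and strip each shared rules file so that every rule set
    (common-language and policy-language alike) becomes a retrievable
    corpus document instead of only the first one."""
    return [path.read_text(encoding="utf-8").strip() for path in paths]
```

The adapter would then pass both DEFAULT_COMMON_RULES_PATH and DEFAULT_POLICY_RULES_PATH, so neither file is silently dropped from the corpus.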

In
`cognee/eval_framework/benchmark_adapters/logistics_system_utils/corpus_generator/narrativize_corpus.py`:
- Around line 230-239: The sync wrapper inverts the running-loop check and
attempts to create a nested loop; instead, detect a running loop with
asyncio.get_running_loop() and, if a loop exists, run narrativize_corpus in a
separate thread (e.g., use
concurrent.futures.ThreadPoolExecutor().submit(lambda:
asyncio.run(narrativize_corpus(...))).result()) so you avoid nesting event loops
in the same thread; if no running loop (get_running_loop() raises), simply call
asyncio.run(narrativize_corpus(...)). Ensure you reference the
narrativize_corpus coroutine and replace the current
new_event_loop/run_until_complete branch with the thread-based executor approach
when a loop is present.
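The loop-detection pattern described above is a standard one; as a rough sketch (the helper name run_sync is illustrative, not the module's actual API):

```python
import asyncio
import concurrent.futures
from typing import Any, Coroutine


def run_sync(coro: Coroutine[Any, Any, Any]) -> Any:
    """Run a coroutine from synchronous code without nesting event loops."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop in this thread: starting one with asyncio.run() is safe.
        return asyncio.run(coro)
    # A loop is already running here; hand the coroutine to a fresh thread
    # so asyncio.run() gets a thread with no loop of its own.
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(asyncio.run, coro).result()
```

This keeps the no-loop path cheap while the thread-based path only kicks in when the wrapper is called from inside an already-running loop.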

In `cognee/eval_framework/benchmark_adapters/logistics_system_utils/ontology.py`:
- Around line 46-83: The transport network is being built once from the initial
_seed_user/_seed_retailer pair and reused, so later retailers get
origin_post_office_id values that don't match actual origin facilities and
_route_post_offices_for_state() never sees those origins; fix by seeding post
offices and carriers per retailer/user pair instead of once: call
_seed_post_offices(...) and set retailer.origin_post_office_id via
_origin_post_office_id_for_retailer(...) for each retailer (or move the existing
post_offices assignment inside the retailers/users loop), and create carriers
with _seed_world_carriers(...) per retailer/user so each retailer has a matching
carrier and origin post office used by _route_post_offices_for_state().

In
`cognee/eval_framework/benchmark_adapters/logistics_system_utils/queries/query_data_sources.txt`:
- Around line 13-21: The JSON schema uses non-JSON type names and inconsistent
key naming: replace the non-JSON tokens "int" and "float" for keys like
estimated_delivery_days and estimated_transport_price with the valid JSON type
"number", and normalize the route supporting-facts data-source key by renaming
route_supporting_facts_data_source to the plural
route_supporting_facts_data_sources so it matches the pattern used by
estimated_*_supporting_facts_data_sources.

In
`cognee/eval_framework/benchmark_adapters/logistics_system_utils/rule_engine.py`:
- Around line 198-211: compute_outcome currently picks the first retailer/user
from world instead of the package owner, causing wrong scoring for non-zero
index packages; after calling _resolve_package (which returns the package dict
with retailer_id and user_id), resolve the retailer and user using those ids
(e.g., pass package["retailer_id"] and package["user_id"] into
_resolve_world_actor or a lookup helper) and then call evaluate with the
resolved retailer and user so scoring uses the package's actual retailer/user;
update compute_outcome to use package-sourced ids rather than hardcoded
"retailers"/"users" defaults.
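The owner-aware lookup the prompt asks for can be sketched as follows; all names here are illustrative, since the real adapter routes through _resolve_package and its own world layout:

```python
def resolve_package_actors(world: dict, package: dict) -> tuple[dict, dict]:
    """Resolve the retailer and user that own a package via its stored IDs,
    instead of defaulting to the first entries in the world."""

    def _find(entries: list, key: str, value: str) -> dict:
        for entry in entries:
            if entry[key] == value:
                return entry
        raise ValueError(f"No entry with {key}={value!r}")

    retailer = _find(world["retailers"], "retailer_id", package["retailer_id"])
    user = _find(world["users"], "user_id", package["user_id"])
    return retailer, user
```

compute_outcome would then pass the resolved pair into evaluate, so scoring reflects the package's actual owner even for non-zero-index packages.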

In
`cognee/eval_framework/benchmark_adapters/logistics_system_utils/rules/rules_policy_language.txt`:
- Line 10: Rule R10 requires region_distance = abs(origin.score -
destination.score) + 1 with +2 if crossing US/non‑US, but _region_distance()
currently returns 0 for same-region routes; update _region_distance in
cognee/eval_framework/benchmark_adapters/logistics_system_utils/rule_engine.py
to compute distance as abs(origin_region_score - destination_region_score) + 1
unconditionally and then add the +2 cross-border penalty when origin and
destination are US vs non‑US, and adjust any unit tests or callers that assumed
a zero same-region result.
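Rule R10 as restated above reduces to a few lines; a minimal sketch with illustrative parameters (the real _region_distance derives scores and US membership from the Region enums):

```python
def region_distance(origin_score: int, destination_score: int,
                    origin_is_us: bool, destination_is_us: bool) -> int:
    """Rule R10: base distance is the absolute score difference plus one
    (so same-region routes cost 1, not 0), with a +2 penalty whenever the
    route crosses the US/non-US border."""
    distance = abs(origin_score - destination_score) + 1
    if origin_is_us != destination_is_us:
        distance += 2
    return distance
```

Note the unconditional +1: it is exactly what makes the same-region case return 1 rather than the 0 the current implementation produces.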

In
`cognee/eval_framework/benchmark_adapters/logistics_system_utils/utils/utils.py`:
- Around line 341-349: load_world currently omits the "packages" key when the
packages list is empty, breaking round-tripping with store_world (which always
writes "packages") and causing callers like write_golden_answers to fail; update
the load_world logic (the world dict construction in load_world) to always set
world["packages"] = packages regardless of emptiness so the key is preserved
even for an empty list.
- Around line 271-275: The helper currently falls back to the heuristic matcher
when package.user_id and package.retailer_id are present but one or both lookups
via _find_by_id(users, "user_id", package.user_id) or _find_by_id(retailers,
"retailer_id", package.retailer_id) fail; change this so that if package.user_id
and package.retailer_id are not None and either matched_user or matched_retailer
is None the function does not continue to the heuristic matcher but instead
returns a clear failure (e.g., raise a ValueError or return None/a specific
error sentinel) so corrupted IDs surface immediately rather than producing wrong
golden answers. Ensure the check lives alongside the existing
matched_user/matched_retailer logic and that callers handle the raised
error/None result.
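The first fix is about key stability on deserialization; a minimal sketch of the intended invariant (the function name and field list are illustrative):

```python
def build_world_dict(users: list, retailers: list, post_offices: list,
                     carriers: list, packages: list) -> dict:
    """Assemble a world mapping with a stable shape: "packages" is always
    present, so store_world()/load_world() round-trip cleanly even when no
    packages were generated."""
    return {
        "users": users,
        "retailers": retailers,
        "post_offices": post_offices,
        "carriers": carriers,
        "packages": packages,  # kept even when the list is empty
    }
```

Callers such as write_golden_answers can then index world["packages"] unconditionally instead of guarding against a missing key.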

In
`cognee/eval_framework/benchmark_adapters/logistics_system_utils/utils/world_creation_utils.py`:
- Line 170: The generated IDs using random.randint(...) (e.g., in create_world()
and places that set post_office_id, carrier IDs, etc., and that feed into
_entity_entries() and persisted filenames) are collision-prone; replace them
with collision-safe IDs by using uuid.uuid4().hex (or another UUID generator)
instead of random.randint, or implement a uniqueness loop that checks the
destination dict/sets (e.g., the map used by _entity_entries()) to ensure the
new ID is not already present before assigning; update all occurrences
referenced in create_world(), post_office_id, carrier ID generation sites, and
any other similar ID generators (the lines called out in the review) so
filenames and keys remain unique.
- Around line 661-678: _seed_package() currently assigns current_post_office to
the last routed office unconditionally, which causes IN_TRANSIT,
OUT_FOR_DELIVERY, and DELIVERED to show a sorting center instead of the special
labels from _location_for_state(); change the logic so that before building the
Package you set current_post_office = None when package_state is one of the
transit/delivery/completed states (e.g., package_state in
{PackageState.IN_TRANSIT, PackageState.OUT_FOR_DELIVERY,
PackageState.DELIVERED}) so that last_known_location and
current_post_office_name are computed from _location_for_state(package_state,
current_post_office) and do not point at a post office.
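For the ID-collision comment, one way to combine a UUID with the suggested uniqueness loop looks like this (the helper name and prefix scheme are illustrative):

```python
import uuid


def new_entity_id(prefix: str, taken: set) -> str:
    """Mint a collision-safe entity ID. uuid4 is effectively unique on its
    own; the membership check additionally guards against reuse within a
    single generated world, keeping filenames and dict keys distinct."""
    while True:
        candidate = f"{prefix}-{uuid.uuid4().hex[:12]}"
        if candidate not in taken:
            taken.add(candidate)
            return candidate
```

The same taken set would be shared across the post-office, carrier, and other ID generation sites so world-wide uniqueness holds.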

---

Nitpick comments:
In `@cognee/eval_framework/benchmark_adapters/base_benchmark_adapter.py`:
- Around line 11-12: Add a one-line docstring to the new public async hook
prepare_corpus(self) -> None clarifying that this is an optional asynchronous
hook adapters should override to prepare or load corpus data and that the base
implementation is a no-op (returns None); keep it short and placed immediately
under the prepare_corpus signature in base_benchmark_adapter.py so
implementations know to override it when needed.

In `@cognee/eval_framework/benchmark_adapters/logistics_system_adapter.py`:
- Around line 82-89: The retry loop around add_packages_to_world currently
swallows all ValueError instances and raises a generic ValueError, losing the
root cause; modify the loop in logistics_system_adapter.py to either catch a
more specific retryable exception (not broad ValueError) or store the last
caught exception (e.g., last_exc) and re-raise the final ValueError chaining it
with "from last_exc" so the original error from
add_packages_to_world/package_count is preserved; update the except clause
around add_packages_to_world(package_count=self.package_count) to implement one
of these two options and ensure the final raise includes the original exception
context.

In
`@cognee/eval_framework/benchmark_adapters/logistics_system_utils/entities/package.py`:
- Around line 92-133: Add a docstring to the Package.from_labels classmethod
that describes its purpose, parameters, return type, and expected input labels;
explicitly list or reference the valid label sources (e.g.,
package_possible_values or the enums) and document that invalid labels passed to
ShippingRange.from_label, PackageCategory.from_label,
DeliveryPriority.from_label, or PackageState.from_label will raise exceptions
(propagated to the caller) so callers know error behavior. Reference the method
name from_labels and the helper conversion functions ShippingRange.from_label,
PackageCategory.from_label, DeliveryPriority.from_label, and
PackageState.from_label in the docstring.
- Around line 13-30: Add a concise docstring or inline comment above the
package_possible_values constant explaining its purpose and usage (e.g., that
package_possible_values is a configuration mapping used for random
sampling/validation when generating Package entities in the logistics system),
describe what each key represents (weight_kg, shipping_range, category,
priority, insured, current_state, last_known_location), and note that
current_state excludes DELIVERED and RETURNED; place this comment immediately
above the package_possible_values definition in package.py so maintainers see it
when editing the Package generation logic.
- Around line 33-52: Add a clear class-level docstring to the Package dataclass
describing its purpose in the logistics simulation, the key fields (e.g.,
package_id, description, weight_kg, shipping_range, category, priority,
current_state, last_known_location, route_post_office_ids, status_history, and
optional retailer/user info), typical usage (how instances are created/updated
and how state/location fields change), and any important invariants (e.g.,
route_post_office_ids order corresponds to route_post_office_names). Place this
docstring immediately below the `@dataclass`(slots=True) class Package:
declaration so it documents the Package class for maintainers and tooling.

In `@cognee/tests/unit/eval_framework/benchmark_adapters_test.py`:
- Around line 137-143: The test creates answers_by_type from qa_pairs[:3], which
couples the test to a specific ordering; update the construction of
answers_by_type to iterate over the entire qa_pairs list (qa_pairs) and build
the dict by mapping item["question_type"] to item so the assertions (on
answers_by_type, and checks for "delivery_days", "transport_cost", "carrier" and
their answer/golden_* fields) don't depend on the first three elements; ensure
duplicate question_type handling is intentional (e.g., last-wins) or assert
uniqueness if needed.
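The decoupling the comment asks for can be sketched with plain dict semantics; the qa_pairs items and field values below are illustrative stand-ins, not the real fixture from benchmark_adapters_test.py:

```python
# Hypothetical qa_pairs fixture shaped like the review describes.
qa_pairs = [
    {"question_type": "delivery_days", "answer": "3", "golden_answer": "3"},
    {"question_type": "transport_cost", "answer": "12.5", "golden_answer": "12.5"},
    {"question_type": "carrier", "answer": "AcmeExpress", "golden_answer": "AcmeExpress"},
    {"question_type": "carrier", "answer": "GlobalPost", "golden_answer": "GlobalPost"},
]

# Build the lookup from the whole list, not qa_pairs[:3]; duplicate
# question_type keys resolve last-wins under dict comprehension semantics.
answers_by_type = {item["question_type"]: item for item in qa_pairs}

assert set(answers_by_type) == {"delivery_days", "transport_cost", "carrier"}
assert answers_by_type["carrier"]["answer"] == "GlobalPost"  # last occurrence wins
```

If last-wins is not intentional, the test could instead assert that question_type values are unique before building the dict.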

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 99606914-d2f5-4b4f-b156-7abfcef484c0

📥 Commits

Reviewing files that changed from the base of the PR and between 413a0f9 and e9ef7a2.

📒 Files selected for processing (25)
  • cognee/eval_framework/benchmark_adapters/base_benchmark_adapter.py
  • cognee/eval_framework/benchmark_adapters/benchmark_adapters.py
  • cognee/eval_framework/benchmark_adapters/logistics_system_adapter.py
  • cognee/eval_framework/benchmark_adapters/logistics_system_utils/corpus_generator/narrativize_corpus.py
  • cognee/eval_framework/benchmark_adapters/logistics_system_utils/entities/delivery_models.py
  • cognee/eval_framework/benchmark_adapters/logistics_system_utils/entities/enums.py
  • cognee/eval_framework/benchmark_adapters/logistics_system_utils/entities/package.py
  • cognee/eval_framework/benchmark_adapters/logistics_system_utils/entities/post_office.py
  • cognee/eval_framework/benchmark_adapters/logistics_system_utils/entities/retailer.py
  • cognee/eval_framework/benchmark_adapters/logistics_system_utils/entities/transportation.py
  • cognee/eval_framework/benchmark_adapters/logistics_system_utils/entities/user.py
  • cognee/eval_framework/benchmark_adapters/logistics_system_utils/ontology.py
  • cognee/eval_framework/benchmark_adapters/logistics_system_utils/prompts/narrativization.txt
  • cognee/eval_framework/benchmark_adapters/logistics_system_utils/prompts/package_narrativization.txt
  • cognee/eval_framework/benchmark_adapters/logistics_system_utils/queries/query.txt
  • cognee/eval_framework/benchmark_adapters/logistics_system_utils/queries/query_data_sources.txt
  • cognee/eval_framework/benchmark_adapters/logistics_system_utils/rule_engine.py
  • cognee/eval_framework/benchmark_adapters/logistics_system_utils/rules/rules_common_language.txt
  • cognee/eval_framework/benchmark_adapters/logistics_system_utils/rules/rules_policy_language.txt
  • cognee/eval_framework/benchmark_adapters/logistics_system_utils/utils/utils.py
  • cognee/eval_framework/benchmark_adapters/logistics_system_utils/utils/world_creation_utils.py
  • cognee/eval_framework/corpus_builder/corpus_builder_executor.py
  • cognee/tests/unit/eval_framework/benchmark_adapters_test.py
  • cognee/tests/unit/eval_framework/corpus_builder_test.py
  • cognee/tests/unit/eval_framework/logistics_ontology_test.py

Comment on lines +133 to +134
def _load_shared_corpus_documents(self) -> list[str]:
return [DEFAULT_COMMON_RULES_PATH.read_text(encoding="utf-8").strip()]
⚠️ Potential issue | 🟠 Major

Load the new policy rules into the corpus.

load_corpus() only injects rules_common_language.txt, so the new rules_policy_language.txt never becomes retrievable context. Right now the added policy spec is dead data, and the corpus can drift away from the hard-coded evaluation logic.

Suggested change
 DEFAULT_COMMON_RULES_PATH = (
     Path(__file__).resolve().parent
     / "logistics_system_utils"
     / "rules"
     / "rules_common_language.txt"
 )
+DEFAULT_POLICY_RULES_PATH = (
+    Path(__file__).resolve().parent
+    / "logistics_system_utils"
+    / "rules"
+    / "rules_policy_language.txt"
+)
 ...
     def _load_shared_corpus_documents(self) -> list[str]:
-        return [DEFAULT_COMMON_RULES_PATH.read_text(encoding="utf-8").strip()]
+        return [
+            DEFAULT_COMMON_RULES_PATH.read_text(encoding="utf-8").strip(),
+            DEFAULT_POLICY_RULES_PATH.read_text(encoding="utf-8").strip(),
+        ]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cognee/eval_framework/benchmark_adapters/logistics_system_adapter.py` around
lines 133 - 134, The _load_shared_corpus_documents method currently only returns
the contents of DEFAULT_COMMON_RULES_PATH, so the new policy rules file never
gets added to the corpus; update _load_shared_corpus_documents to also read and
include the policy rules (e.g.,
DEFAULT_POLICY_RULES_PATH.read_text(encoding="utf-8").strip()) alongside
DEFAULT_COMMON_RULES_PATH so both rule texts are returned (as a list of strings)
and become retrievable context.

Comment on lines +184 to +186
question_text = f"{package_context}\n\n{question_prompt}"
# if self.query.strip():
# question_text = f"{question_text}\n{self.query.strip()}"
⚠️ Potential issue | 🟠 Major

query is ignored right now.

__init__() still accepts and stores a custom query template, but the only line that appends it to the question text is commented out, so passing query= currently has no effect.

Suggested change
                 question_text = f"{package_context}\n\n{question_prompt}"
-                # if self.query.strip():
-                #    question_text = f"{question_text}\n{self.query.strip()}"
+                if self.query.strip():
+                    question_text = f"{question_text}\n{self.query.strip()}"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cognee/eval_framework/benchmark_adapters/logistics_system_adapter.py` around
lines 184 - 186, The adapter currently builds question_text from package_context
and question_prompt but ignores the stored custom query (self.query) because the
append logic is commented out; re-enable and guard that append so when
self.query is set and non-empty it gets appended (e.g., in the method that
constructs question_text, use a conditional to append f"{self.query.strip()}" to
question_text with a newline). Update the logic around question_text,
package_context, question_prompt and self.query so passing query= in __init__
has effect and preserve spacing/newlines when appending.

Comment on lines +230 to +239
try:
asyncio.get_running_loop()
except RuntimeError:
return asyncio.run(narrativize_corpus(world_root_path, use_llm=use_llm))

loop = asyncio.new_event_loop()
try:
return loop.run_until_complete(narrativize_corpus(world_root_path, use_llm=use_llm))
finally:
loop.close()
⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Expected result:
# RuntimeError: Cannot run the event loop while another loop is running
python3 - <<'PY'
import asyncio

async def noop():
    return None

async def probe():
    loop = asyncio.new_event_loop()
    coro = noop()
    try:
        try:
            loop.run_until_complete(coro)
        except Exception as exc:
            print(type(exc).__name__, exc)
    finally:
        coro.close()
        loop.close()

asyncio.run(probe())
PY

Repository: topoteretes/cognee

Length of output: 130


🏁 Script executed:

# First, verify the file exists and check the exact code at lines 230-239
fd -t f narrativize_corpus.py

Repository: topoteretes/cognee

Length of output: 164


🏁 Script executed:

# Read the file to see lines around 230-239 with context
wc -l cognee/eval_framework/benchmark_adapters/logistics_system_utils/corpus_generator/narrativize_corpus.py

Repository: topoteretes/cognee

Length of output: 168


🏁 Script executed:

# Read the specific section with context
sed -n '200,245p' cognee/eval_framework/benchmark_adapters/logistics_system_utils/corpus_generator/narrativize_corpus.py

Repository: topoteretes/cognee

Length of output: 1301


This sync wrapper fails when called from an already-running event loop.

The logic is inverted: when asyncio.get_running_loop() succeeds (indicating a loop already exists), the code attempts to create a second loop and call run_until_complete() on it. This raises RuntimeError: Cannot run the event loop while another loop is running. In practice, this breaks the sync wrapper in notebooks, pytest-asyncio, and ASGI contexts where it should be most useful.

The correct pattern requires using a different strategy (e.g., asyncio.to_thread() for Python 3.9+, or nest_asyncio as a workaround) rather than attempting to nest event loops in the same thread.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@cognee/eval_framework/benchmark_adapters/logistics_system_utils/corpus_generator/narrativize_corpus.py`
around lines 230 - 239, The sync wrapper inverts the running-loop check and
attempts to create a nested loop; instead, detect a running loop with
asyncio.get_running_loop() and, if a loop exists, run narrativize_corpus in a
separate thread (e.g., use
concurrent.futures.ThreadPoolExecutor().submit(lambda:
asyncio.run(narrativize_corpus(...))).result()) so you avoid nesting event loops
in the same thread; if no running loop (get_running_loop() raises), simply call
asyncio.run(narrativize_corpus(...)). Ensure you reference the
narrativize_corpus coroutine and replace the current
new_event_loop/run_until_complete branch with the thread-based executor approach
when a loop is present.
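The thread-based pattern the prompt describes can be sketched as follows; narrativize_corpus_demo is a stand-in coroutine, not the real narrativize_corpus:

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


async def narrativize_corpus_demo(value):
    # Stand-in for the real narrativize_corpus coroutine.
    await asyncio.sleep(0)
    return value


def run_sync(value):
    """Run the coroutine from sync code, even when a loop is already running."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop in this thread: asyncio.run() is safe.
        return asyncio.run(narrativize_corpus_demo(value))
    # A loop is already running: run a fresh loop in a worker thread rather
    # than nesting run_until_complete() in the current thread.
    with ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(lambda: asyncio.run(narrativize_corpus_demo(value))).result()


print(run_sync("outside a loop"))  # plain sync context


async def caller():
    # Simulates a notebook/pytest-asyncio context with a running loop.
    return run_sync("inside a loop")


print(asyncio.run(caller()))
```

Blocking on `.result()` inside a coroutine stalls the outer loop for the duration, which is acceptable for a sync compatibility shim but worth noting in the docstring.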

Comment on lines +46 to +83
user = _seed_user(name=_pick_unique_name(USER_NAMES, used_user_names, "User"))
retailer = _seed_retailer(
shipping_range=user.default_shipping_range.label,
name=_pick_unique_name(RETAILER_NAMES, used_retailer_names, "Retailer"),
)
post_offices = _seed_post_offices(user.default_shipping_range, retailer.region, user.region)
retailer.origin_post_office_id = _origin_post_office_id_for_retailer(
retailer.region, post_offices
)

users = [user]
for _ in range(user_count - 1):
users.append(
_seed_user(
name=_pick_unique_name(USER_NAMES, used_user_names, "User"),
)
)

retailers = [retailer]
for _ in range(retailer_count - 1):
retailers.append(
_seed_retailer(
shipping_range=random.choice(retailer_possible_values["shipping_range"]),
origin_post_office_id=None,
name=_pick_unique_name(RETAILER_NAMES, used_retailer_names, "Retailer"),
)
)

for retailer in retailers:
retailer.origin_post_office_id = _origin_post_office_id_for_retailer(
retailer.region, post_offices
)

carriers = _seed_world_carriers(
retailer=retailer,
user=user,
shipping_range=user.default_shipping_range,
)
⚠️ Potential issue | 🟠 Major

Don't build the whole transport network around the first seeded pair.

post_offices and carriers are seeded once from the initial user/retailer pair and then reused for every later retailer/user. That leaves many retailers with origin_post_office_ids that point to non-origin facilities, and cognee/eval_framework/benchmark_adapters/logistics_system_utils/utils/world_creation_utils.py::_route_post_offices_for_state() only prefers warehouse IDs, so those configured origins are never reflected in the generated routes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cognee/eval_framework/benchmark_adapters/logistics_system_utils/ontology.py`
around lines 46 - 83, The transport network is being built once from the initial
_seed_user/_seed_retailer pair and reused, so later retailers get
origin_post_office_id values that don't match actual origin facilities and
_route_post_offices_for_state() never sees those origins; fix by seeding post
offices and carriers per retailer/user pair instead of once: call
_seed_post_offices(...) and set retailer.origin_post_office_id via
_origin_post_office_id_for_retailer(...) for each retailer (or move the existing
post_offices assignment inside the retailers/users loop), and create carriers
with _seed_world_carriers(...) per retailer/user so each retailer has a matching
carrier and origin post office used by _route_post_offices_for_state().

Comment on lines +13 to +21
"estimated_delivery_days": int,
"estimated_delivery_days_supporting_facts": ["string"],
"estimated_delivery_days_supporting_facts_data_sources": ["string"],
"estimated_transport_price": float,
"estimated_transport_price_supporting_facts": ["string"],
"estimated_transport_price_supporting_facts_data_sources": ["string"],
"route": "string",
"route_supporting_facts": ["string"],
"route_supporting_facts_data_source": ["string"]
⚠️ Potential issue | 🟠 Major

JSON example can induce invalid output and a key mismatch.

The schema currently uses int/float (not valid JSON values) and mixes singular/plural data-source key naming for route facts.

Proposed fix
-  "estimated_delivery_days": int,
+  "estimated_delivery_days": 0,
   "estimated_delivery_days_supporting_facts": ["string"],
   "estimated_delivery_days_supporting_facts_data_sources": ["string"],
-  "estimated_transport_price": float,
+  "estimated_transport_price": 0.0,
   "estimated_transport_price_supporting_facts": ["string"],
   "estimated_transport_price_supporting_facts_data_sources": ["string"],
   "route": "string",
   "route_supporting_facts": ["string"],
-  "route_supporting_facts_data_source": ["string"]
+  "route_supporting_facts_data_sources": ["string"]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@cognee/eval_framework/benchmark_adapters/logistics_system_utils/queries/query_data_sources.txt`
around lines 13 - 21, The JSON schema uses non-JSON type names and inconsistent
key naming: replace the non-JSON tokens "int" and "float" for keys like
estimated_delivery_days and estimated_transport_price with the valid JSON type
"number", and normalize the route supporting-facts data-source key by renaming
route_supporting_facts_data_source to the plural
route_supporting_facts_data_sources so it matches the pattern used by
estimated_*_supporting_facts_data_sources.

R07. An express package shall not be assigned to the shipment transport mode.
R08. A package already delivered to the recipient shall not receive a new transport assignment.
R09. Retailer and user regions shall contribute to both delivery-time estimation and transport-price estimation.
R10. Region distance shall be computed as the absolute difference between origin and destination region distance scores, plus 1, with an additional penalty of 2 when crossing between US and non-US regions.
⚠️ Potential issue | 🟠 Major

R10 disagrees with the engine on same-region routes.

_region_distance() in cognee/eval_framework/benchmark_adapters/logistics_system_utils/rule_engine.py returns 0 when origin and destination are the same region, but this rule says abs(...) + 1 unconditionally. That means the published rule adds 1 day and $3 to same-region shipments, while the generated golden answers do not.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@cognee/eval_framework/benchmark_adapters/logistics_system_utils/rules/rules_policy_language.txt`
at line 10, Rule R10 requires region_distance = abs(origin.score -
destination.score) + 1 with +2 if crossing US/non‑US, but _region_distance()
currently returns 0 for same-region routes; update _region_distance in
cognee/eval_framework/benchmark_adapters/logistics_system_utils/rule_engine.py
to compute distance as abs(origin_region_score - destination_region_score) + 1
unconditionally and then add the +2 cross-border penalty when origin and
destination are US vs non‑US, and adjust any unit tests or callers that assumed
a zero same-region result.
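Reading R10 literally, the rule's arithmetic looks like this minimal sketch; the scores and US flags are assumed inputs, and the real _region_distance() signature may differ:

```python
def region_distance(origin_score: int, dest_score: int,
                    origin_is_us: bool, dest_is_us: bool) -> int:
    # R10 as written: absolute score difference, plus 1, unconditionally.
    distance = abs(origin_score - dest_score) + 1
    # Additional penalty of 2 when crossing between US and non-US regions.
    if origin_is_us != dest_is_us:
        distance += 2
    return distance


# Same region: the rule yields 1, whereas the engine currently returns 0.
assert region_distance(3, 3, True, True) == 1
# Cross-border: |1 - 4| + 1 + 2 = 6.
assert region_distance(1, 4, True, False) == 6
```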

Comment on lines +271 to +275
if package.user_id is not None and package.retailer_id is not None:
matched_user = _find_by_id(users, "user_id", package.user_id)
matched_retailer = _find_by_id(retailers, "retailer_id", package.retailer_id)
if matched_user is not None and matched_retailer is not None:
return matched_user, matched_retailer
⚠️ Potential issue | 🟠 Major

Don't silently remap packages with broken IDs.

Once a package carries explicit user_id and retailer_id, this helper should fail if either lookup misses. Falling back to the heuristic matcher here will generate golden answers for the wrong entities instead of surfacing corrupted world data.

🔧 Suggested fix
     if package.user_id is not None and package.retailer_id is not None:
         matched_user = _find_by_id(users, "user_id", package.user_id)
         matched_retailer = _find_by_id(retailers, "retailer_id", package.retailer_id)
-        if matched_user is not None and matched_retailer is not None:
-            return matched_user, matched_retailer
+        if matched_user is None or matched_retailer is None:
+            raise ValueError(
+                f"Package {package.package_id} references an unknown user or retailer."
+            )
+        return matched_user, matched_retailer
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@cognee/eval_framework/benchmark_adapters/logistics_system_utils/utils/utils.py`
around lines 271 - 275, The helper currently falls back to the heuristic matcher
when package.user_id and package.retailer_id are present but one or both lookups
via _find_by_id(users, "user_id", package.user_id) or _find_by_id(retailers,
"retailer_id", package.retailer_id) fail; change this so that if package.user_id
and package.retailer_id are not None and either matched_user or matched_retailer
is None the function does not continue to the heuristic matcher but instead
returns a clear failure (e.g., raise a ValueError or return None/a specific
error sentinel) so corrupted IDs surface immediately rather than producing wrong
golden answers. Ensure the check lives alongside the existing
matched_user/matched_retailer logic and that callers handle the raised
error/None result.

Comment on lines +341 to +349
world: dict[str, object] = {
"retailers": retailers,
"users": users,
"post_offices": post_offices,
"carriers": carriers,
}

if packages:
world["packages"] = packages
⚠️ Potential issue | 🟠 Major

Always preserve packages when loading the world.

store_world() always writes a packages array, but load_world() drops that key when the array is empty. That breaks round-tripping and makes downstream code such as write_golden_answers() crash on valid empty worlds.

🔧 Suggested fix
     world: dict[str, object] = {
         "retailers": retailers,
         "users": users,
+        "packages": packages,
         "post_offices": post_offices,
         "carriers": carriers,
     }
-
-    if packages:
-        world["packages"] = packages
 
     return world
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@cognee/eval_framework/benchmark_adapters/logistics_system_utils/utils/utils.py`
around lines 341 - 349, load_world currently omits the "packages" key when the
packages list is empty, breaking round-tripping with store_world (which always
writes "packages") and causing callers like write_golden_answers to fail; update
the load_world logic (the world dict construction in load_world) to always set
world["packages"] = packages regardless of emptiness so the key is preserved
even for an empty list.

supports_hazardous_materials: bool,
) -> PostOffice:
return PostOffice(
post_office_id=f"po-{random.randint(1000, 9999)}",
⚠️ Potential issue | 🟠 Major

Make generated IDs collision-safe.

All generated IDs come from unchecked random.randint(...). Carriers are the worst case here: create_world() seeds 22 of them from a 900-value suffix space, so duplicate IDs are realistic. A collision will overwrite _entity_entries() keys, reuse filenames, and make references ambiguous across the persisted world.

Also applies to: 540-540, 597-597, 626-626, 664-664

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@cognee/eval_framework/benchmark_adapters/logistics_system_utils/utils/world_creation_utils.py`
at line 170, The generated IDs using random.randint(...) (e.g., in
create_world() and places that set post_office_id, carrier IDs, etc., and that
feed into _entity_entries() and persisted filenames) are collision-prone;
replace them with collision-safe IDs by using uuid.uuid4().hex (or another UUID
generator) instead of random.randint, or implement a uniqueness loop that checks
the destination dict/sets (e.g., the map used by _entity_entries()) to ensure
the new ID is not already present before assigning; update all occurrences
referenced in create_world(), post_office_id, carrier ID generation sites, and
any other similar ID generators (the lines called out in the review) so
filenames and keys remain unique.
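Both remediation options from the prompt can be sketched side by side; seed_post_office_id and unique_post_office_id are illustrative names, not the repo's helpers:

```python
import random
import uuid


def seed_post_office_id() -> str:
    # Current approach: only 9000 possible suffixes, so duplicates become
    # realistic once dozens of entities are seeded (birthday problem).
    return f"po-{random.randint(1000, 9999)}"


def unique_post_office_id(existing_ids: set[str]) -> str:
    # Option 1: keep the readable prefix but retry until the ID is unused.
    while True:
        candidate = f"po-{random.randint(1000, 9999)}"
        if candidate not in existing_ids:
            existing_ids.add(candidate)
            return candidate


def uuid_post_office_id() -> str:
    # Option 2: uuid4 hex is collision-safe without any bookkeeping.
    return f"po-{uuid.uuid4().hex}"


seen: set[str] = set()
ids = [unique_post_office_id(seen) for _ in range(50)]
assert len(ids) == len(set(ids))  # retry loop guarantees uniqueness

uids = {uuid_post_office_id() for _ in range(1000)}
assert len(uids) == 1000
```

The retry loop preserves the short human-readable IDs used in filenames; the uuid variant trades readability for guaranteed uniqueness with no shared state.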

Comment on lines +661 to +678
current_post_office = visited_post_offices[-1] if visited_post_offices else None

return Package.from_labels(
package_id=f"pkg-{random.randint(10000, 99999)}",
description=description or random.choice(PACKAGE_DESCRIPTIONS),
weight_kg=random.choice(package_possible_values["weight_kg"]),
shipping_range=shipping_range,
category=category,
priority=priority,
retailer_id=retailer_id,
retailer_name=retailer_name,
user_id=user_id,
user_name=user_name,
insured=random.choice(package_possible_values["insured"]),
current_state=current_state,
last_known_location=_location_for_state(package_state, current_post_office),
current_post_office_id=current_post_office.post_office_id if current_post_office else None,
current_post_office_name=current_post_office.name if current_post_office else None,
⚠️ Potential issue | 🟠 Major

Delivered and in-flight states shouldn't point at a post office.

_seed_package() always sets current_post_office to the last routed office. For IN_TRANSIT, OUT_FOR_DELIVERY, and DELIVERED, that makes last_known_location/current_post_office_name report a sorting center, and _location_for_state()'s special labels never get used.

🔧 Suggested fix
     current_post_office = visited_post_offices[-1] if visited_post_offices else None
+    if package_state in {
+        PackageState.IN_TRANSIT,
+        PackageState.OUT_FOR_DELIVERY,
+        PackageState.DELIVERED,
+    }:
+        current_post_office = None
 
     return Package.from_labels(
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@cognee/eval_framework/benchmark_adapters/logistics_system_utils/utils/world_creation_utils.py`
around lines 661 - 678, _seed_package() currently assigns current_post_office to
the last routed office unconditionally, which causes IN_TRANSIT,
OUT_FOR_DELIVERY, and DELIVERED to show a sorting center instead of the special
labels from _location_for_state(); change the logic so that before building the
Package you set current_post_office = None when package_state is one of the
transit/delivery/completed states (e.g., package_state in
{PackageState.IN_TRANSIT, PackageState.OUT_FOR_DELIVERY,
PackageState.DELIVERED}) so that last_known_location and
current_post_office_name are computed from _location_for_state(package_state,
current_post_office) and do not point at a post office.
