Feature/cog 4596 expand eval framework with gtrl corpus builder#2534
Feature/cog 4596 expand eval framework with gtrl corpus builder#2534ArmagedonFlamer wants to merge 37 commits intodevfrom
Conversation
…ance-weighted-ingestion
…ance-weighted-ingestion
…ance-weighted-ingestion
…ance-weighted-ingestion
…ance-weighted-ingestion
…s-to-use-importance-weighted
…ted-ingestion' into feature/cog-4370-engineering-update-retrievers-to-use-importance-weighted
…ted-ingestion' into feature/cog-4370-engineering-update-retrievers-to-use-importance-weighted
…ance-weighted-ingestion
…s-to-use-importance-weighted
…ted-ingestion' into feature/cog-4370-engineering-update-retrievers-to-use-importance-weighted
…s-to-use-importance-weighted
Please make sure all the checkboxes are checked:
|
WalkthroughThis pull request introduces a comprehensive logistics simulation benchmark adapter for the evaluation framework. It adds a Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Hello @ArmagedonFlamer, thank you for submitting a PR! We will respond as soon as possible. |
There was a problem hiding this comment.
Actionable comments posted: 11
🧹 Nitpick comments (6)
cognee/tests/unit/eval_framework/benchmark_adapters_test.py (1)
137-143: Avoid brittle order-coupling inanswers_by_type.Building the map from
qa_pairs[:3]assumes a fixed ordering. Using all entries keeps the test robust to harmless ordering changes.Suggested refactor
- answers_by_type = {item["question_type"]: item for item in qa_pairs[:3]} + answers_by_type = {} + for item in qa_pairs: + answers_by_type.setdefault(item["question_type"], item) + + assert "delivery_days" in answers_by_type + assert "transport_cost" in answers_by_type + assert "carrier" in answers_by_type🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cognee/tests/unit/eval_framework/benchmark_adapters_test.py` around lines 137 - 143, The test creates answers_by_type from qa_pairs[:3], which couples the test to a specific ordering; update the construction of answers_by_type to iterate over the entire qa_pairs list (qa_pairs) and build the dict by mapping item["question_type"] to item so the assertions (on answers_by_type, and checks for "delivery_days", "transport_cost", "carrier" and their answer/golden_* fields) don't depend on the first three elements; ensure duplicate question_type handling is intentional (e.g., last-wins) or assert uniqueness if needed.cognee/eval_framework/benchmark_adapters/base_benchmark_adapter.py (1)
11-12: Add a short docstring for the new public hook.
prepare_corpusis now part of the adapter contract; a one-line docstring clarifying override expectations will make implementations more consistent.Suggested refactor
class BaseBenchmarkAdapter(ABC): async def prepare_corpus(self) -> None: + """Optional async hook to prepare adapter assets before load_corpus().""" return NoneAs per coding guidelines: "undocumented function definitions and class definitions in the project's Python code are assumed incomplete."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cognee/eval_framework/benchmark_adapters/base_benchmark_adapter.py` around lines 11 - 12, Add a one-line docstring to the new public async hook prepare_corpus(self) -> None clarifying that this is an optional asynchronous hook adapters should override to prepare or load corpus data and that the base implementation is a no-op (returns None); keep it short and placed immediately under the prepare_corpus signature in base_benchmark_adapter.py so implementations know to override it when needed.cognee/eval_framework/benchmark_adapters/logistics_system_adapter.py (1)
82-89: Preserve the last world-generation failure.Retrying every
ValueErrorhere and then replacing it with a generic incompatibility message hides the real cause when generation fails for some other reason. Catch only the expected retryable error, or at least chain the last exception into the final one.As per coding guidelines, "Prefer explicit, structured error handling in Python code."Suggested change
def _create_world_with_packages(self, max_attempts: int = 10) -> dict[str, object]: + last_error: ValueError | None = None for _ in range(max_attempts): world = create_world( user_count=self.user_count, retailer_count=self.retailer_count, ) try: return add_packages_to_world(world, package_count=self.package_count) - except ValueError: + except ValueError as error: + last_error = error continue raise ValueError( "Could not generate a logistics world with compatible user and retailer pairs." - ) + ) from last_error🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cognee/eval_framework/benchmark_adapters/logistics_system_adapter.py` around lines 82 - 89, The retry loop around add_packages_to_world currently swallows all ValueError instances and raises a generic ValueError, losing the root cause; modify the loop in logistics_system_adapter.py to either catch a more specific retryable exception (not broad ValueError) or store the last caught exception (e.g., last_exc) and re-raise the final ValueError chaining it with "from last_exc" so the original error from add_packages_to_world/package_count is preserved; update the except clause around add_packages_to_world(package_count=self.package_count) to implement one of these two options and ensure the final raise includes the original exception context.cognee/eval_framework/benchmark_adapters/logistics_system_utils/entities/package.py (3)
92-133: Add a docstring tofrom_labelsand consider documenting error behavior.The classmethod lacks documentation. Additionally, calls to
ShippingRange.from_label(),PackageCategory.from_label(), etc., will presumably raise exceptions if passed invalid labels. While the current usage (via_seed_package) guarantees valid inputs frompackage_possible_values, documenting the expected behavior and valid inputs would help prevent misuse.📝 Suggested docstring
`@classmethod` def from_labels( cls, package_id: str, description: str, weight_kg: float, shipping_range: str, category: str, priority: str, retailer_id: str | None = None, retailer_name: str | None = None, user_id: str | None = None, user_name: str | None = None, insured: bool = False, current_state: str = PackageState.CREATED.label, last_known_location: str = "origin_warehouse", current_post_office_id: str | None = None, current_post_office_name: str | None = None, route_post_office_ids: tuple[str, ...] = (), route_post_office_names: tuple[str, ...] = (), status_history: tuple[str, ...] = (), ) -> "Package": + """Construct a Package from human-readable label strings. + + Args: + shipping_range: Label for ShippingRange enum (e.g., "local", "regional"). + category: Label for PackageCategory enum. + priority: Label for DeliveryPriority enum. + current_state: Label for PackageState enum. + **other args: Passed through directly to Package.__init__. + + Returns: + A new Package instance with enums resolved from labels. + + Raises: + ValueError: If any label does not match a valid enum member. + """ return cls(🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cognee/eval_framework/benchmark_adapters/logistics_system_utils/entities/package.py` around lines 92 - 133, Add a docstring to the Package.from_labels classmethod that describes its purpose, parameters, return type, and expected input labels; explicitly list or reference the valid label sources (e.g., package_possible_values or the enums) and document that invalid labels passed to ShippingRange.from_label, PackageCategory.from_label, DeliveryPriority.from_label, or PackageState.from_label will raise exceptions (propagated to the caller) so callers know error behavior. Reference the method name from_labels and the helper conversion functions ShippingRange.from_label, PackageCategory.from_label, DeliveryPriority.from_label, and PackageState.from_label in the docstring.
13-30: Consider adding a docstring or inline comment for the constant.The
package_possible_valuesdictionary serves as a configuration for valid attribute values, but its purpose and usage context aren't immediately clear. A brief docstring or comment would help future maintainers understand this is used for random sampling during package generation.As per coding guidelines, undocumented definitions are assumed incomplete.
📝 Suggested documentation
+# Valid attribute values for random package generation in logistics simulations. +# Keys map to Package field names; values are sampled via random.choice(). package_possible_values = { "weight_kg": [1.0, 5.0, 25.0, 150.0, 900.0],🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cognee/eval_framework/benchmark_adapters/logistics_system_utils/entities/package.py` around lines 13 - 30, Add a concise docstring or inline comment above the package_possible_values constant explaining its purpose and usage (e.g., that package_possible_values is a configuration mapping used for random sampling/validation when generating Package entities in the logistics system), describe what each key represents (weight_kg, shipping_range, category, priority, insured, current_state, last_known_location), and note that current_state excludes DELIVERED and RETURNED; place this comment immediately above the package_possible_values definition in package.py so maintainers see it when editing the Package generation logic.
33-52: Add a class-level docstring forPackage.Per the coding guidelines, undocumented class definitions are assumed incomplete. A docstring describing the purpose, key fields, and usage of this domain entity would improve maintainability and assist developers unfamiliar with the logistics simulation module.
📝 Suggested docstring
`@dataclass`(slots=True) class Package: + """Represents a package in the logistics simulation. + + Attributes: + package_id: Unique identifier for the package. + description: Human-readable description of package contents. + weight_kg: Package weight in kilograms. + shipping_range: Geographic scope of delivery (local, regional, etc.). + category: Classification affecting handling requirements. + priority: Delivery urgency level. + current_state: Current status in the delivery lifecycle. + route_post_office_ids: Ordered sequence of post offices in the delivery route. + """ package_id: str🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cognee/eval_framework/benchmark_adapters/logistics_system_utils/entities/package.py` around lines 33 - 52, Add a clear class-level docstring to the Package dataclass describing its purpose in the logistics simulation, the key fields (e.g., package_id, description, weight_kg, shipping_range, category, priority, current_state, last_known_location, route_post_office_ids, status_history, and optional retailer/user info), typical usage (how instances are created/updated and how state/location fields change), and any important invariants (e.g., route_post_office_ids order corresponds to route_post_office_names). Place this docstring immediately below the `@dataclass`(slots=True) class Package: declaration so it documents the Package class for maintainers and tooling.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cognee/eval_framework/benchmark_adapters/logistics_system_adapter.py`:
- Around line 184-186: The adapter currently builds question_text from
package_context and question_prompt but ignores the stored custom query
(self.query) because the append logic is commented out; re-enable and guard that
append so when self.query is set and non-empty it gets appended (e.g., in the
method that constructs question_text, use a conditional to append
f"{self.query.strip()}" to question_text with a newline). Update the logic
around question_text, package_context, question_prompt and self.query so passing
query= in __init__ has effect and preserve spacing/newlines when appending.
- Around line 133-134: The _load_shared_corpus_documents method currently only
returns the contents of DEFAULT_COMMON_RULES_PATH, so the new policy rules file
never gets added to the corpus; update _load_shared_corpus_documents to also
read and include the policy rules (e.g.,
DEFAULT_POLICY_RULES_PATH.read_text(encoding="utf-8").strip()) alongside
DEFAULT_COMMON_RULES_PATH so both rule texts are returned (as a list of strings)
and become retrievable context.
In
`@cognee/eval_framework/benchmark_adapters/logistics_system_utils/corpus_generator/narrativize_corpus.py`:
- Around line 230-239: The sync wrapper inverts the running-loop check and
attempts to create a nested loop; instead, detect a running loop with
asyncio.get_running_loop() and, if a loop exists, run narrativize_corpus in a
separate thread (e.g., use
concurrent.futures.ThreadPoolExecutor().submit(lambda:
asyncio.run(narrativize_corpus(...))).result()) so you avoid nesting event loops
in the same thread; if no running loop (get_running_loop() raises), simply call
asyncio.run(narrativize_corpus(...)). Ensure you reference the
narrativize_corpus coroutine and replace the current
new_event_loop/run_until_complete branch with the thread-based executor approach
when a loop is present.
In `@cognee/eval_framework/benchmark_adapters/logistics_system_utils/ontology.py`:
- Around line 46-83: The transport network is being built once from the initial
_seed_user/_seed_retailer pair and reused, so later retailers get
origin_post_office_id values that don't match actual origin facilities and
_route_post_offices_for_state() never sees those origins; fix by seeding post
offices and carriers per retailer/user pair instead of once: call
_seed_post_offices(...) and set retailer.origin_post_office_id via
_origin_post_office_id_for_retailer(...) for each retailer (or move the existing
post_offices assignment inside the retailers/users loop), and create carriers
with _seed_world_carriers(...) per retailer/user so each retailer has a matching
carrier and origin post office used by _route_post_offices_for_state().
In
`@cognee/eval_framework/benchmark_adapters/logistics_system_utils/queries/query_data_sources.txt`:
- Around line 13-21: The JSON schema uses non-JSON type names and inconsistent
key naming: replace the non-JSON tokens "int" and "float" for keys like
estimated_delivery_days and estimated_transport_price with the valid JSON type
"number", and normalize the route supporting-facts data-source key by renaming
route_supporting_facts_data_source to the plural
route_supporting_facts_data_sources so it matches the pattern used by
estimated_*_supporting_facts_data_sources.
In
`@cognee/eval_framework/benchmark_adapters/logistics_system_utils/rule_engine.py`:
- Around line 198-211: compute_outcome currently picks the first retailer/user
from world instead of the package owner, causing wrong scoring for non-zero
index packages; after calling _resolve_package (which returns the package dict
with retailer_id and user_id), resolve the retailer and user using those ids
(e.g., pass package["retailer_id"] and package["user_id"] into
_resolve_world_actor or a lookup helper) and then call evaluate with the
resolved retailer and user so scoring uses the package's actual retailer/user;
update compute_outcome to use package-sourced ids rather than hardcoded
"retailers"/"users" defaults.
In
`@cognee/eval_framework/benchmark_adapters/logistics_system_utils/rules/rules_policy_language.txt`:
- Line 10: Rule R10 requires region_distance = abs(origin.score -
destination.score) + 1 with +2 if crossing US/non‑US, but _region_distance()
currently returns 0 for same-region routes; update _region_distance in
cognee/eval_framework/benchmark_adapters/logistics_system_utils/rule_engine.py
to compute distance as abs(origin_region_score - destination_region_score) + 1
unconditionally and then add the +2 cross-border penalty when origin and
destination are US vs non‑US, and adjust any unit tests or callers that assumed
a zero same-region result.
In
`@cognee/eval_framework/benchmark_adapters/logistics_system_utils/utils/utils.py`:
- Around line 341-349: load_world currently omits the "packages" key when the
packages list is empty, breaking round-tripping with store_world (which always
writes "packages") and causing callers like write_golden_answers to fail; update
the load_world logic (the world dict construction in load_world) to always set
world["packages"] = packages regardless of emptiness so the key is preserved
even for an empty list.
- Around line 271-275: The helper currently falls back to the heuristic matcher
when package.user_id and package.retailer_id are present but one or both lookups
via _find_by_id(users, "user_id", package.user_id) or _find_by_id(retailers,
"retailer_id", package.retailer_id) fail; change this so that if package.user_id
and package.retailer_id are not None and either matched_user or matched_retailer
is None the function does not continue to the heuristic matcher but instead
returns a clear failure (e.g., raise a ValueError or return None/a specific
error sentinel) so corrupted IDs surface immediately rather than producing wrong
golden answers. Ensure the check lives alongside the existing
matched_user/matched_retailer logic and that callers handle the raised
error/None result.
In
`@cognee/eval_framework/benchmark_adapters/logistics_system_utils/utils/world_creation_utils.py`:
- Line 170: The generated IDs using random.randint(...) (e.g., in create_world()
and places that set post_office_id, carrier IDs, etc., and that feed into
_entity_entries() and persisted filenames) are collision-prone; replace them
with collision-safe IDs by using uuid.uuid4().hex (or another UUID generator)
instead of random.randint, or implement a uniqueness loop that checks the
destination dict/sets (e.g., the map used by _entity_entries()) to ensure the
new ID is not already present before assigning; update all occurrences
referenced in create_world(), post_office_id, carrier ID generation sites, and
any other similar ID generators (the lines called out in the review) so
filenames and keys remain unique.
- Around line 661-678: _seed_package() currently assigns current_post_office to
the last routed office unconditionally, which causes IN_TRANSIT,
OUT_FOR_DELIVERY, and DELIVERED to show a sorting center instead of the special
labels from _location_for_state(); change the logic so that before building the
Package you set current_post_office = None when package_state is one of the
transit/delivery/completed states (e.g., package_state in
{PackageState.IN_TRANSIT, PackageState.OUT_FOR_DELIVERY,
PackageState.DELIVERED}) so that last_known_location and
current_post_office_name are computed from _location_for_state(package_state,
current_post_office) and do not point at a post office.
---
Nitpick comments:
In `@cognee/eval_framework/benchmark_adapters/base_benchmark_adapter.py`:
- Around line 11-12: Add a one-line docstring to the new public async hook
prepare_corpus(self) -> None clarifying that this is an optional asynchronous
hook adapters should override to prepare or load corpus data and that the base
implementation is a no-op (returns None); keep it short and placed immediately
under the prepare_corpus signature in base_benchmark_adapter.py so
implementations know to override it when needed.
In `@cognee/eval_framework/benchmark_adapters/logistics_system_adapter.py`:
- Around line 82-89: The retry loop around add_packages_to_world currently
swallows all ValueError instances and raises a generic ValueError, losing the
root cause; modify the loop in logistics_system_adapter.py to either catch a
more specific retryable exception (not broad ValueError) or store the last
caught exception (e.g., last_exc) and re-raise the final ValueError chaining it
with "from last_exc" so the original error from
add_packages_to_world/package_count is preserved; update the except clause
around add_packages_to_world(package_count=self.package_count) to implement one
of these two options and ensure the final raise includes the original exception
context.
In
`@cognee/eval_framework/benchmark_adapters/logistics_system_utils/entities/package.py`:
- Around line 92-133: Add a docstring to the Package.from_labels classmethod
that describes its purpose, parameters, return type, and expected input labels;
explicitly list or reference the valid label sources (e.g.,
package_possible_values or the enums) and document that invalid labels passed to
ShippingRange.from_label, PackageCategory.from_label,
DeliveryPriority.from_label, or PackageState.from_label will raise exceptions
(propagated to the caller) so callers know error behavior. Reference the method
name from_labels and the helper conversion functions ShippingRange.from_label,
PackageCategory.from_label, DeliveryPriority.from_label, and
PackageState.from_label in the docstring.
- Around line 13-30: Add a concise docstring or inline comment above the
package_possible_values constant explaining its purpose and usage (e.g., that
package_possible_values is a configuration mapping used for random
sampling/validation when generating Package entities in the logistics system),
describe what each key represents (weight_kg, shipping_range, category,
priority, insured, current_state, last_known_location), and note that
current_state excludes DELIVERED and RETURNED; place this comment immediately
above the package_possible_values definition in package.py so maintainers see it
when editing the Package generation logic.
- Around line 33-52: Add a clear class-level docstring to the Package dataclass
describing its purpose in the logistics simulation, the key fields (e.g.,
package_id, description, weight_kg, shipping_range, category, priority,
current_state, last_known_location, route_post_office_ids, status_history, and
optional retailer/user info), typical usage (how instances are created/updated
and how state/location fields change), and any important invariants (e.g.,
route_post_office_ids order corresponds to route_post_office_names). Place this
docstring immediately below the `@dataclass`(slots=True) class Package:
declaration so it documents the Package class for maintainers and tooling.
In `@cognee/tests/unit/eval_framework/benchmark_adapters_test.py`:
- Around line 137-143: The test creates answers_by_type from qa_pairs[:3], which
couples the test to a specific ordering; update the construction of
answers_by_type to iterate over the entire qa_pairs list (qa_pairs) and build
the dict by mapping item["question_type"] to item so the assertions (on
answers_by_type, and checks for "delivery_days", "transport_cost", "carrier" and
their answer/golden_* fields) don't depend on the first three elements; ensure
duplicate question_type handling is intentional (e.g., last-wins) or assert
uniqueness if needed.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 99606914-d2f5-4b4f-b156-7abfcef484c0
📒 Files selected for processing (25)
cognee/eval_framework/benchmark_adapters/base_benchmark_adapter.pycognee/eval_framework/benchmark_adapters/benchmark_adapters.pycognee/eval_framework/benchmark_adapters/logistics_system_adapter.pycognee/eval_framework/benchmark_adapters/logistics_system_utils/corpus_generator/narrativize_corpus.pycognee/eval_framework/benchmark_adapters/logistics_system_utils/entities/delivery_models.pycognee/eval_framework/benchmark_adapters/logistics_system_utils/entities/enums.pycognee/eval_framework/benchmark_adapters/logistics_system_utils/entities/package.pycognee/eval_framework/benchmark_adapters/logistics_system_utils/entities/post_office.pycognee/eval_framework/benchmark_adapters/logistics_system_utils/entities/retailer.pycognee/eval_framework/benchmark_adapters/logistics_system_utils/entities/transportation.pycognee/eval_framework/benchmark_adapters/logistics_system_utils/entities/user.pycognee/eval_framework/benchmark_adapters/logistics_system_utils/ontology.pycognee/eval_framework/benchmark_adapters/logistics_system_utils/prompts/narrativization.txtcognee/eval_framework/benchmark_adapters/logistics_system_utils/prompts/package_narrativization.txtcognee/eval_framework/benchmark_adapters/logistics_system_utils/queries/query.txtcognee/eval_framework/benchmark_adapters/logistics_system_utils/queries/query_data_sources.txtcognee/eval_framework/benchmark_adapters/logistics_system_utils/rule_engine.pycognee/eval_framework/benchmark_adapters/logistics_system_utils/rules/rules_common_language.txtcognee/eval_framework/benchmark_adapters/logistics_system_utils/rules/rules_policy_language.txtcognee/eval_framework/benchmark_adapters/logistics_system_utils/utils/utils.pycognee/eval_framework/benchmark_adapters/logistics_system_utils/utils/world_creation_utils.pycognee/eval_framework/corpus_builder/corpus_builder_executor.pycognee/tests/unit/eval_framework/benchmark_adapters_test.pycognee/tests/unit/eval_framework/corpus_builder_test.pycognee/tests/unit/eval_framework/logistics_ontology_test.py
| def _load_shared_corpus_documents(self) -> list[str]: | ||
| return [DEFAULT_COMMON_RULES_PATH.read_text(encoding="utf-8").strip()] |
There was a problem hiding this comment.
Load the new policy rules into the corpus.
load_corpus() only injects rules_common_language.txt, so the new rules_policy_language.txt never becomes retrievable context. Right now the added policy spec is dead data, and the corpus can drift away from the hard-coded evaluation logic.
Suggested change
DEFAULT_COMMON_RULES_PATH = (
Path(__file__).resolve().parent
/ "logistics_system_utils"
/ "rules"
/ "rules_common_language.txt"
)
+DEFAULT_POLICY_RULES_PATH = (
+ Path(__file__).resolve().parent
+ / "logistics_system_utils"
+ / "rules"
+ / "rules_policy_language.txt"
+)
...
def _load_shared_corpus_documents(self) -> list[str]:
- return [DEFAULT_COMMON_RULES_PATH.read_text(encoding="utf-8").strip()]
+ return [
+ DEFAULT_COMMON_RULES_PATH.read_text(encoding="utf-8").strip(),
+ DEFAULT_POLICY_RULES_PATH.read_text(encoding="utf-8").strip(),
+ ]🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cognee/eval_framework/benchmark_adapters/logistics_system_adapter.py` around
lines 133 - 134, The _load_shared_corpus_documents method currently only returns
the contents of DEFAULT_COMMON_RULES_PATH, so the new policy rules file never
gets added to the corpus; update _load_shared_corpus_documents to also read and
include the policy rules (e.g.,
DEFAULT_POLICY_RULES_PATH.read_text(encoding="utf-8").strip()) alongside
DEFAULT_COMMON_RULES_PATH so both rule texts are returned (as a list of strings)
and become retrievable context.
| question_text = f"{package_context}\n\n{question_prompt}" | ||
| # if self.query.strip(): | ||
| # question_text = f"{question_text}\n{self.query.strip()}" |
There was a problem hiding this comment.
query is ignored right now.
__init__() still accepts and stores a custom query template, but the only place it was appended to the question text is commented out. Passing query= currently has no effect.
Suggested change
question_text = f"{package_context}\n\n{question_prompt}"
- # if self.query.strip():
- # question_text = f"{question_text}\n{self.query.strip()}"
+ if self.query.strip():
+ question_text = f"{question_text}\n{self.query.strip()}"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| question_text = f"{package_context}\n\n{question_prompt}" | |
| # if self.query.strip(): | |
| # question_text = f"{question_text}\n{self.query.strip()}" | |
| question_text = f"{package_context}\n\n{question_prompt}" | |
| if self.query.strip(): | |
| question_text = f"{question_text}\n{self.query.strip()}" |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cognee/eval_framework/benchmark_adapters/logistics_system_adapter.py` around
lines 184 - 186, The adapter currently builds question_text from package_context
and question_prompt but ignores the stored custom query (self.query) because the
append logic is commented out; re-enable and guard that append so when
self.query is set and non-empty it gets appended (e.g., in the method that
constructs question_text, use a conditional to append f"{self.query.strip()}" to
question_text with a newline). Update the logic around question_text,
package_context, question_prompt and self.query so passing query= in __init__
has effect and preserve spacing/newlines when appending.
| try: | ||
| asyncio.get_running_loop() | ||
| except RuntimeError: | ||
| return asyncio.run(narrativize_corpus(world_root_path, use_llm=use_llm)) | ||
|
|
||
| loop = asyncio.new_event_loop() | ||
| try: | ||
| return loop.run_until_complete(narrativize_corpus(world_root_path, use_llm=use_llm)) | ||
| finally: | ||
| loop.close() |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Expected result:
# RuntimeError: Cannot run the event loop while another loop is running
python3 - <<'PY'
import asyncio
async def noop():
return None
async def probe():
loop = asyncio.new_event_loop()
coro = noop()
try:
try:
loop.run_until_complete(coro)
except Exception as exc:
print(type(exc).__name__, exc)
finally:
coro.close()
loop.close()
asyncio.run(probe())
PYRepository: topoteretes/cognee
Length of output: 130
🏁 Script executed:
# First, verify the file exists and check the exact code at lines 230-239
fd -t f narrativize_corpus.pyRepository: topoteretes/cognee
Length of output: 164
🏁 Script executed:
# Read the file to see lines around 230-239 with context
wc -l cognee/eval_framework/benchmark_adapters/logistics_system_utils/corpus_generator/narrativize_corpus.pyRepository: topoteretes/cognee
Length of output: 168
🏁 Script executed:
# Read the specific section with context
sed -n '200,245p' cognee/eval_framework/benchmark_adapters/logistics_system_utils/corpus_generator/narrativize_corpus.pyRepository: topoteretes/cognee
Length of output: 1301
This sync wrapper fails when called from an already-running event loop.
The logic is inverted: when asyncio.get_running_loop() succeeds (indicating a loop already exists), the code attempts to create a second loop and call run_until_complete() on it. This raises RuntimeError: Cannot run the event loop while another loop is running. In practice, this breaks the sync wrapper in notebooks, pytest-asyncio, and ASGI contexts where it should be most useful.
The correct pattern requires using a different strategy (e.g., asyncio.to_thread() for Python 3.9+, or nest_asyncio as a workaround) rather than attempting to nest event loops in the same thread.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@cognee/eval_framework/benchmark_adapters/logistics_system_utils/corpus_generator/narrativize_corpus.py`
around lines 230 - 239, The sync wrapper inverts the running-loop check and
attempts to create a nested loop; instead, detect a running loop with
asyncio.get_running_loop() and, if a loop exists, run narrativize_corpus in a
separate thread (e.g., use
concurrent.futures.ThreadPoolExecutor().submit(lambda:
asyncio.run(narrativize_corpus(...))).result()) so you avoid nesting event loops
in the same thread; if no running loop (get_running_loop() raises), simply call
asyncio.run(narrativize_corpus(...)). Ensure you reference the
narrativize_corpus coroutine and replace the current
new_event_loop/run_until_complete branch with the thread-based executor approach
when a loop is present.
| user = _seed_user(name=_pick_unique_name(USER_NAMES, used_user_names, "User")) | ||
| retailer = _seed_retailer( | ||
| shipping_range=user.default_shipping_range.label, | ||
| name=_pick_unique_name(RETAILER_NAMES, used_retailer_names, "Retailer"), | ||
| ) | ||
| post_offices = _seed_post_offices(user.default_shipping_range, retailer.region, user.region) | ||
| retailer.origin_post_office_id = _origin_post_office_id_for_retailer( | ||
| retailer.region, post_offices | ||
| ) | ||
|
|
||
| users = [user] | ||
| for _ in range(user_count - 1): | ||
| users.append( | ||
| _seed_user( | ||
| name=_pick_unique_name(USER_NAMES, used_user_names, "User"), | ||
| ) | ||
| ) | ||
|
|
||
| retailers = [retailer] | ||
| for _ in range(retailer_count - 1): | ||
| retailers.append( | ||
| _seed_retailer( | ||
| shipping_range=random.choice(retailer_possible_values["shipping_range"]), | ||
| origin_post_office_id=None, | ||
| name=_pick_unique_name(RETAILER_NAMES, used_retailer_names, "Retailer"), | ||
| ) | ||
| ) | ||
|
|
||
| for retailer in retailers: | ||
| retailer.origin_post_office_id = _origin_post_office_id_for_retailer( | ||
| retailer.region, post_offices | ||
| ) | ||
|
|
||
| carriers = _seed_world_carriers( | ||
| retailer=retailer, | ||
| user=user, | ||
| shipping_range=user.default_shipping_range, | ||
| ) |
There was a problem hiding this comment.
Don't build the whole transport network around the first seeded pair.
post_offices and carriers are seeded once from the initial user/retailer pair and then reused for every later retailer/user. That leaves many retailers with origin_post_office_ids that point to non-origin facilities, and cognee/eval_framework/benchmark_adapters/logistics_system_utils/utils/world_creation_utils.py::_route_post_offices_for_state() only prefers warehouse IDs, so those configured origins are never reflected in the generated routes.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cognee/eval_framework/benchmark_adapters/logistics_system_utils/ontology.py`
around lines 46 - 83, The transport network is being built once from the initial
_seed_user/_seed_retailer pair and reused, so later retailers get
origin_post_office_id values that don't match actual origin facilities and
_route_post_offices_for_state() never sees those origins; fix by seeding post
offices and carriers per retailer/user pair instead of once: call
_seed_post_offices(...) and set retailer.origin_post_office_id via
_origin_post_office_id_for_retailer(...) for each retailer (or move the existing
post_offices assignment inside the retailers/users loop), and create carriers
with _seed_world_carriers(...) per retailer/user so each retailer has a matching
carrier and origin post office used by _route_post_offices_for_state().
| "estimated_delivery_days": int, | ||
| "estimated_delivery_days_supporting_facts": ["string"], | ||
| "estimated_delivery_days_supporting_facts_data_sources": ["string"], | ||
| "estimated_transport_price": float, | ||
| "estimated_transport_price_supporting_facts": ["string"], | ||
| "estimated_transport_price_supporting_facts_data_sources": ["string"], | ||
| "route": "string", | ||
| "route_supporting_facts": ["string"], | ||
| "route_supporting_facts_data_source": ["string"] |
There was a problem hiding this comment.
JSON example can induce invalid output and a key mismatch.
The schema currently uses int/float (not valid JSON values) and mixes singular/plural data-source key naming for route facts.
Proposed fix
- "estimated_delivery_days": int,
+ "estimated_delivery_days": 0,
"estimated_delivery_days_supporting_facts": ["string"],
"estimated_delivery_days_supporting_facts_data_sources": ["string"],
- "estimated_transport_price": float,
+ "estimated_transport_price": 0.0,
"estimated_transport_price_supporting_facts": ["string"],
"estimated_transport_price_supporting_facts_data_sources": ["string"],
"route": "string",
"route_supporting_facts": ["string"],
- "route_supporting_facts_data_source": ["string"]
+ "route_supporting_facts_data_sources": ["string"]📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| "estimated_delivery_days": int, | |
| "estimated_delivery_days_supporting_facts": ["string"], | |
| "estimated_delivery_days_supporting_facts_data_sources": ["string"], | |
| "estimated_transport_price": float, | |
| "estimated_transport_price_supporting_facts": ["string"], | |
| "estimated_transport_price_supporting_facts_data_sources": ["string"], | |
| "route": "string", | |
| "route_supporting_facts": ["string"], | |
| "route_supporting_facts_data_source": ["string"] | |
| "estimated_delivery_days": 0, | |
| "estimated_delivery_days_supporting_facts": ["string"], | |
| "estimated_delivery_days_supporting_facts_data_sources": ["string"], | |
| "estimated_transport_price": 0.0, | |
| "estimated_transport_price_supporting_facts": ["string"], | |
| "estimated_transport_price_supporting_facts_data_sources": ["string"], | |
| "route": "string", | |
| "route_supporting_facts": ["string"], | |
| "route_supporting_facts_data_sources": ["string"] |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@cognee/eval_framework/benchmark_adapters/logistics_system_utils/queries/query_data_sources.txt`
around lines 13 - 21, The JSON schema uses non-JSON type names and inconsistent
key naming: replace the non-JSON tokens "int" and "float" for keys like
estimated_delivery_days and estimated_transport_price with the valid JSON type
"number", and normalize the route supporting-facts data-source key by renaming
route_supporting_facts_data_source to the plural
route_supporting_facts_data_sources so it matches the pattern used by
estimated_*_supporting_facts_data_sources.
| R07. An express package shall not be assigned to the shipment transport mode. | ||
| R08. A package already delivered to the recipient shall not receive a new transport assignment. | ||
| R09. Retailer and user regions shall contribute to both delivery-time estimation and transport-price estimation. | ||
| R10. Region distance shall be computed as the absolute difference between origin and destination region distance scores, plus 1, with an additional penalty of 2 when crossing between US and non-US regions. |
There was a problem hiding this comment.
R10 disagrees with the engine on same-region routes.
_region_distance() in cognee/eval_framework/benchmark_adapters/logistics_system_utils/rule_engine.py returns 0 when origin and destination are the same region, but this rule says abs(...) + 1 unconditionally. That makes the published rule add 1 day and $3 to same-region shipments that the generated golden answers will not.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@cognee/eval_framework/benchmark_adapters/logistics_system_utils/rules/rules_policy_language.txt`
at line 10, Rule R10 requires region_distance = abs(origin.score -
destination.score) + 1 with +2 if crossing US/non‑US, but _region_distance()
currently returns 0 for same-region routes; update _region_distance in
cognee/eval_framework/benchmark_adapters/logistics_system_utils/rule_engine.py
to compute distance as abs(origin_region_score - destination_region_score) + 1
unconditionally and then add the +2 cross-border penalty when origin and
destination are US vs non‑US, and adjust any unit tests or callers that assumed
a zero same-region result.
| if package.user_id is not None and package.retailer_id is not None: | ||
| matched_user = _find_by_id(users, "user_id", package.user_id) | ||
| matched_retailer = _find_by_id(retailers, "retailer_id", package.retailer_id) | ||
| if matched_user is not None and matched_retailer is not None: | ||
| return matched_user, matched_retailer |
There was a problem hiding this comment.
Don't silently remap packages with broken IDs.
Once a package carries explicit user_id and retailer_id, this helper should fail if either lookup misses. Falling back to the heuristic matcher here will generate golden answers for the wrong entities instead of surfacing corrupted world data.
🔧 Suggested fix
if package.user_id is not None and package.retailer_id is not None:
matched_user = _find_by_id(users, "user_id", package.user_id)
matched_retailer = _find_by_id(retailers, "retailer_id", package.retailer_id)
- if matched_user is not None and matched_retailer is not None:
- return matched_user, matched_retailer
+ if matched_user is None or matched_retailer is None:
+ raise ValueError(
+ f"Package {package.package_id} references an unknown user or retailer."
+ )
+ return matched_user, matched_retailer📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if package.user_id is not None and package.retailer_id is not None: | |
| matched_user = _find_by_id(users, "user_id", package.user_id) | |
| matched_retailer = _find_by_id(retailers, "retailer_id", package.retailer_id) | |
| if matched_user is not None and matched_retailer is not None: | |
| return matched_user, matched_retailer | |
| if package.user_id is not None and package.retailer_id is not None: | |
| matched_user = _find_by_id(users, "user_id", package.user_id) | |
| matched_retailer = _find_by_id(retailers, "retailer_id", package.retailer_id) | |
| if matched_user is None or matched_retailer is None: | |
| raise ValueError( | |
| f"Package {package.package_id} references an unknown user or retailer." | |
| ) | |
| return matched_user, matched_retailer |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@cognee/eval_framework/benchmark_adapters/logistics_system_utils/utils/utils.py`
around lines 271 - 275, The helper currently falls back to the heuristic matcher
when package.user_id and package.retailer_id are present but one or both lookups
via _find_by_id(users, "user_id", package.user_id) or _find_by_id(retailers,
"retailer_id", package.retailer_id) fail; change this so that if package.user_id
and package.retailer_id are not None and either matched_user or matched_retailer
is None the function does not continue to the heuristic matcher but instead
returns a clear failure (e.g., raise a ValueError or return None/a specific
error sentinel) so corrupted IDs surface immediately rather than producing wrong
golden answers. Ensure the check lives alongside the existing
matched_user/matched_retailer logic and that callers handle the raised
error/None result.
| world: dict[str, object] = { | ||
| "retailers": retailers, | ||
| "users": users, | ||
| "post_offices": post_offices, | ||
| "carriers": carriers, | ||
| } | ||
|
|
||
| if packages: | ||
| world["packages"] = packages |
There was a problem hiding this comment.
Always preserve packages when loading the world.
store_world() always writes a packages array, but load_world() drops that key when the array is empty. That breaks round-tripping and makes downstream code such as write_golden_answers() crash on valid empty worlds.
🔧 Suggested fix
world: dict[str, object] = {
"retailers": retailers,
"users": users,
+ "packages": packages,
"post_offices": post_offices,
"carriers": carriers,
}
-
- if packages:
- world["packages"] = packages
return world📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| world: dict[str, object] = { | |
| "retailers": retailers, | |
| "users": users, | |
| "post_offices": post_offices, | |
| "carriers": carriers, | |
| } | |
| if packages: | |
| world["packages"] = packages | |
| world: dict[str, object] = { | |
| "retailers": retailers, | |
| "users": users, | |
| "packages": packages, | |
| "post_offices": post_offices, | |
| "carriers": carriers, | |
| } | |
| return world |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@cognee/eval_framework/benchmark_adapters/logistics_system_utils/utils/utils.py`
around lines 341 - 349, load_world currently omits the "packages" key when the
packages list is empty, breaking round-tripping with store_world (which always
writes "packages") and causing callers like write_golden_answers to fail; update
the load_world logic (the world dict construction in load_world) to always set
world["packages"] = packages regardless of emptiness so the key is preserved
even for an empty list.
| supports_hazardous_materials: bool, | ||
| ) -> PostOffice: | ||
| return PostOffice( | ||
| post_office_id=f"po-{random.randint(1000, 9999)}", |
There was a problem hiding this comment.
Make generated IDs collision-safe.
All generated IDs come from unchecked random.randint(...). Carriers are the worst case here: create_world() seeds 22 of them from a 900-value suffix space, so duplicate IDs are realistic. A collision will overwrite _entity_entries() keys, reuse filenames, and make references ambiguous across the persisted world.
Also applies to: 540-540, 597-597, 626-626, 664-664
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@cognee/eval_framework/benchmark_adapters/logistics_system_utils/utils/world_creation_utils.py`
at line 170, The generated IDs using random.randint(...) (e.g., in
create_world() and places that set post_office_id, carrier IDs, etc., and that
feed into _entity_entries() and persisted filenames) are collision-prone;
replace them with collision-safe IDs by using uuid.uuid4().hex (or another UUID
generator) instead of random.randint, or implement a uniqueness loop that checks
the destination dict/sets (e.g., the map used by _entity_entries()) to ensure
the new ID is not already present before assigning; update all occurrences
referenced in create_world(), post_office_id, carrier ID generation sites, and
any other similar ID generators (the lines called out in the review) so
filenames and keys remain unique.
| current_post_office = visited_post_offices[-1] if visited_post_offices else None | ||
|
|
||
| return Package.from_labels( | ||
| package_id=f"pkg-{random.randint(10000, 99999)}", | ||
| description=description or random.choice(PACKAGE_DESCRIPTIONS), | ||
| weight_kg=random.choice(package_possible_values["weight_kg"]), | ||
| shipping_range=shipping_range, | ||
| category=category, | ||
| priority=priority, | ||
| retailer_id=retailer_id, | ||
| retailer_name=retailer_name, | ||
| user_id=user_id, | ||
| user_name=user_name, | ||
| insured=random.choice(package_possible_values["insured"]), | ||
| current_state=current_state, | ||
| last_known_location=_location_for_state(package_state, current_post_office), | ||
| current_post_office_id=current_post_office.post_office_id if current_post_office else None, | ||
| current_post_office_name=current_post_office.name if current_post_office else None, |
There was a problem hiding this comment.
Delivered and in-flight states shouldn't point at a post office.
_seed_package() always sets current_post_office to the last routed office. For IN_TRANSIT, OUT_FOR_DELIVERY, and DELIVERED, that makes last_known_location/current_post_office_name report a sorting center, and _location_for_state()'s special labels never get used.
🔧 Suggested fix
current_post_office = visited_post_offices[-1] if visited_post_offices else None
+ if package_state in {
+ PackageState.IN_TRANSIT,
+ PackageState.OUT_FOR_DELIVERY,
+ PackageState.DELIVERED,
+ }:
+ current_post_office = None
return Package.from_labels(🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@cognee/eval_framework/benchmark_adapters/logistics_system_utils/utils/world_creation_utils.py`
around lines 661 - 678, _seed_package() currently assigns current_post_office to
the last routed office unconditionally, which causes IN_TRANSIT,
OUT_FOR_DELIVERY, and DELIVERED to show a sorting center instead of the special
labels from _location_for_state(); change the logic so that before building the
Package you set current_post_office = None when package_state is one of the
transit/delivery/completed states (e.g., package_state in
{PackageState.IN_TRANSIT, PackageState.OUT_FOR_DELIVERY,
PackageState.DELIVERED}) so that last_known_location and
current_post_office_name are computed from _location_for_state(package_state,
current_post_office) and do not point at a post office.
Description
Acceptance Criteria
Type of Change
Screenshots
Pre-submission Checklist
CONTRIBUTING.md)DCO Affirmation
I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.
Summary by CodeRabbit
Release Notes
New Features
Tests