feat(api): add input validation to search, cognify, and add routes#2336
feat(api): add input validation to search, cognify, and add routes#2336mdehsan873 wants to merge 1 commit intotopoteretes:devfrom
Conversation
WalkthroughThe PR improves API validation across three endpoints by converting a generic error to HTTP 400, adding field constraints with min/max bounds to DTOs, and introducing a field validator for query string whitespace normalization. A corresponding test verifies the new error handling behavior. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
cognee/tests/unit/api/test_conditional_authentication_endpoints.py (1)
124-126: Consider using atry/finallyblock for cleanup.If the assertion on line 124 or 125 fails, the dependency override on line 118 will not be cleaned up (line 126 won't execute). This can leak state and cause flaky behavior in subsequent tests.
♻️ Proposed fix using try/finally
app.dependency_overrides[get_authenticated_user] = override_get_authenticated_user files = {"data": ("test.txt", b"test content", "text/plain")} form_data = {} # No datasetName, no datasetId - response = client.post("/api/v1/add", files=files, data=form_data) - - assert response.status_code == 400 - assert response.json()["detail"] == "Either datasetId or datasetName must be provided." - app.dependency_overrides.pop(get_authenticated_user, None) + try: + response = client.post("/api/v1/add", files=files, data=form_data) + assert response.status_code == 400 + assert response.json()["detail"] == "Either datasetId or datasetName must be provided." + finally: + app.dependency_overrides.pop(get_authenticated_user, None)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cognee/tests/unit/api/test_conditional_authentication_endpoints.py` around lines 124 - 126, The test leaks a dependency override when an assertion fails because app.dependency_overrides.pop(get_authenticated_user, None) is called unconditionally after assertions; wrap the assertions and any code that relies on the override in a try/finally so the cleanup always runs. Specifically, in the test function containing the get_authenticated_user override, move the assertions into a try block and call app.dependency_overrides.pop(get_authenticated_user, None) in the finally block to ensure the override is removed even if an assertion fails.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cognee/tests/unit/api/test_conditional_authentication_endpoints.py`:
- Around line 108-126: The test imports get_authenticated_user from
get_add_router instead of its canonical definition; change the import to pull
get_authenticated_user from its canonical module (the
cognee.modules.users.methods function get_authenticated_user) so it matches
other tests, update the import statement in the
test_add_endpoint_returns_400_when_neither_dataset_id_nor_name_provided test to
use get_authenticated_user from cognee.modules.users.methods, and keep the
existing override and dependency_overrides cleanup logic unchanged.
---
Nitpick comments:
In `@cognee/tests/unit/api/test_conditional_authentication_endpoints.py`:
- Around line 124-126: The test leaks a dependency override when an assertion
fails because app.dependency_overrides.pop(get_authenticated_user, None) is
called unconditionally after assertions; wrap the assertions and any code that
relies on the override in a try/finally so the cleanup always runs.
Specifically, in the test function containing the get_authenticated_user
override, move the assertions into a try block and call
app.dependency_overrides.pop(get_authenticated_user, None) in the finally block
to ensure the override is removed even if an assertion fails.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 7d650193-c15b-4b41-8b2d-f59dbabfc41c
📒 Files selected for processing (4)
cognee/api/v1/add/routers/get_add_router.pycognee/api/v1/cognify/routers/get_cognify_router.pycognee/api/v1/search/routers/get_search_router.pycognee/tests/unit/api/test_conditional_authentication_endpoints.py
cognee/tests/unit/api/test_conditional_authentication_endpoints.py
Outdated
Show resolved
Hide resolved
- Search: query min_length=1, max_length=10000, strip whitespace; top_k ge=1, le=1000 - Cognify: chunks_per_batch ge=1, le=10000 when provided - Add: replace ValueError with HTTPException(400) when datasetId/datasetName missing - Add test for add route 400 when neither datasetId nor datasetName provided Closes topoteretes#2309 Made-with: Cursor
22eed0b to
f33c606
Compare
Please make sure all the checkboxes are checked:
|
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
cognee/api/v1/search/routers/get_search_router.py (1)
44-49: Add a brief docstring for the validator.This helper is clear, but a one-line docstring would align with the project’s documentation expectation for function definitions.
As per coding guidelines, "As a general rule, undocumented function definitions and class definitions in the project's Python code are assumed incomplete. Please consider suggesting a short summary of the code for any of these incomplete definitions as docstrings when reviewing."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cognee/api/v1/search/routers/get_search_router.py` around lines 44 - 49, Add a one-line docstring to the validator function strip_query to describe its purpose (e.g., "Strip leading/trailing whitespace from the query field before validation."). Update the function definition for strip_query (the method decorated with `@field_validator`("query", mode="before")) to include that brief summary as a docstring directly under the def line, keeping behavior unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cognee/api/v1/search/routers/get_search_router.py`:
- Around line 38-40: The top_k field currently declares type Optional[int],
allowing null to bypass the numeric bounds; change its annotation to int (keep
default=10, ge=1, le=1000, description unchanged) so Pydantic enforces the
1..1000 constraint; also remove or update any unused Optional import if present
and ensure the model/endpoint that uses top_k expects an int rather than None
(look for the top_k Field definition in get_search_router.py).
In `@cognee/tests/api/test_conditional_authentication_endpoints.py`:
- Around line 121-130: The test sets
app.dependency_overrides[get_authenticated_user] =
override_get_authenticated_user but removes it only after assertions, so
failures leak state; wrap the override setup and the client.post/assert block in
a try/finally and move app.dependency_overrides.pop(get_authenticated_user,
None) into the finally so the override is always removed (reference
get_authenticated_user, override_get_authenticated_user,
app.dependency_overrides.pop, and the client.post("/api/v1/add") call).
---
Nitpick comments:
In `@cognee/api/v1/search/routers/get_search_router.py`:
- Around line 44-49: Add a one-line docstring to the validator function
strip_query to describe its purpose (e.g., "Strip leading/trailing whitespace
from the query field before validation."). Update the function definition for
strip_query (the method decorated with `@field_validator`("query", mode="before"))
to include that brief summary as a docstring directly under the def line,
keeping behavior unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 2e66fee6-fa30-4bd1-b1e6-63cc9e7a35a6
📒 Files selected for processing (4)
cognee/api/v1/add/routers/get_add_router.pycognee/api/v1/cognify/routers/get_cognify_router.pycognee/api/v1/search/routers/get_search_router.pycognee/tests/api/test_conditional_authentication_endpoints.py
✅ Files skipped from review due to trivial changes (1)
- cognee/api/v1/cognify/routers/get_cognify_router.py
🚧 Files skipped from review as they are similar to previous changes (1)
- cognee/api/v1/add/routers/get_add_router.py
| top_k: Optional[int] = Field( | ||
| default=10, ge=1, le=1000, description="Maximum number of results to return." | ||
| ) |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
In Pydantic v2, for a field typed Optional[int] with Field(ge=1, le=1000), is a JSON null value accepted, and are ge/le constraints applied to null?
💡 Result:
In Pydantic v2, for a field typed Optional[int] (equivalent to int | None) with Field(ge=1, le=1000), a JSON null value is accepted because null deserializes to Python None, which matches the None variant of the union type. The ge and le numeric constraints are not applied to null/None because validation selects the None branch of the union, bypassing the int validator and its constraints. The constraints only apply if the int branch is selected (i.e., a non-None value that validates as int within 1-1000). This behavior follows Pydantic's union validation, where Optional[T] means the field is required but accepts None or a value matching T with its constraints. Official docs confirm Field(ge=, le=) apply to numbers, and union modes like 'smart' select the matching variant without applying constraints from other branches (sources 1,2,8,10,11,13,15,16,24,27). Stack Overflow and GitHub issues illustrate Optional accepts None without constraint violations on None (sources 4,6). No sources indicate constraints apply to None; numeric validators expect numbers, and None succeeds the union without reaching them.
Citations:
- 1: https://docs.pydantic.dev/latest/concepts/fields/
- 2: https://docs.pydantic.dev/2.5/api/fields/
- 3: https://stackoverflow.com/questions/76690463/pydantic-2-0-ignores-optional-in-schema-and-requires-the-field-to-be-available
- 4: https://stackoverflow.com/questions/77348076/optional-str-in-pydantic-2-0-with-field-constraints
- 5: https://docs.pydantic.dev/2.6/concepts/types/
- 6: https://docs.pydantic.dev/2.12/api/standard_library_types/
- 7: https://docs.pydantic.dev/2.10/concepts/fields/
- 8: https://docs.pydantic.dev/2.10/concepts/fields
- 9: https://docs.pydantic.dev/latest/api/standard_library_types
- 10: https://docs.pydantic.dev/2.0/usage/types/number_types/
- 11: https://docs.pydantic.dev/2.6/api/standard_library_types/
- 12: https://docs.pydantic.dev/2.3/usage/types/unions/
top_k must not be Optional to enforce the bounded-integer contract.
With Optional[int], clients can send {"top_k": null} and the field will accept it; the ge=1, le=1000 constraints only apply when an actual integer is provided. In Pydantic v2, Optional[T] means the field accepts None or a value of type T (with its constraints), but None bypasses the numeric validators entirely. If the API requires an integer between 1 and 1000, remove the Optional:
Proposed fix
- top_k: Optional[int] = Field(
+ top_k: int = Field(
default=10, ge=1, le=1000, description="Maximum number of results to return."
)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cognee/api/v1/search/routers/get_search_router.py` around lines 38 - 40, The
top_k field currently declares type Optional[int], allowing null to bypass the
numeric bounds; change its annotation to int (keep default=10, ge=1, le=1000,
description unchanged) so Pydantic enforces the 1..1000 constraint; also remove
or update any unused Optional import if present and ensure the model/endpoint
that uses top_k expects an int rather than None (look for the top_k Field
definition in get_search_router.py).
| app.dependency_overrides[get_authenticated_user] = override_get_authenticated_user | ||
| files = {"data": ("test.txt", b"test content", "text/plain")} | ||
| form_data = {} # No datasetName, no datasetId | ||
|
|
||
| response = client.post("/api/v1/add", files=files, data=form_data) | ||
|
|
||
| assert response.status_code == 400 | ||
| assert response.json()["detail"] == "Either datasetId or datasetName must be provided." | ||
| app.dependency_overrides.pop(get_authenticated_user, None) | ||
|
|
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify override setup/cleanup placement in the changed test.
rg -n -C3 'dependency_overrides\[' cognee/tests/api/test_conditional_authentication_endpoints.py
rg -n -C3 'dependency_overrides\.pop\(' cognee/tests/api/test_conditional_authentication_endpoints.pyRepository: topoteretes/cognee
Length of output: 756
🏁 Script executed:
#!/bin/bash
# Check test class structure and setup/teardown
rg -n -B5 'class.*Test' cognee/tests/api/test_conditional_authentication_endpoints.py | head -30
# Count all dependency_overrides patterns in the file
echo "=== All dependency_overrides patterns ==="
rg -n 'dependency_overrides' cognee/tests/api/test_conditional_authentication_endpoints.py
# Check if there are pytest fixtures or setup/teardown methods
echo "=== Setup/teardown patterns ==="
rg -n '(setUp|tearDown|@pytest\.fixture|def setup|def teardown)' cognee/tests/api/test_conditional_authentication_endpoints.pyRepository: topoteretes/cognee
Length of output: 1391
🏁 Script executed:
#!/bin/bash
# Read the full test method context
sed -n '111,135p' cognee/tests/api/test_conditional_authentication_endpoints.py
# Check what the fixtures do
echo "=== Fixtures at line 47 and 161 ==="
sed -n '47,65p' cognee/tests/api/test_conditional_authentication_endpoints.py
sed -n '161,180p' cognee/tests/api/test_conditional_authentication_endpoints.pyRepository: topoteretes/cognee
Length of output: 2760
Wrap dependency override setup/cleanup in try/finally to prevent state leakage on test failure.
At line 121, the dependency override is set but cleanup at line 129 is skipped if assertions at lines 127–128 fail. This leaves the global override active for subsequent tests. Restructure to guarantee cleanup:
app.dependency_overrides[get_authenticated_user] = override_get_authenticated_user
- files = {"data": ("test.txt", b"test content", "text/plain")}
- form_data = {} # No datasetName, no datasetId
-
- response = client.post("/api/v1/add", files=files, data=form_data)
-
- assert response.status_code == 400
- assert response.json()["detail"] == "Either datasetId or datasetName must be provided."
- app.dependency_overrides.pop(get_authenticated_user, None)
+ try:
+ files = {"data": ("test.txt", b"test content", "text/plain")}
+ form_data = {} # No datasetName, no datasetId
+ response = client.post("/api/v1/add", files=files, data=form_data)
+
+ assert response.status_code == 400
+ assert response.json()["detail"] == "Either datasetId or datasetName must be provided."
+ finally:
+ app.dependency_overrides.pop(get_authenticated_user, None)This aligns with the coding guideline: avoid external state in tests and rely on proper cleanup mechanisms.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cognee/tests/api/test_conditional_authentication_endpoints.py` around lines
121 - 130, The test sets app.dependency_overrides[get_authenticated_user] =
override_get_authenticated_user but removes it only after assertions, so
failures leak state; wrap the override setup and the client.post/assert block in
a try/finally and move app.dependency_overrides.pop(get_authenticated_user,
None) into the finally so the override is always removed (reference
get_authenticated_user, override_get_authenticated_user,
app.dependency_overrides.pop, and the client.post("/api/v1/add") call).
Description
Add input validation to API route parameters (#2309)
Search: query min/max length (1–10000), strip whitespace; top_k 1–1000.
Cognify: chunks_per_batch 1–10000 when provided.
Add: Use HTTPException(400) when neither datasetId nor datasetName is provided (instead of ValueError).
Adds a test for the add-route 400 case. Ruff passes.
Acceptance Criteria
Type of Change
Screenshots
Pre-submission Checklist
CONTRIBUTING.md)DCO Affirmation
I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.
Summary by CodeRabbit
Bug Fixes
Improvements