Skip to content

feat(api): add input validation to search, cognify, and add routes#2336

Open
mdehsan873 wants to merge 1 commit intotopoteretes:devfrom
mdehsan873:feat/input-validation-api-route-params
Open

feat(api): add input validation to search, cognify, and add routes#2336
mdehsan873 wants to merge 1 commit intotopoteretes:devfrom
mdehsan873:feat/input-validation-api-route-params

Conversation

@mdehsan873
Copy link
Copy Markdown
Contributor

@mdehsan873 mdehsan873 commented Mar 9, 2026

Description

Add input validation to API route parameters (#2309)
Search: query min/max length (1–10000), strip whitespace; top_k 1–1000.
Cognify: chunks_per_batch 1–10000 when provided.
Add: Use HTTPException(400) when neither datasetId nor datasetName is provided (instead of ValueError).
Adds a test for the add-route 400 case. Ruff passes.

Acceptance Criteria

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Code refactoring
  • Other (please specify):

Screenshots

Screenshot 2026-03-09 at 1 44 55 PM

Pre-submission Checklist

  • I have tested my changes thoroughly before submitting this PR (See CONTRIBUTING.md)
  • This PR contains minimal changes necessary to address the issue/feature
  • My code follows the project's coding standards and style guidelines
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if applicable)
  • All new and existing tests pass
  • I have searched existing PRs to ensure this change hasn't been submitted already
  • I have linked any relevant issues in the description
  • My commits have clear and descriptive messages

DCO Affirmation

I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.

Summary by CodeRabbit

  • Bug Fixes

    • Missing dataset information now returns a clear HTTP 400 error instead of a generic exception.
    • Added input validation for batch processing parameters with enforced ranges.
  • Improvements

    • Search queries now validate length constraints and automatically trim whitespace.
    • Search result limits now have defined boundaries.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Mar 9, 2026

Walkthrough

The PR improves API validation across three endpoints by converting a generic error to HTTP 400, adding field constraints with min/max bounds to DTOs, and introducing a field validator for query string whitespace normalization. A corresponding test verifies the new error handling behavior.

Changes

Cohort / File(s) Summary
Error Handling
cognee/api/v1/add/routers/get_add_router.py
Changed missing datasetId and datasetName validation to raise HTTPException with status code 400 instead of generic ValueError, enabling proper HTTP error response handling.
DTO Field Constraints
cognee/api/v1/cognify/routers/get_cognify_router.py, cognee/api/v1/search/routers/get_search_router.py
Added field validation constraints to chunks_per_batch (range 1–10000), query (length 1–10000), and top_k (range 1–1000). Added @field_validator with whitespace-stripping logic for query field.
Test Coverage
cognee/tests/api/test_conditional_authentication_endpoints.py
Added test case verifying HTTP 400 response and error message when POST /api/v1/add is called without datasetId or datasetName.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

  • PR #376: Modifies the same CognifyPayloadDTO class in cognee/api/v1/cognify/routers/get_cognify_router.py, with potential overlapping field changes requiring conflict resolution or coordination.

Suggested labels

run-checks

Suggested reviewers

  • borisarzentar
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title 'feat(api): add input validation to search, cognify, and add routes' is clear, specific, and directly summarizes the main change across all modified API route files.
Description check ✅ Passed The PR description includes clear, human-generated explanations of changes, references issue #2309, includes screenshots of passing tests, and completes the pre-submission checklist.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
cognee/tests/unit/api/test_conditional_authentication_endpoints.py (1)

124-126: Consider using a try/finally block for cleanup.

If the assertion on line 124 or 125 fails, the dependency override on line 118 will not be cleaned up (line 126 won't execute). This can leak state and cause flaky behavior in subsequent tests.

♻️ Proposed fix using try/finally
         app.dependency_overrides[get_authenticated_user] = override_get_authenticated_user
         files = {"data": ("test.txt", b"test content", "text/plain")}
         form_data = {}  # No datasetName, no datasetId

-        response = client.post("/api/v1/add", files=files, data=form_data)
-
-        assert response.status_code == 400
-        assert response.json()["detail"] == "Either datasetId or datasetName must be provided."
-        app.dependency_overrides.pop(get_authenticated_user, None)
+        try:
+            response = client.post("/api/v1/add", files=files, data=form_data)
+            assert response.status_code == 400
+            assert response.json()["detail"] == "Either datasetId or datasetName must be provided."
+        finally:
+            app.dependency_overrides.pop(get_authenticated_user, None)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cognee/tests/unit/api/test_conditional_authentication_endpoints.py` around
lines 124 - 126, The test leaks a dependency override when an assertion fails
because app.dependency_overrides.pop(get_authenticated_user, None) is called
unconditionally after assertions; wrap the assertions and any code that relies
on the override in a try/finally so the cleanup always runs. Specifically, in
the test function containing the get_authenticated_user override, move the
assertions into a try block and call
app.dependency_overrides.pop(get_authenticated_user, None) in the finally block
to ensure the override is removed even if an assertion fails.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cognee/tests/unit/api/test_conditional_authentication_endpoints.py`:
- Around line 108-126: The test imports get_authenticated_user from
get_add_router instead of its canonical definition; change the import to pull
get_authenticated_user from its canonical module (the
cognee.modules.users.methods function get_authenticated_user) so it matches
other tests, update the import statement in the
test_add_endpoint_returns_400_when_neither_dataset_id_nor_name_provided test to
use get_authenticated_user from cognee.modules.users.methods, and keep the
existing override and dependency_overrides cleanup logic unchanged.

---

Nitpick comments:
In `@cognee/tests/unit/api/test_conditional_authentication_endpoints.py`:
- Around line 124-126: The test leaks a dependency override when an assertion
fails because app.dependency_overrides.pop(get_authenticated_user, None) is
called unconditionally after assertions; wrap the assertions and any code that
relies on the override in a try/finally so the cleanup always runs.
Specifically, in the test function containing the get_authenticated_user
override, move the assertions into a try block and call
app.dependency_overrides.pop(get_authenticated_user, None) in the finally block
to ensure the override is removed even if an assertion fails.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 7d650193-c15b-4b41-8b2d-f59dbabfc41c

📥 Commits

Reviewing files that changed from the base of the PR and between f7ba5db and 22eed0b.

📒 Files selected for processing (4)
  • cognee/api/v1/add/routers/get_add_router.py
  • cognee/api/v1/cognify/routers/get_cognify_router.py
  • cognee/api/v1/search/routers/get_search_router.py
  • cognee/tests/unit/api/test_conditional_authentication_endpoints.py

siillee
siillee previously approved these changes Mar 20, 2026
- Search: query min_length=1, max_length=10000, strip whitespace; top_k ge=1, le=1000
- Cognify: chunks_per_batch ge=1, le=10000 when provided
- Add: replace ValueError with HTTPException(400) when datasetId/datasetName missing
- Add test for add route 400 when neither datasetId nor datasetName provided

Closes topoteretes#2309

Made-with: Cursor
@Vasilije1990 Vasilije1990 force-pushed the feat/input-validation-api-route-params branch from 22eed0b to f33c606 Compare April 5, 2026 03:01
@pull-checklist
Copy link
Copy Markdown

pull-checklist bot commented Apr 5, 2026

Please make sure all the checkboxes are checked:

  • I have tested these changes locally.
  • I have reviewed the code changes.
  • I have added end-to-end and unit tests (if applicable).
  • I have updated the documentation and README.md file (if necessary).
  • I have removed unnecessary code and debug statements.
  • PR title is clear and follows the convention.
  • I have tagged reviewers or team members for feedback.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
cognee/api/v1/search/routers/get_search_router.py (1)

44-49: Add a brief docstring for the validator.

This helper is clear, but a one-line docstring would align with the project’s documentation expectation for function definitions.

As per coding guidelines, "As a general rule, undocumented function definitions and class definitions in the project's Python code are assumed incomplete. Please consider suggesting a short summary of the code for any of these incomplete definitions as docstrings when reviewing."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cognee/api/v1/search/routers/get_search_router.py` around lines 44 - 49, Add
a one-line docstring to the validator function strip_query to describe its
purpose (e.g., "Strip leading/trailing whitespace from the query field before
validation."). Update the function definition for strip_query (the method
decorated with `@field_validator`("query", mode="before")) to include that brief
summary as a docstring directly under the def line, keeping behavior unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cognee/api/v1/search/routers/get_search_router.py`:
- Around line 38-40: The top_k field currently declares type Optional[int],
allowing null to bypass the numeric bounds; change its annotation to int (keep
default=10, ge=1, le=1000, description unchanged) so Pydantic enforces the
1..1000 constraint; also remove or update any unused Optional import if present
and ensure the model/endpoint that uses top_k expects an int rather than None
(look for the top_k Field definition in get_search_router.py).

In `@cognee/tests/api/test_conditional_authentication_endpoints.py`:
- Around line 121-130: The test sets
app.dependency_overrides[get_authenticated_user] =
override_get_authenticated_user but removes it only after assertions, so
failures leak state; wrap the override setup and the client.post/assert block in
a try/finally and move app.dependency_overrides.pop(get_authenticated_user,
None) into the finally so the override is always removed (reference
get_authenticated_user, override_get_authenticated_user,
app.dependency_overrides.pop, and the client.post("/api/v1/add") call).

---

Nitpick comments:
In `@cognee/api/v1/search/routers/get_search_router.py`:
- Around line 44-49: Add a one-line docstring to the validator function
strip_query to describe its purpose (e.g., "Strip leading/trailing whitespace
from the query field before validation."). Update the function definition for
strip_query (the method decorated with `@field_validator`("query", mode="before"))
to include that brief summary as a docstring directly under the def line,
keeping behavior unchanged.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 2e66fee6-fa30-4bd1-b1e6-63cc9e7a35a6

📥 Commits

Reviewing files that changed from the base of the PR and between 22eed0b and f33c606.

📒 Files selected for processing (4)
  • cognee/api/v1/add/routers/get_add_router.py
  • cognee/api/v1/cognify/routers/get_cognify_router.py
  • cognee/api/v1/search/routers/get_search_router.py
  • cognee/tests/api/test_conditional_authentication_endpoints.py
✅ Files skipped from review due to trivial changes (1)
  • cognee/api/v1/cognify/routers/get_cognify_router.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • cognee/api/v1/add/routers/get_add_router.py

Comment on lines +38 to +40
top_k: Optional[int] = Field(
default=10, ge=1, le=1000, description="Maximum number of results to return."
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

In Pydantic v2, for a field typed Optional[int] with Field(ge=1, le=1000), is a JSON null value accepted, and are ge/le constraints applied to null?

💡 Result:

In Pydantic v2, for a field typed Optional[int] (equivalent to int | None) with Field(ge=1, le=1000), a JSON null value is accepted because null deserializes to Python None, which matches the None variant of the union type. The ge and le numeric constraints are not applied to null/None because validation selects the None branch of the union, bypassing the int validator and its constraints. The constraints only apply if the int branch is selected (i.e., a non-None value that validates as int within 1-1000). This behavior follows Pydantic's union validation, where Optional[T] means the field is required but accepts None or a value matching T with its constraints. Official docs confirm Field(ge=, le=) apply to numbers, and union modes like 'smart' select the matching variant without applying constraints from other branches (sources 1,2,8,10,11,13,15,16,24,27). Stack Overflow and GitHub issues illustrate Optional accepts None without constraint violations on None (sources 4,6). No sources indicate constraints apply to None; numeric validators expect numbers, and None succeeds the union without reaching them.

Citations:


top_k must not be Optional to enforce the bounded-integer contract.

With Optional[int], clients can send {"top_k": null} and the field will accept it; the ge=1, le=1000 constraints only apply when an actual integer is provided. In Pydantic v2, Optional[T] means the field accepts None or a value of type T (with its constraints), but None bypasses the numeric validators entirely. If the API requires an integer between 1 and 1000, remove the Optional:

Proposed fix
-    top_k: Optional[int] = Field(
+    top_k: int = Field(
         default=10, ge=1, le=1000, description="Maximum number of results to return."
     )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cognee/api/v1/search/routers/get_search_router.py` around lines 38 - 40, The
top_k field currently declares type Optional[int], allowing null to bypass the
numeric bounds; change its annotation to int (keep default=10, ge=1, le=1000,
description unchanged) so Pydantic enforces the 1..1000 constraint; also remove
or update any unused Optional import if present and ensure the model/endpoint
that uses top_k expects an int rather than None (look for the top_k Field
definition in get_search_router.py).

Comment on lines +121 to +130
app.dependency_overrides[get_authenticated_user] = override_get_authenticated_user
files = {"data": ("test.txt", b"test content", "text/plain")}
form_data = {} # No datasetName, no datasetId

response = client.post("/api/v1/add", files=files, data=form_data)

assert response.status_code == 400
assert response.json()["detail"] == "Either datasetId or datasetName must be provided."
app.dependency_overrides.pop(get_authenticated_user, None)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify override setup/cleanup placement in the changed test.
rg -n -C3 'dependency_overrides\[' cognee/tests/api/test_conditional_authentication_endpoints.py
rg -n -C3 'dependency_overrides\.pop\(' cognee/tests/api/test_conditional_authentication_endpoints.py

Repository: topoteretes/cognee

Length of output: 756


🏁 Script executed:

#!/bin/bash
# Check test class structure and setup/teardown
rg -n -B5 'class.*Test' cognee/tests/api/test_conditional_authentication_endpoints.py | head -30

# Count all dependency_overrides patterns in the file
echo "=== All dependency_overrides patterns ==="
rg -n 'dependency_overrides' cognee/tests/api/test_conditional_authentication_endpoints.py

# Check if there are pytest fixtures or setup/teardown methods
echo "=== Setup/teardown patterns ==="
rg -n '(setUp|tearDown|@pytest\.fixture|def setup|def teardown)' cognee/tests/api/test_conditional_authentication_endpoints.py

Repository: topoteretes/cognee

Length of output: 1391


🏁 Script executed:

#!/bin/bash
# Read the full test method context
sed -n '111,135p' cognee/tests/api/test_conditional_authentication_endpoints.py

# Check what the fixtures do
echo "=== Fixtures at line 47 and 161 ==="
sed -n '47,65p' cognee/tests/api/test_conditional_authentication_endpoints.py
sed -n '161,180p' cognee/tests/api/test_conditional_authentication_endpoints.py

Repository: topoteretes/cognee

Length of output: 2760


Wrap dependency override setup/cleanup in try/finally to prevent state leakage on test failure.

At line 121, the dependency override is set but cleanup at line 129 is skipped if assertions at lines 127–128 fail. This leaves the global override active for subsequent tests. Restructure to guarantee cleanup:

             app.dependency_overrides[get_authenticated_user] = override_get_authenticated_user
-            files = {"data": ("test.txt", b"test content", "text/plain")}
-            form_data = {}  # No datasetName, no datasetId
-
-            response = client.post("/api/v1/add", files=files, data=form_data)
-
-            assert response.status_code == 400
-            assert response.json()["detail"] == "Either datasetId or datasetName must be provided."
-            app.dependency_overrides.pop(get_authenticated_user, None)
+            try:
+                files = {"data": ("test.txt", b"test content", "text/plain")}
+                form_data = {}  # No datasetName, no datasetId
+                response = client.post("/api/v1/add", files=files, data=form_data)
+
+                assert response.status_code == 400
+                assert response.json()["detail"] == "Either datasetId or datasetName must be provided."
+            finally:
+                app.dependency_overrides.pop(get_authenticated_user, None)

This aligns with the coding guideline: avoid external state in tests and rely on proper cleanup mechanisms.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cognee/tests/api/test_conditional_authentication_endpoints.py` around lines
121 - 130, The test sets app.dependency_overrides[get_authenticated_user] =
override_get_authenticated_user but removes it only after assertions, so
failures leak state; wrap the override setup and the client.post/assert block in
a try/finally and move app.dependency_overrides.pop(get_authenticated_user,
None) into the finally so the override is always removed (reference
get_authenticated_user, override_get_authenticated_user,
app.dependency_overrides.pop, and the client.post("/api/v1/add") call).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants