Skip to content

fix(skills): harden skills API review findings#1366

Merged
slin1237 merged 1 commit intomainfrom
fix/skills-api-review-findings
Apr 23, 2026
Merged

fix(skills): harden skills API review findings#1366
slin1237 merged 1 commit intomainfrom
fix/skills-api-review-findings

Conversation

@slin1237
Copy link
Copy Markdown
Member

@slin1237 slin1237 commented Apr 23, 2026

Summary

  • Enforce configured skills upload, file-size, total-size, and file-count limits before buffering uploads and inside direct service calls.
  • Sanitize internal skills service, store, and blob errors in API responses while logging backend details server-side.
  • Gate skills CRUD routes with allowed_operations and make create/version metadata writes atomic at the store boundary.

Validation

  • cargo check -p smg-skills -p smg --tests
  • cargo test -p smg-skills
  • cargo test -p smg --test api_tests skills_api
  • cargo test -p smg internal_skill_service_errors_hide_backend_details
  • cargo +nightly fmt --all --check
  • cargo clippy --all-targets --all-features -- -D warnings
  • cargo test

Summary by CodeRabbit

Release Notes

  • New Features

    • Configurable upload limits for skills now enforced: maximum file size, file count per version, and total upload size.
    • Admin operation authorization checks added to skill management endpoints; requests without proper permissions return 403 Forbidden.
  • Bug Fixes

    • Enhanced error handling with clearer, safer error messages for skill service operations.
    • Added validation to prevent configuration overflow when converting size limits.

Signed-off-by: Simo Lin <linsimo.mark@gmail.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 23, 2026

📝 Walkthrough

Walkthrough

Introduces configurable SkillUploadLimits to enforce file and aggregate size constraints on skill uploads via zip and multipart routes. Adds atomic combined write operations to the metadata store trait to ensure skills and versions persist together. Implements admin operation authorization gates across CRUD handlers and refactors multipart parsing with chunked, size-limited readers.

Changes

Cohort / File(s) Summary
Configuration & Type Definitions
crates/skills/src/config.rs, crates/skills/src/lib.rs
Adds SkillUploadLimits struct with MB-to-bytes conversion and validation; extends SkillsAdminOperation enum with CreateAnyTenant and UpdateAnyTenant variants; exports new type via lib.rs public re-export.
Skill Service API & Validation
crates/skills/src/api.rs, crates/skills/src/validation.rs
Introduces constructors single_process_with_limits and in_memory_with_limits; enforces upload limits during normalization via normalize_skill_bundle_zip_with_limits; refactors create/upgrade persistence to use atomic put_skill_with_initial_version and put_skill_version_and_update_skill operations.
Metadata Store & In-Memory Implementation
crates/skills/src/storage.rs, crates/skills/src/memory.rs
Adds two trait methods for atomic skill+version writes; centralizes validation logic in *_locked helpers; implements new methods with tenant consistency enforcement and skill-id validation.
Gateway Wiring & Authorization
model_gateway/src/app_context.rs, model_gateway/src/config/validation.rs, model_gateway/src/routers/skills/handlers.rs
Parses upload limits from config and passes to service constructors; adds mebibyte-overflow validation; gates all CRUD/list/get handlers behind admin operation checks; refactors multipart parsing to enforce per-file, aggregate, and field-size limits; adds internal error logging without backend detail leakage.
Integration Tests
model_gateway/tests/api/skills_api_test.rs
Verifies 400 rejection when multipart file exceeds max_file_size_mb; verifies 403 rejection when admin operation is not allowed in configured permissions.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Handler as Handler<br/>(Authorization)
    participant Parser as Multipart<br/>Parser
    participant Validator as Normalization<br/>& Limits
    participant Store as Metadata<br/>Store
    
    Client->>Handler: POST /v1/skills (multipart)
    Handler->>Handler: ensure_admin_operation_allowed<br/>CreateAnyTenant?
    alt Operation Not Allowed
        Handler-->>Client: 403 Forbidden
    end
    
    Handler->>Parser: parse_skill_upload<br/>with limits
    Parser->>Parser: Chunked read<br/>enforce max_files,<br/>file_size, tenant_id bytes
    alt Upload Exceeds Limits
        Parser-->>Client: 400 Bad Request
    end
    
    Parser->>Validator: normalize_skill_bundle_zip_with_limits
    Validator->>Validator: Validate zip entries<br/>vs max_upload_size_bytes,<br/>max_file_size_bytes
    alt Validation Fails
        Validator-->>Client: 400 Bad Request
    end
    
    Validator->>Store: put_skill_with_initial_version<br/>(skill, version)
    Store->>Store: Atomic:<br/>validate tenant,<br/>insert skill & version
    alt Store Fails
        Store-->>Client: 500 Internal Error
    end
    
    Store-->>Handler: Success
    Handler-->>Client: 201 Created
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • PR #1332: Modifies multipart parsing and bundle normalization in the upload pipeline with additional limit enforcement on the same code paths.
  • PR #1249: Provides foundational scaffolding for SkillService, SkillsConfig, and related types that this PR extends with limits and new operations.
  • PR #1255: Introduces the zip normalization API that this PR adapts with limits-aware overload and threshold parameterization.

Suggested reviewers

  • CatherineSue
  • key4ng
  • claude

Poem

🐰 Hop, hop! With limits set so tight,
Files fly through checks, day and night,
Atomic writes keep skills so true,
Admin gates guard what we do—
No more sprawling uploads wild,
Just bounded bundles, safely filed! 🎯

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 24.14% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title 'fix(skills): harden skills API review findings' is vague and generic, using the broad term 'harden' and 'review findings' without specifying which aspects are being hardened or what the main changes are. Consider a more specific title that highlights the primary changes, such as 'fix(skills): enforce upload limits and secure API access with auth gates' or similar.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/skills-api-review-findings

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions github-actions Bot added tests Test changes model-gateway Model gateway crate changes labels Apr 23, 2026
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces configurable upload limits for skill bundles, including constraints on total size, file count, and individual file size. It refactors the SkillService and SkillMetadataStore to enforce these limits during multipart parsing and bundle normalization. Additionally, the PR adds new administrative operations for tenant management and updates the in-memory store to support atomic-like metadata updates. Feedback was provided regarding a potential race condition in the in-memory store that could lead to version collisions.

Comment on lines +153 to +155
validate_skill_tenant_locked(&state, &skill)?;
put_skill_version_locked(&mut state, version)?;
put_skill_locked(&mut state, skill)?;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

In put_skill_version_and_update_skill, there is a potential race condition where a version ID generated based on a timestamp could collide under high concurrency. If using UUID v7, ensure the full UUID or a sufficient prefix (e.g., simple()[..24]) is used to ensure uniqueness. The store should enforce uniqueness by checking if the version key already exists before inserting it, returning a conflict error instead of silently overwriting the existing version data.

Suggested change
validate_skill_tenant_locked(&state, &skill)?;
put_skill_version_locked(&mut state, version)?;
put_skill_locked(&mut state, skill)?;
let mut state = self.state.write();
validate_skill_tenant_locked(&state, &skill)?;
let key = (version.skill_id.clone(), version.version.clone());
if state.skill_versions.contains_key(&key) {
return Err(SkillsStoreError::InvalidData(format!(
"version '{}' already exists for skill_id '{}'",
version.version, version.skill_id
)));
}
put_skill_version_locked(&mut state, version)?;
put_skill_locked(&mut state, skill)?;
References
  1. When generating unique IDs from UUID v7, avoid using only the first segment as it is not unique within short time intervals. Use the full UUID or a long enough prefix to ensure uniqueness.
  2. Avoid silently ignoring potential failures or overwriting data; prefer explicit error handling or logging to prevent silent failures.

Comment on lines +988 to +993
let remaining_upload_bytes = limits
.max_upload_size_bytes
.saturating_sub(total_file_bytes);
let max_field_bytes = remaining_upload_bytes.min(limits.max_file_size_bytes);
let contents =
read_field_bytes_limited(field, max_field_bytes, "uploaded skill file").await?;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Nit: When previous files have consumed the full budget, saturating_sub returns 0, so max_field_bytes = 0. The next non-empty file then triggers read_field_bytes_limitedupload_too_large with the message "uploaded skill file exceeds maximum size of 0 bytes", which is confusing.

An early return when the budget is exhausted would produce a clearer error:

Suggested change
let remaining_upload_bytes = limits
.max_upload_size_bytes
.saturating_sub(total_file_bytes);
let max_field_bytes = remaining_upload_bytes.min(limits.max_file_size_bytes);
let contents =
read_field_bytes_limited(field, max_field_bytes, "uploaded skill file").await?;
let remaining_upload_bytes = limits
.max_upload_size_bytes
.saturating_sub(total_file_bytes);
if remaining_upload_bytes == 0 {
return Err(SkillsApiError::BadRequest {
code: "skill_upload_too_large",
message: format!(
"skill upload exceeds maximum total size of {} bytes",
limits.max_upload_size_bytes
),
});
}
let max_field_bytes = remaining_upload_bytes.min(limits.max_file_size_bytes);
let contents =
read_field_bytes_limited(field, max_field_bytes, "uploaded skill file").await?;

@slin1237 slin1237 marked this pull request as ready for review April 23, 2026 21:14
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
model_gateway/src/routers/skills/handlers.rs (1)

292-307: ⚠️ Potential issue | 🟠 Major

Use update permission for version creation.

POST /v1/skills/{skill_id}/versions mutates an existing skill by adding a version and advancing its projection. Gating it with CreateAnyTenant lets callers who are denied updates still modify existing skills.

Suggested fix
-    ensure_admin_operation_allowed(&state, SkillsAdminOperation::CreateAnyTenant)?;
+    ensure_admin_operation_allowed(&state, SkillsAdminOperation::UpdateAnyTenant)?;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@model_gateway/src/routers/skills/handlers.rs` around lines 292 - 307,
create_skill_version_impl currently gates version creation with
ensure_admin_operation_allowed(..., SkillsAdminOperation::CreateAnyTenant), but
adding a version is an update to an existing skill; change the permission check
to use the update permission instead (replace
SkillsAdminOperation::CreateAnyTenant with SkillsAdminOperation::UpdateAnyTenant
in the ensure_admin_operation_allowed call inside create_skill_version_impl) so
callers need update rights to add versions.
crates/skills/src/api.rs (1)

853-910: ⚠️ Potential issue | 🟠 Major

Reserve ZIP framing overhead before buffering files[] uploads.

Line 888 only caps the sum of file.contents.len(), but Lines 857-874 then build a stored ZIP archive, which adds per-entry and central-directory overhead after validation. A bundle that is exactly at max_upload_size_bytes in raw file bytes can still allocate a larger archive buffer here, so the pre-buffering upload cap is still bypassable for multipart uploads. Reject based on an archive-size estimate before writing, or stop as soon as the buffered ZIP exceeds max_upload_size_bytes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/skills/src/api.rs` around lines 853 - 910, The current validation only
sums raw file bytes but normalize_files_upload builds a ZIP (in ZipWriter) which
can exceed max_upload_size_bytes; update the logic to enforce archive-size
limits: either augment validate_files_upload_limits to conservatively estimate
ZIP overhead per entry (local header + central dir + filename length) and fail
when the estimated archive size would exceed limits.max_upload_size_bytes, or
(simpler) instrument normalize_files_upload to check the buffer length after
each start_file/write/finish and immediately return
SkillBundleArchiveError::BundleTooLarge.into() if buffer.get_ref().len() >
limits.max_upload_size_bytes; apply the same SkillBundleArchiveError variant so
callers of normalize_skill_bundle_zip_with_limits see a consistent error.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@model_gateway/src/routers/skills/handlers.rs`:
- Around line 692-709: The helper ensure_admin_operation_allowed currently only
checks admin.allowed_operations and ignores the admin.enabled flag, so update it
to also require that state.context.router_config.skills.as_ref().and_then(|s|
s.admin.as_ref()).map_or(false, |admin| admin.enabled) (or equivalent) is true
before allowing the operation; if admin.enabled is false, return the same
SkillsApiError::Forbidden (code "skills_operation_not_allowed") to ensure the
global admin-off switch blocks handler access even when allowed_operations is
populated (see ensure_admin_operation_allowed, SkillsAdminConfig::default, and
the admin.allowed_operations check).

---

Outside diff comments:
In `@crates/skills/src/api.rs`:
- Around line 853-910: The current validation only sums raw file bytes but
normalize_files_upload builds a ZIP (in ZipWriter) which can exceed
max_upload_size_bytes; update the logic to enforce archive-size limits: either
augment validate_files_upload_limits to conservatively estimate ZIP overhead per
entry (local header + central dir + filename length) and fail when the estimated
archive size would exceed limits.max_upload_size_bytes, or (simpler) instrument
normalize_files_upload to check the buffer length after each
start_file/write/finish and immediately return
SkillBundleArchiveError::BundleTooLarge.into() if buffer.get_ref().len() >
limits.max_upload_size_bytes; apply the same SkillBundleArchiveError variant so
callers of normalize_skill_bundle_zip_with_limits see a consistent error.

In `@model_gateway/src/routers/skills/handlers.rs`:
- Around line 292-307: create_skill_version_impl currently gates version
creation with ensure_admin_operation_allowed(...,
SkillsAdminOperation::CreateAnyTenant), but adding a version is an update to an
existing skill; change the permission check to use the update permission instead
(replace SkillsAdminOperation::CreateAnyTenant with
SkillsAdminOperation::UpdateAnyTenant in the ensure_admin_operation_allowed call
inside create_skill_version_impl) so callers need update rights to add versions.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 929e29c5-bd40-4c73-b227-77a9e51c3b1d

📥 Commits

Reviewing files that changed from the base of the PR and between 0b969a4 and 45acc08.

📒 Files selected for processing (10)
  • crates/skills/src/api.rs
  • crates/skills/src/config.rs
  • crates/skills/src/lib.rs
  • crates/skills/src/memory.rs
  • crates/skills/src/storage.rs
  • crates/skills/src/validation.rs
  • model_gateway/src/app_context.rs
  • model_gateway/src/config/validation.rs
  • model_gateway/src/routers/skills/handlers.rs
  • model_gateway/tests/api/skills_api_test.rs

Comment on lines +692 to +709
fn ensure_admin_operation_allowed(
state: &AppState,
operation: SkillsAdminOperation,
) -> Result<(), SkillsApiError> {
let allowed = state
.context
.router_config
.skills
.as_ref()
.is_some_and(|skills| skills.admin.allowed_operations.contains(&operation));
if allowed {
return Ok(());
}

Err(SkillsApiError::Forbidden {
code: "skills_operation_not_allowed",
message: format!("skills admin operation '{operation:?}' is not allowed"),
})
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Preserve admin.enabled in the handler-side authorization gate.

This helper only checks allowed_operations. Since SkillsAdminConfig::default() still populates every operation while enabled is false, any accidentally mounted handler bypasses the global admin-off switch. Fold skills.admin.enabled into this check as defense in depth.

Suggested fix
 fn ensure_admin_operation_allowed(
     state: &AppState,
     operation: SkillsAdminOperation,
 ) -> Result<(), SkillsApiError> {
     let allowed = state
         .context
         .router_config
         .skills
         .as_ref()
-        .is_some_and(|skills| skills.admin.allowed_operations.contains(&operation));
+        .is_some_and(|skills| {
+            skills.admin.enabled && skills.admin.allowed_operations.contains(&operation)
+        });
     if allowed {
         return Ok(());
     }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
fn ensure_admin_operation_allowed(
state: &AppState,
operation: SkillsAdminOperation,
) -> Result<(), SkillsApiError> {
let allowed = state
.context
.router_config
.skills
.as_ref()
.is_some_and(|skills| skills.admin.allowed_operations.contains(&operation));
if allowed {
return Ok(());
}
Err(SkillsApiError::Forbidden {
code: "skills_operation_not_allowed",
message: format!("skills admin operation '{operation:?}' is not allowed"),
})
fn ensure_admin_operation_allowed(
state: &AppState,
operation: SkillsAdminOperation,
) -> Result<(), SkillsApiError> {
let allowed = state
.context
.router_config
.skills
.as_ref()
.is_some_and(|skills| {
skills.admin.enabled && skills.admin.allowed_operations.contains(&operation)
});
if allowed {
return Ok(());
}
Err(SkillsApiError::Forbidden {
code: "skills_operation_not_allowed",
message: format!("skills admin operation '{operation:?}' is not allowed"),
})
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@model_gateway/src/routers/skills/handlers.rs` around lines 692 - 709, The
helper ensure_admin_operation_allowed currently only checks
admin.allowed_operations and ignores the admin.enabled flag, so update it to
also require that state.context.router_config.skills.as_ref().and_then(|s|
s.admin.as_ref()).map_or(false, |admin| admin.enabled) (or equivalent) is true
before allowing the operation; if admin.enabled is false, return the same
SkillsApiError::Forbidden (code "skills_operation_not_allowed") to ensure the
global admin-off switch blocks handler access even when allowed_operations is
populated (see ensure_admin_operation_allowed, SkillsAdminConfig::default, and
the admin.allowed_operations check).

@slin1237 slin1237 merged commit 26c8495 into main Apr 23, 2026
52 checks passed
@slin1237 slin1237 deleted the fix/skills-api-review-findings branch April 23, 2026 21:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

model-gateway Model gateway crate changes tests Test changes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant