fix(skills): harden skills API review findings#1366
Conversation
Signed-off-by: Simo Lin <linsimo.mark@gmail.com>
📝 WalkthroughWalkthroughIntroduces configurable Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant Handler as Handler<br/>(Authorization)
participant Parser as Multipart<br/>Parser
participant Validator as Normalization<br/>& Limits
participant Store as Metadata<br/>Store
Client->>Handler: POST /v1/skills (multipart)
Handler->>Handler: ensure_admin_operation_allowed<br/>CreateAnyTenant?
alt Operation Not Allowed
Handler-->>Client: 403 Forbidden
end
Handler->>Parser: parse_skill_upload<br/>with limits
Parser->>Parser: Chunked read<br/>enforce max_files,<br/>file_size, tenant_id bytes
alt Upload Exceeds Limits
Parser-->>Client: 400 Bad Request
end
Parser->>Validator: normalize_skill_bundle_zip_with_limits
Validator->>Validator: Validate zip entries<br/>vs max_upload_size_bytes,<br/>max_file_size_bytes
alt Validation Fails
Validator-->>Client: 400 Bad Request
end
Validator->>Store: put_skill_with_initial_version<br/>(skill, version)
Store->>Store: Atomic:<br/>validate tenant,<br/>insert skill & version
alt Store Fails
Store-->>Client: 500 Internal Error
end
Store-->>Handler: Success
Handler-->>Client: 201 Created
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Code Review
This pull request introduces configurable upload limits for skill bundles, including constraints on total size, file count, and individual file size. It refactors the SkillService and SkillMetadataStore to enforce these limits during multipart parsing and bundle normalization. Additionally, the PR adds new administrative operations for tenant management and updates the in-memory store to support atomic-like metadata updates. Feedback was provided regarding a potential race condition in the in-memory store that could lead to version collisions.
| validate_skill_tenant_locked(&state, &skill)?; | ||
| put_skill_version_locked(&mut state, version)?; | ||
| put_skill_locked(&mut state, skill)?; |
There was a problem hiding this comment.
In put_skill_version_and_update_skill, there is a potential race condition where a version ID generated based on a timestamp could collide under high concurrency. If using UUID v7, ensure the full UUID or a sufficient prefix (e.g., simple()[..24]) is used to ensure uniqueness. The store should enforce uniqueness by checking if the version key already exists before inserting it, returning a conflict error instead of silently overwriting the existing version data.
| validate_skill_tenant_locked(&state, &skill)?; | |
| put_skill_version_locked(&mut state, version)?; | |
| put_skill_locked(&mut state, skill)?; | |
| let mut state = self.state.write(); | |
| validate_skill_tenant_locked(&state, &skill)?; | |
| let key = (version.skill_id.clone(), version.version.clone()); | |
| if state.skill_versions.contains_key(&key) { | |
| return Err(SkillsStoreError::InvalidData(format!( | |
| "version '{}' already exists for skill_id '{}'", | |
| version.version, version.skill_id | |
| ))); | |
| } | |
| put_skill_version_locked(&mut state, version)?; | |
| put_skill_locked(&mut state, skill)?; |
References
- When generating unique IDs from UUID v7, avoid using only the first segment as it is not unique within short time intervals. Use the full UUID or a long enough prefix to ensure uniqueness.
- Avoid silently ignoring potential failures or overwriting data; prefer explicit error handling or logging to prevent silent failures.
| let remaining_upload_bytes = limits | ||
| .max_upload_size_bytes | ||
| .saturating_sub(total_file_bytes); | ||
| let max_field_bytes = remaining_upload_bytes.min(limits.max_file_size_bytes); | ||
| let contents = | ||
| read_field_bytes_limited(field, max_field_bytes, "uploaded skill file").await?; |
There was a problem hiding this comment.
🟡 Nit: When previous files have consumed the full budget, saturating_sub returns 0, so max_field_bytes = 0. The next non-empty file then triggers read_field_bytes_limited → upload_too_large with the message "uploaded skill file exceeds maximum size of 0 bytes", which is confusing.
An early return when the budget is exhausted would produce a clearer error:
| let remaining_upload_bytes = limits | |
| .max_upload_size_bytes | |
| .saturating_sub(total_file_bytes); | |
| let max_field_bytes = remaining_upload_bytes.min(limits.max_file_size_bytes); | |
| let contents = | |
| read_field_bytes_limited(field, max_field_bytes, "uploaded skill file").await?; | |
| let remaining_upload_bytes = limits | |
| .max_upload_size_bytes | |
| .saturating_sub(total_file_bytes); | |
| if remaining_upload_bytes == 0 { | |
| return Err(SkillsApiError::BadRequest { | |
| code: "skill_upload_too_large", | |
| message: format!( | |
| "skill upload exceeds maximum total size of {} bytes", | |
| limits.max_upload_size_bytes | |
| ), | |
| }); | |
| } | |
| let max_field_bytes = remaining_upload_bytes.min(limits.max_file_size_bytes); | |
| let contents = | |
| read_field_bytes_limited(field, max_field_bytes, "uploaded skill file").await?; |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
model_gateway/src/routers/skills/handlers.rs (1)
292-307:⚠️ Potential issue | 🟠 MajorUse update permission for version creation.
POST /v1/skills/{skill_id}/versionsmutates an existing skill by adding a version and advancing its projection. Gating it withCreateAnyTenantlets callers who are denied updates still modify existing skills.Suggested fix
- ensure_admin_operation_allowed(&state, SkillsAdminOperation::CreateAnyTenant)?; + ensure_admin_operation_allowed(&state, SkillsAdminOperation::UpdateAnyTenant)?;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@model_gateway/src/routers/skills/handlers.rs` around lines 292 - 307, create_skill_version_impl currently gates version creation with ensure_admin_operation_allowed(..., SkillsAdminOperation::CreateAnyTenant), but adding a version is an update to an existing skill; change the permission check to use the update permission instead (replace SkillsAdminOperation::CreateAnyTenant with SkillsAdminOperation::UpdateAnyTenant in the ensure_admin_operation_allowed call inside create_skill_version_impl) so callers need update rights to add versions.crates/skills/src/api.rs (1)
853-910:⚠️ Potential issue | 🟠 MajorReserve ZIP framing overhead before buffering
files[]uploads.Line 888 only caps the sum of
file.contents.len(), but Lines 857-874 then build a stored ZIP archive, which adds per-entry and central-directory overhead after validation. A bundle that is exactly atmax_upload_size_bytesin raw file bytes can still allocate a larger archive buffer here, so the pre-buffering upload cap is still bypassable for multipart uploads. Reject based on an archive-size estimate before writing, or stop as soon as the buffered ZIP exceedsmax_upload_size_bytes.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/skills/src/api.rs` around lines 853 - 910, The current validation only sums raw file bytes but normalize_files_upload builds a ZIP (in ZipWriter) which can exceed max_upload_size_bytes; update the logic to enforce archive-size limits: either augment validate_files_upload_limits to conservatively estimate ZIP overhead per entry (local header + central dir + filename length) and fail when the estimated archive size would exceed limits.max_upload_size_bytes, or (simpler) instrument normalize_files_upload to check the buffer length after each start_file/write/finish and immediately return SkillBundleArchiveError::BundleTooLarge.into() if buffer.get_ref().len() > limits.max_upload_size_bytes; apply the same SkillBundleArchiveError variant so callers of normalize_skill_bundle_zip_with_limits see a consistent error.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@model_gateway/src/routers/skills/handlers.rs`:
- Around line 692-709: The helper ensure_admin_operation_allowed currently only
checks admin.allowed_operations and ignores the admin.enabled flag, so update it
to also require that state.context.router_config.skills.as_ref().and_then(|s|
s.admin.as_ref()).map_or(false, |admin| admin.enabled) (or equivalent) is true
before allowing the operation; if admin.enabled is false, return the same
SkillsApiError::Forbidden (code "skills_operation_not_allowed") to ensure the
global admin-off switch blocks handler access even when allowed_operations is
populated (see ensure_admin_operation_allowed, SkillsAdminConfig::default, and
the admin.allowed_operations check).
---
Outside diff comments:
In `@crates/skills/src/api.rs`:
- Around line 853-910: The current validation only sums raw file bytes but
normalize_files_upload builds a ZIP (in ZipWriter) which can exceed
max_upload_size_bytes; update the logic to enforce archive-size limits: either
augment validate_files_upload_limits to conservatively estimate ZIP overhead per
entry (local header + central dir + filename length) and fail when the estimated
archive size would exceed limits.max_upload_size_bytes, or (simpler) instrument
normalize_files_upload to check the buffer length after each
start_file/write/finish and immediately return
SkillBundleArchiveError::BundleTooLarge.into() if buffer.get_ref().len() >
limits.max_upload_size_bytes; apply the same SkillBundleArchiveError variant so
callers of normalize_skill_bundle_zip_with_limits see a consistent error.
In `@model_gateway/src/routers/skills/handlers.rs`:
- Around line 292-307: create_skill_version_impl currently gates version
creation with ensure_admin_operation_allowed(...,
SkillsAdminOperation::CreateAnyTenant), but adding a version is an update to an
existing skill; change the permission check to use the update permission instead
(replace SkillsAdminOperation::CreateAnyTenant with
SkillsAdminOperation::UpdateAnyTenant in the ensure_admin_operation_allowed call
inside create_skill_version_impl) so callers need update rights to add versions.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 929e29c5-bd40-4c73-b227-77a9e51c3b1d
📒 Files selected for processing (10)
crates/skills/src/api.rscrates/skills/src/config.rscrates/skills/src/lib.rscrates/skills/src/memory.rscrates/skills/src/storage.rscrates/skills/src/validation.rsmodel_gateway/src/app_context.rsmodel_gateway/src/config/validation.rsmodel_gateway/src/routers/skills/handlers.rsmodel_gateway/tests/api/skills_api_test.rs
| fn ensure_admin_operation_allowed( | ||
| state: &AppState, | ||
| operation: SkillsAdminOperation, | ||
| ) -> Result<(), SkillsApiError> { | ||
| let allowed = state | ||
| .context | ||
| .router_config | ||
| .skills | ||
| .as_ref() | ||
| .is_some_and(|skills| skills.admin.allowed_operations.contains(&operation)); | ||
| if allowed { | ||
| return Ok(()); | ||
| } | ||
|
|
||
| Err(SkillsApiError::Forbidden { | ||
| code: "skills_operation_not_allowed", | ||
| message: format!("skills admin operation '{operation:?}' is not allowed"), | ||
| }) |
There was a problem hiding this comment.
Preserve admin.enabled in the handler-side authorization gate.
This helper only checks allowed_operations. Since SkillsAdminConfig::default() still populates every operation while enabled is false, any accidentally mounted handler bypasses the global admin-off switch. Fold skills.admin.enabled into this check as defense in depth.
Suggested fix
fn ensure_admin_operation_allowed(
state: &AppState,
operation: SkillsAdminOperation,
) -> Result<(), SkillsApiError> {
let allowed = state
.context
.router_config
.skills
.as_ref()
- .is_some_and(|skills| skills.admin.allowed_operations.contains(&operation));
+ .is_some_and(|skills| {
+ skills.admin.enabled && skills.admin.allowed_operations.contains(&operation)
+ });
if allowed {
return Ok(());
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| fn ensure_admin_operation_allowed( | |
| state: &AppState, | |
| operation: SkillsAdminOperation, | |
| ) -> Result<(), SkillsApiError> { | |
| let allowed = state | |
| .context | |
| .router_config | |
| .skills | |
| .as_ref() | |
| .is_some_and(|skills| skills.admin.allowed_operations.contains(&operation)); | |
| if allowed { | |
| return Ok(()); | |
| } | |
| Err(SkillsApiError::Forbidden { | |
| code: "skills_operation_not_allowed", | |
| message: format!("skills admin operation '{operation:?}' is not allowed"), | |
| }) | |
| fn ensure_admin_operation_allowed( | |
| state: &AppState, | |
| operation: SkillsAdminOperation, | |
| ) -> Result<(), SkillsApiError> { | |
| let allowed = state | |
| .context | |
| .router_config | |
| .skills | |
| .as_ref() | |
| .is_some_and(|skills| { | |
| skills.admin.enabled && skills.admin.allowed_operations.contains(&operation) | |
| }); | |
| if allowed { | |
| return Ok(()); | |
| } | |
| Err(SkillsApiError::Forbidden { | |
| code: "skills_operation_not_allowed", | |
| message: format!("skills admin operation '{operation:?}' is not allowed"), | |
| }) | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@model_gateway/src/routers/skills/handlers.rs` around lines 692 - 709, The
helper ensure_admin_operation_allowed currently only checks
admin.allowed_operations and ignores the admin.enabled flag, so update it to
also require that state.context.router_config.skills.as_ref().and_then(|s|
s.admin.as_ref()).map_or(false, |admin| admin.enabled) (or equivalent) is true
before allowing the operation; if admin.enabled is false, return the same
SkillsApiError::Forbidden (code "skills_operation_not_allowed") to ensure the
global admin-off switch blocks handler access even when allowed_operations is
populated (see ensure_admin_operation_allowed, SkillsAdminConfig::default, and
the admin.allowed_operations check).
Summary
allowed_operationsand make create/version metadata writes atomic at the store boundary.Validation
cargo check -p smg-skills -p smg --testscargo test -p smg-skillscargo test -p smg --test api_tests skills_apicargo test -p smg internal_skill_service_errors_hide_backend_detailscargo +nightly fmt --all --checkcargo clippy --all-targets --all-features -- -D warningscargo testSummary by CodeRabbit
Release Notes
New Features
Bug Fixes