Skip to content

fix(perms): propagate sender/role through MCP bridge & delegate (#915)#1105

Open
mozaa-solana wants to merge 1 commit into
nextlevelbuilder:devfrom
mozaa-solana:fix/perms-bridge-delegate-915
Open

fix(perms): propagate sender/role through MCP bridge & delegate (#915)#1105
mozaa-solana wants to merge 1 commit into
nextlevelbuilder:devfrom
mozaa-solana:fix/perms-bridge-delegate-915

Conversation

@mozaa-solana

Copy link
Copy Markdown
Contributor

Summary

Group-scoped permission checks (CheckCronPermission / CheckFileWriterPermission) deny with senderID="" when tools are invoked through:

  1. MCP bridge (external CLI providers like claude-cli, opencode-go) — the bridge middleware never injected SenderID/Role into ctx, so even though the agent loop sets them on RunRequest, the bridge layer dropped attribution. Result: any group cron / file_writer mutation called via the bridge denies as "system context".
  2. delegate tool (gateway_managed.go delegateRunFn) — built RunRequest without copying SenderID/Role/Channel/ChatID/PeerKind from DelegateRequest, so the target agent's loop_context.injectContext skipped WithSenderID and downstream group cron/file_writer mutations denied with senderID="".

Changes

  • internal/providers/claude_cli.go — new OptSenderID, OptRole.
  • internal/providers/claude_cli_mcp.goBridgeContext carries SenderID+Role; writeMCPConfig emits X-Sender-ID/X-Role headers and signs them in the HMAC payload (extras: localKey | sessionKey | senderID | role). VerifyBridgeContext gains a backward-compat fallback that drops the last 2 extras for sessions written pre-this-change.
  • internal/providers/claude_cli_session.gobridgeContextFromOpts reads new opts.
  • internal/gateway/server.go (bridgeContextMiddleware) — read X-Sender-ID+X-Role headers (HMAC-protected), inject WithSenderID/WithRole into ctx.
  • internal/agent/loop_pipeline_callbacks.go — pass req.SenderID/req.Role into chatReq.Options.
  • cmd/gateway_managed.go (delegateRunFn) — propagate SenderID/Role/Channel/ChatID/PeerKind from DelegateRequest into RunRequest.
  • internal/tools/team_tool_dispatch.go + cmd/gateway_consumer_normal.goslog.Warn when sender is empty in a group-target dispatch so admins can find upstream callers losing attribution.
  • internal/store/config_permission_store.go — surface diagnostic-rich error messages (which user/scope/configType denied) instead of generic "permission denied".
  • internal/tools/cron.go — forward the diagnostic message to LLM result so the agent can self-correct.

Test plan

  • go build ./... clean
  • go test ./internal/providers/... ./internal/store/... ./internal/gateway/... ./internal/agent/... passes (incl. TestVerifyBridgeContext HMAC fallbacks)
  • Reproduced the cron-in-Telegram-group denial, applied this fix, retested, cron now works for whitelisted users
  • Backward compatibility verified: pre-existing MCP config sessions (without senderID/role headers) still match the older HMAC fallback in VerifyBridgeContext.

Related

Follow-up to #915. Diagnostic logs added in this PR are the ones I used to localize the root cause; happy to drop them if reviewers prefer.

…levelbuilder#915)

Group-scoped permission checks (CheckCronPermission / CheckFileWriterPermission)
fail with senderID="" when tools are invoked through:

  1. MCP bridge (external CLI providers like claude-cli) — middleware never
     injected SenderID/Role into ctx, so even though the agent loop set them
     on RunRequest, the bridge layer dropped attribution.
  2. delegate tool (gateway_managed.go delegateRunFn) — built RunRequest
     without copying SenderID/Role/Channel/ChatID/PeerKind from
     DelegateRequest, so the target agent's loop_context.injectContext
     skipped WithSenderID and downstream group cron/file_writer mutations
     denied with senderID="".

Fix:

  - claude_cli: new OptSenderID, OptRole; BridgeContext carries SenderID +
    Role; writeMCPConfig emits X-Sender-ID / X-Role headers and signs them
    in the HMAC payload (extras: localKey | sessionKey | senderID | role).
    VerifyBridgeContext gains a backward-compat fallback that drops the
    last 2 extras for sessions written pre-this-change.
  - gateway/server.bridgeContextMiddleware: read X-Sender-ID + X-Role
    headers (HMAC-protected), inject WithSenderID / WithRole into ctx.
  - agent loop_pipeline_callbacks: pass req.SenderID and req.Role into
    chatReq.Options so external CLI providers receive them.
  - gateway_managed.delegateRunFn: propagate SenderID/Role/Channel/ChatID/
    PeerKind from DelegateRequest into RunRequest.
  - team_tool_dispatch / gateway_consumer_normal: warn when sender is
    empty in a group-target dispatch so admins can find the upstream
    caller losing attribution.
  - config_permission_store: surface diagnostic-rich error messages so
    LLMs/users know exactly which user/scope/configType denied, instead
    of a generic "permission denied".
  - cron tool: forward CheckCronPermission's diagnostic message to the
    LLM result so the agent can self-correct.
@clark-cant

Copy link
Copy Markdown

PR Review

Verdict: REQUEST CHANGES

Summary

This PR fixes a real attribution bug: group-scoped permission checks lose the acting sender/role when tool calls go through the MCP bridge or delegate path, causing cron/file-writer mutations to deny with senderID empty.

The intent is good, but the MCP bridge HMAC fallback currently allows unsigned X-Sender-ID / X-Role headers to be trusted for old-session signatures. That is a security issue because these headers power group-scoped permission checks.

Risk level: high, because this touches permission context propagation across the bridge.

Findings

  1. [HIGH] internal/gateway/server.go:246-282 + internal/providers/claude_cli_mcp.go:302-309 — fallback verification can trust unsigned sender/role headers.

bridgeContextMiddleware always passes localKey, sessionKey, senderID, roleHdr into VerifyBridgeContext, then injects senderID and roleHdr whenever ok == true.

But VerifyBridgeContext has a backward-compat fallback that drops the last 2 extras:

preSender := SignBridgeContext(..., extra[:len(extra)-2]...)
if hmac.Equal(...) { return true, true }

That means a request with a valid pre-sender/role signature over only localKey|sessionKey can still include arbitrary X-Sender-ID / X-Role headers and pass verification via fallback. The middleware then injects those unsigned values into context.

Impact: any bridge caller using an older MCP config signature could forge the sender or role headers for group-scoped permission checks. Since this PR specifically uses sender/role for cron/file-writer permission, these fields must not be trusted unless they were covered by the matched HMAC format.

Suggested fix: have VerifyBridgeContext return an additional signal for which context fields were covered, e.g. senderRoleVerified bool, or avoid injecting sender/role unless the full signature including senderID/role matched.

Expected behavior:

  • Current full match over localKey|sessionKey|senderID|role: ok=true, tenantVerified=true, senderRoleVerified=true
  • Pre-sender fallback over localKey|sessionKey: ok=true, tenantVerified=true, senderRoleVerified=false
  • Middleware only calls WithSenderID / WithRole when senderRoleVerified == true

Please add a regression test: sign with old extras only (localKey, sessionKey), send non-empty senderID/role, assert bridge middleware does not inject them, or assert VerifyBridgeContext reports they are not verified.

  1. [MEDIUM] internal/providers/claude_cli_mcp_test.go — missing tests for new sender/role HMAC behavior.

Existing tests cover localKey/sessionKey extra params, but this PR adds security-sensitive fields and a fallback mode. Please add explicit tests for:

  • current signature including senderID/role validates
  • tampering senderID or role fails full verification
  • old pre-sender signature remains accepted for backward compatibility but does not authorize sender/role injection

Anti-AI-Slop Check

  • Intent clear: yes — concrete group permission attribution bug and reproduction.
  • Diff scope justified: mostly yes, but permission context propagation needs sharper verification semantics.
  • Refactor justified: not applicable.
  • Author explanation needed: no.

Tests / CI

  • Local targeted check: go test ./internal/providers ./internal/gateway ./internal/store ./internal/tools ✅
  • PR status checks: none reported at review time.
  • Test gap: no coverage proving sender/role cannot be spoofed through fallback signatures.

Final Notes

The delegate propagation side looks reasonable. The bridge side needs one more security boundary: do not trust sender/role unless the matched HMAC actually covered sender/role. Fix that and add regression tests, then this should be in good shape.

@mrgoonie mrgoonie left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maintainer recheck: still requesting changes. The bridge HMAC fallback issue from the prior review is still present at head b20d051e.

Blocking finding:

  • internal/gateway/server.go always passes senderID and roleHdr into VerifyBridgeContext, then injects them into context whenever ok == true.
  • internal/providers/claude_cli_mcp.go accepts a pre-sender/role fallback by dropping the last two extras and still returns (true, true).
  • Result: a request signed with the older localKey|sessionKey format can add arbitrary X-Sender-ID / X-Role headers and the middleware will trust them, even though those fields were not covered by the matched HMAC.

Why this matters: sender/role are exactly the fields used by group-scoped cron/file-writer permission checks. Backward compatibility for old MCP configs is fine, but old signatures must not authorize newly added identity headers.

Required fix:

  1. Change verification semantics so callers can distinguish context signature valid from sender/role covered by signature — e.g. return ok, tenantVerified, senderRoleVerified.
  2. Only call store.WithSenderID / store.WithRole when senderRoleVerified == true.
  3. Add regression tests for:
    • current signature with sender/role validates and injects them;
    • tampering sender/role fails full verification;
    • old signature remains accepted for backward compatibility but does not inject sender/role.

No merge attempt: this is security-sensitive permission propagation and remains blocked until that boundary is fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants