fix: surface failing-job link + real error in timelock executor Slack alerts#1925
fix: surface failing-job link + real error in timelock executor Slack alerts#1925mirooon wants to merge 3 commits into
Conversation
…in runPendingTimelockTXs workflow - Added permissions for actions to resolve job URLs for Slack deep-linking. - Implemented logic to create deep links directly to the failing job and step, improving notification clarity. - Updated environment variables to support the new deep-linking functionality.
…p oversized errors The executor's own SlackNotifier (which fires on the happy path and on per-operation failures) posted neither a link to the CI run nor, for Tron broadcast failures, the error itself — the raw-transaction error string blew past Slack's 3000-char per-block limit and was rejected with invalid_blocks. - SlackNotifier takes an optional runUrl and appends a "View workflow run" deep-link to failure/summary/not-scheduled notifications. - Clamp error/detail text below the Slack block limit so large RPC errors post. - Workflow resolves the deep job URL via the jobs API and passes it through TIMELOCK_RUN_URL; the failure-fallback step also deep-links the same way. - Add colocated tests for the link block and truncation. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
WalkthroughAdds optional CI run deep-link support and text-truncation to SlackNotifier, updates the deploy script to pass the run URL, adds tests for deep-link rendering and payload safety, and enhances the GitHub Actions workflow to resolve job-level run URLs for Slack messages. ChangesSlack Notification Deep-Linking and Payload Safety
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes Possibly related PRs
Suggested labels
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ESLint
ESLint install failed. For unrecoverable errors, disable the tool in CodeRabbit configuration. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
script/utils/slack-notifier.ts (1)
507-540:⚠️ Potential issue | 🟠 Major | ⚡ Quick winGuard against Slack block-count overflow in not-scheduled alerts.
notifyNotScheduledadds one block per operation (Line 507 onward), and the new run-link block on Line 538 further increases total block count. For largeroperationsarrays, Slack can reject the payload once block count exceeds its limit, dropping the alert entirely.Suggested fix (cap rendered operations + summarize remainder)
+const SLACK_MAX_BLOCKS = 50 +const RESERVED_NOT_SCHEDULED_BLOCKS = 5 // header, intro, divider, action, run-link(context) +const MAX_NOT_SCHEDULED_OPERATION_BLOCKS = + SLACK_MAX_BLOCKS - RESERVED_NOT_SCHEDULED_BLOCKS // Add details for each not-scheduled operation - for (const op of operations) + const shownOperations = operations.slice(0, MAX_NOT_SCHEDULED_OPERATION_BLOCKS) + for (const op of shownOperations) if (message.blocks) { let operationText = `• *Operation ID:* \`${this.truncateHash( op.operationId )}\`\n *MongoDB ID:* \`${ op.transactionId }\`\n *Safe Tx Hash:* \`${this.truncateHash(op.safeTxHash)}\`` if (op.executionHash) operationText += `\n *Execution Hash:* \`${this.truncateHash( op.executionHash )}\`` message.blocks.push({ type: 'section', text: { type: 'mrkdwn', text: operationText, }, }) } + if (operations.length > shownOperations.length && message.blocks) + message.blocks.push({ + type: 'section', + text: { + type: 'mrkdwn', + text: `…and ${operations.length - shownOperations.length} more operation(s) not shown.`, + }, + })🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@script/utils/slack-notifier.ts` around lines 507 - 540, notifyNotScheduled can exceed Slack's block limit by adding a block per operation plus the run-link block; limit the number of per-operation blocks rendered by defining a MAX_BLOCKS constant (respecting Slack's limit) and reserving slots for the run-link and a summary block, compute maxOps = MAX_BLOCKS - reservedSlots, iterate only over operations.slice(0, maxOps) when pushing operation blocks (using truncateHash as before), and if operations.length > maxOps push an extra summary section block like "and N more operations not shown" before calling appendRunLink and sendNotificationWithRetry so the payload never exceeds Slack's block count.
🧹 Nitpick comments (2)
script/utils/slack-notifier.test.ts (1)
55-111: ⚡ Quick winAdd a regression test for
notifyNotScheduledrun-link behavior.This PR adds
appendRunLink()tonotifyNotScheduled, but the suite only asserts run-link behavior for failure and batch-summary paths. Add one test for link present/absent onnotifyNotScheduledto lock the new contract.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@script/utils/slack-notifier.test.ts` around lines 55 - 111, Add regression tests for SlackNotifier.notifyNotScheduled to assert run-link behavior: use mockFetchCapturing() and call new SlackNotifier(WEBHOOK, RUN_URL).notifyNotScheduled({ network: 'tron', operation: baseOp, reason: 'some reason' }) and assert a context block with `<${RUN_URL}|View workflow run>` is present, and a second test that constructs new SlackNotifier(WEBHOOK) (no RUN_URL) and asserts no context block is emitted. Place these alongside the existing SlackNotifier tests so notifyNotScheduled and the new appendRunLink() behavior are covered..github/workflows/runPendingTimelockTXs.yml (1)
105-112: Confirm thegithub.job→ Jobs API.namematch forrunPendingTimelockTXs.yml
execute-timelockin.github/workflows/runPendingTimelockTXs.ymlhas no job-levelname:key, so the Jobs APInamefield (job display name) should default to the job id/key. That makes the current selectorselect(.name == "${{ github.job }}")appropriate, and the fallback to.jobs[0].html_urlshould only trigger for unexpected API/schema behavior. If a job-levelname:is added later (or the workflow gains multiple jobs), the selector could stop matching and the fallback may link to the wrong job.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/runPendingTimelockTXs.yml around lines 105 - 112, Add an explicit job-level name that matches github.job so the Jobs API selector works reliably: update the job that runs the timelock (the job currently referenced by github.job / the execute-timelock step) to include name: "execute-timelock" (or whatever string github.job evaluates to), ensuring the jq selector select(.name == "${{ github.job }}") used to compute RESOLVED_URL and export TIMELOCK_RUN_URL continues to match the correct job; alternatively, if you prefer not to add a name, change the jq selector to match the job key/id field instead of .name.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@script/utils/slack-notifier.ts`:
- Line 652: The fallback in extractErrorMessage uses JSON.stringify(error) which
will throw on circular or non-serializable objects; wrap serialization in a safe
try/catch or use a circular-safe serializer (e.g., a replacer that tracks seen
objects or util.inspect) to produce a string, then truncate to 500 chars. Update
extractErrorMessage to attempt JSON.stringify(error) inside try/catch (or use a
circular-safe replacer) and on error fall back to String(error) or
util.inspect(error, { depth: null }) before slicing to 500 characters.
---
Outside diff comments:
In `@script/utils/slack-notifier.ts`:
- Around line 507-540: notifyNotScheduled can exceed Slack's block limit by
adding a block per operation plus the run-link block; limit the number of
per-operation blocks rendered by defining a MAX_BLOCKS constant (respecting
Slack's limit) and reserving slots for the run-link and a summary block, compute
maxOps = MAX_BLOCKS - reservedSlots, iterate only over operations.slice(0,
maxOps) when pushing operation blocks (using truncateHash as before), and if
operations.length > maxOps push an extra summary section block like "and N more
operations not shown" before calling appendRunLink and sendNotificationWithRetry
so the payload never exceeds Slack's block count.
---
Nitpick comments:
In @.github/workflows/runPendingTimelockTXs.yml:
- Around line 105-112: Add an explicit job-level name that matches github.job so
the Jobs API selector works reliably: update the job that runs the timelock (the
job currently referenced by github.job / the execute-timelock step) to include
name: "execute-timelock" (or whatever string github.job evaluates to), ensuring
the jq selector select(.name == "${{ github.job }}") used to compute
RESOLVED_URL and export TIMELOCK_RUN_URL continues to match the correct job;
alternatively, if you prefer not to add a name, change the jq selector to match
the job key/id field instead of .name.
In `@script/utils/slack-notifier.test.ts`:
- Around line 55-111: Add regression tests for SlackNotifier.notifyNotScheduled
to assert run-link behavior: use mockFetchCapturing() and call new
SlackNotifier(WEBHOOK, RUN_URL).notifyNotScheduled({ network: 'tron', operation:
baseOp, reason: 'some reason' }) and assert a context block with
`<${RUN_URL}|View workflow run>` is present, and a second test that constructs
new SlackNotifier(WEBHOOK) (no RUN_URL) and asserts no context block is emitted.
Place these alongside the existing SlackNotifier tests so notifyNotScheduled and
the new appendRunLink() behavior are covered.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 56dab192-5ef6-43c2-960a-b65d4c23a8a3
📒 Files selected for processing (4)
.github/workflows/runPendingTimelockTXs.ymlscript/deploy/safe/execute-pending-timelock-tx.tsscript/utils/slack-notifier.test.tsscript/utils/slack-notifier.ts
…rrors (slack-notifier.ts:652) CodeRabbit (PR #1925): JSON.stringify(error) throws on circular/non-serializable payloads (e.g. some viem errors), which would break notification assembly on the failure path. Wrap in try/catch with a String() fallback. Added a regression test. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Which Linear task belongs to this PR?
EXSC-505
Why did I implement it this way?
The
Timelock Auto Executionworkflow (runPendingTimelockTXs.yml) posts execution results to#dev-sc-timelock-executionsthrough the executor'sSlackNotifier. Two operational problems made those alerts much less useful than intended, and this PR fixes both.Problem 1 — the actual error never showed up. On a per-operation failure (e.g. the recurring Tron
Account resource insufficientbroadcast failure), the❌ Timelock Operation Failedmessage — the only one that carries the real error — was silently rejected by Slack. The Tron broadcast error string is the entire raw transaction JSON (~3.5k chars); it was dropped verbatim into one Slack section block, and Slack rejects any block whose text exceeds 3000 chars with400 - invalid_blocks. So the whole message was dropped and the channel only ever showed the batch summary with a misleading• tron: Unknown error.SLACK_TEXT_LIMIT = 2900) inextractErrorMessage/notifyBatchSummary, so the per-operation failure (with the real error) actually posts. Verified live: the error now renders, truncated with a trailing….Problem 2 — hard to reach the workflow logs from Slack. To investigate a failure you had to leave Slack, open GitHub Actions, find the right workflow, then the right run, then the right job — many clicks and searching. The notification had no link.
View workflow rundeep-link to failure / summary / not-scheduled notifications, resolved to the specific failing job (and its failing step) via the jobs API, so a dev can jump straight to the logs in one click. The workflow resolves the job URL (needsactions: read+gh api) and passes it to the executor viaTIMELOCK_RUN_URL; the failure-fallback step deep-links the same way. The link is opt-in: absent locally (process.env.TIMELOCK_RUN_URLunset), so manual/local runs are unchanged.Files:
script/utils/slack-notifier.ts— optionalrunUrl,appendRunLink()context block on failure/summary/not-scheduled messages,truncateText()+SLACK_TEXT_LIMIT.script/deploy/safe/execute-pending-timelock-tx.ts— passprocess.env.TIMELOCK_RUN_URLinto the notifier..github/workflows/runPendingTimelockTXs.yml—actions: read; resolve the deep job URL via the jobs API in both the execute step (→TIMELOCK_RUN_URL) and the failure-fallback step.script/utils/slack-notifier.test.ts— new colocated tests (link present/absent, batch-summary link, oversized-error clamp).Out of scope: the unfunded Tron timelock executor wallet (the root cause of the failing ops themselves) is a separate operational issue.
Checklist before requesting a review
/pr-ready(local CodeRabbit) on this branch and resolved (or explicitly documented) all findings — see.agents/commands/pr-ready.mdChecklist for reviewer (DO NOT DEPLOY and contracts BEFORE CHECKING THIS!!!)