
[CRCR] Implement on-call bot and allowlist functionality for downstream CI failures#8183

Draft
KarhouTam wants to merge 1 commit into
pytorch:mainfrom
KarhouTam:crcr-mergebot


KarhouTam commented Jun 16, 2026

Contributor

Summary

Implement CRCR on-call bot and merge-blocking logic for downstream CI failures, driven by a structured allowlist (L1-L4) that maps downstream repos to severity levels and on-call contacts.

Changes

CRCR Allowlist (lib/crcrAllowlist.ts)

Parses .github/allowlist.yml from pytorch/pytorch, mapping downstream repos to four levels:

  • L1/L2: Tracked only, no on-call behavior.
  • L3: Non-blocking failure — on-call is notified but merge is not blocked.
  • L4: Blocking failure — on-call is notified and merge is blocked.

Includes a 15-minute in-memory cache and strict YAML validation. Errors fail open.
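The level semantics and caching described above can be sketched as follows. This is a minimal illustration, not the code in lib/crcrAllowlist.ts: the `AllowlistEntry` shape, `behaviorFor`, and `CachedAllowlist` are hypothetical names, the YAML fetch and validation are replaced by an injected `load` function, and "fail open" is assumed here to mean falling back to an empty allowlist so that nothing is notified or blocked.

```typescript
// Hypothetical types: the actual schema of .github/allowlist.yml is not shown in this PR description.
type Level = "L1" | "L2" | "L3" | "L4";

interface AllowlistEntry {
  level: Level;
  oncalls: string[]; // GitHub handles without the leading "@"
}

// Keyed by "owner/repo".
type Allowlist = Map<string, AllowlistEntry>;

interface LevelBehavior {
  notifyOncall: boolean;
  blockMerge: boolean;
}

// Encodes the level table from the description.
function behaviorFor(level: Level): LevelBehavior {
  switch (level) {
    case "L1":
    case "L2":
      return { notifyOncall: false, blockMerge: false }; // tracked only
    case "L3":
      return { notifyOncall: true, blockMerge: false }; // non-blocking
    case "L4":
      return { notifyOncall: true, blockMerge: true }; // blocking
  }
}

const CACHE_TTL_MS = 15 * 60 * 1000; // 15 minutes, per the description

class CachedAllowlist {
  private cached: { value: Allowlist; fetchedAt: number } | null = null;
  private readonly load: () => Promise<Allowlist>;
  private readonly now: () => number;

  // `load` stands in for "fetch and validate the YAML"; `now` is injectable for tests.
  constructor(load: () => Promise<Allowlist>, now: () => number = () => Date.now()) {
    this.load = load;
    this.now = now;
  }

  async get(): Promise<Allowlist> {
    const t = this.now();
    if (this.cached !== null && t - this.cached.fetchedAt < CACHE_TTL_MS) {
      return this.cached.value; // still fresh
    }
    try {
      const value = await this.load();
      this.cached = { value, fetchedAt: t };
      return value;
    } catch {
      // Assumed fail-open behavior: an empty allowlist notifies and blocks nothing.
      // The failure is not cached, so the next call retries the load.
      return new Map<string, AllowlistEntry>();
    }
  }
}
```

One design question this sketch leaves open is whether a failed refresh should return the stale cached value instead of an empty allowlist; the description does not say which the PR does.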

On-call Bot (lib/bot/crcrOncallBot.ts)

A Probot bot that listens for check_run.completed: when a CRCR check run fails, it looks up the on-call contacts from the allowlist and posts a comment tagging them on the associated PR. Duplicate comments are avoided via an HTML comment marker.
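Deduplication via an HTML comment marker can be sketched as below. The marker format, what it is keyed on (check name and head SHA here), and the function names are assumptions for illustration; the Probot wiring and the GitHub API calls that list and create comments are omitted.

```typescript
interface IssueComment {
  body: string;
}

// Hypothetical marker format. HTML comments are not rendered on GitHub,
// so the marker is invisible to readers but searchable by the bot.
function markerFor(checkName: string, headSha: string): string {
  return `<!-- crcr-oncall check=${checkName} sha=${headSha} -->`;
}

// Builds the comment body: marker first, then a message tagging the on-calls.
function buildOncallComment(
  repo: string,
  oncalls: string[],
  checkName: string,
  headSha: string
): string {
  const mentions = oncalls.map((handle) => `@${handle}`).join(" ");
  return [
    markerFor(checkName, headSha),
    `CRCR check \`${checkName}\` for ${repo} failed on ${headSha.slice(0, 7)}.`,
    `cc ${mentions}`,
  ].join("\n");
}

// Returns true if an earlier comment already carries the marker for this check and commit.
function alreadyNotified(
  existing: IssueComment[],
  checkName: string,
  headSha: string
): boolean {
  const marker = markerFor(checkName, headSha);
  return existing.some((comment) => comment.body.includes(marker));
}
```

A handler would call `alreadyNotified` on the PR's existing comments and post the result of `buildOncallComment` only when it returns false.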

Merge Blocking (lib/bot/pytorchBotHandler.ts)

@pytorchbot merge checks the GitHub Checks API for CRCR failures on the PR head commit. L4 failures block the merge with a comment listing failing repos. -f bypasses this, consistent with existing force-merge semantics. Errors fail open.
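The merge gate described above reduces to a small decision function. The sketch below is not the code in lib/bot/pytorchBotHandler.ts: `CrcrCheck`, `decideMerge`, and the message wording are hypothetical, only a `failure` conclusion is treated as failing (an assumption), and the Checks API call plus its fail-open error handling are left out.

```typescript
type Level = "L1" | "L2" | "L3" | "L4";

// Hypothetical, simplified view of a CRCR check run on the PR head commit.
interface CrcrCheck {
  repo: string; // downstream repo, "owner/repo"
  conclusion: string | null; // null while the check is still running
}

interface MergeDecision {
  blocked: boolean;
  failingRepos: string[];
  message?: string;
}

function decideMerge(
  checks: CrcrCheck[],
  levels: ReadonlyMap<string, Level>,
  force: boolean
): MergeDecision {
  // -f bypasses the gate, matching existing force-merge semantics.
  if (force) {
    return { blocked: false, failingRepos: [] };
  }
  // Only failures in repos listed at L4 block; unknown repos and L1-L3 do not.
  const failing = checks
    .filter((c) => c.conclusion === "failure" && levels.get(c.repo) === "L4")
    .map((c) => c.repo);
  const failingRepos = Array.from(new Set(failing)).sort();
  if (failingRepos.length === 0) {
    return { blocked: false, failingRepos };
  }
  return {
    blocked: true,
    failingRepos,
    // Hypothetical wording; the actual comment text is not given in the description.
    message: `Merge blocked by failing L4 downstream CI: ${failingRepos.join(", ")}`,
  };
}
```

Because a repo missing from `levels` never blocks, an empty allowlist (for example after a fetch error) lets every merge through, which matches the fail-open behavior the description states.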

Dr.CI Integration

  • fetchRecentWorkflows.ts: New fetchOotWorkflows() queries oot_workflow_job from ClickHouse.
  • drci.ts: L4 failures appear as blocking; L3 failures appear in a new "OUT OF TREE (non-blocking)" section.
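The split between blocking and non-blocking failures in Dr.CI could look roughly like this. The `OotJob` shape, the function names, and the rendered line format are assumptions; only the section title comes from the description, and the ClickHouse query behind `fetchOotWorkflows()` is not reproduced.

```typescript
type Level = "L1" | "L2" | "L3" | "L4";

// Hypothetical row shape for a downstream (out-of-tree) job.
interface OotJob {
  repo: string;
  name: string;
  conclusion: string;
}

interface OotSections {
  blocking: OotJob[]; // L4 failures
  nonBlocking: OotJob[]; // L3 failures
}

// Partitions failed jobs by the allowlist level of their repo.
// L1/L2 and unlisted repos are dropped, since they carry no on-call behavior.
function classifyOotFailures(
  jobs: OotJob[],
  levels: ReadonlyMap<string, Level>
): OotSections {
  const sections: OotSections = { blocking: [], nonBlocking: [] };
  for (const job of jobs) {
    if (job.conclusion !== "failure") continue;
    const level = levels.get(job.repo);
    if (level === "L4") sections.blocking.push(job);
    else if (level === "L3") sections.nonBlocking.push(job);
  }
  return sections;
}

// Renders the non-blocking section; returns an empty string when there is nothing to show.
function renderNonBlockingSection(jobs: OotJob[]): string {
  if (jobs.length === 0) return "";
  const lines = jobs.map((job) => `- ${job.repo} / ${job.name}`);
  return ["OUT OF TREE (non-blocking)", ...lines].join("\n");
}
```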

Tests

Unit tests for allowlist parsing and on-call bot behavior, plus updated Dr.CI test helpers.


vercel Bot commented Jun 16, 2026


@KarhouTam is attempting to deploy a commit to the Meta Open Source Team on Vercel.

A member of the Team first needs to authorize it.

meta-cla Bot added the CLA Signed label Jun 16, 2026
