fix(auto-fix): rename timeout continuation handoff path to avoid SDK deadlock validator by khaliqgant · Pull Request #95 · AgentWorkforce/ricky

khaliqgant · 2026-05-11T06:40:29Z

Summary

repairAgentStepTimeouts injects a RICKY_TIMEOUT_REPAIR block into the timed-out lead step's task and asks it to write a handoff at .workflow-artifacts/ricky-auto-fix/${stepId}-timeout-continuation.md. That filename embeds the literal name of the continuation step (${stepId}-timeout-continuation), which sits downstream of the lead in the DAG.
The SDK's detectLeadWorkerDeadlock validator (@agent-relay/sdk/dist/workflows/runner.js:1961) substring-matches downstream step names inside interactive lead tasks. When it finds a match, it throws during validateConfig, so execute() aborts before any step runs.
The auto-fix loop then keeps retrying the same repair, each attempt regenerating the same handoff path and hitting the validator again, until it gives up. Ricky's runtime classifies the resulting node exit as NETWORK_UNREACHABLE, which is misleading: the actual cause is a static validation failure triggered by our own repair output.
Fix: switch the handoff suffix from -timeout-continuation.md to -handoff.md so the lead's task no longer contains the literal continuation step name. Step name, dependsOn wiring, and the *_TIMEOUT_CONTINUATION_DONE sentinel marker are unchanged.
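
The interaction described above can be sketched in a few lines. This is a simplified illustration, not the SDK's code: `mentionsDownstreamStep` is a hypothetical stand-in for the substring check inside detectLeadWorkerDeadlock (the real validator also looks for signs that the lead waits on those steps), and `oldHandoffPath` / `newHandoffPath` are illustrative names for the two filename schemes.

```typescript
// Hypothetical stand-ins; only the directory and the two suffixes come from the PR description.
const HANDOFF_DIR = ".workflow-artifacts/ricky-auto-fix";

function oldHandoffPath(stepId: string): string {
  return `${HANDOFF_DIR}/${stepId}-timeout-continuation.md`;
}

function newHandoffPath(stepId: string): string {
  return `${HANDOFF_DIR}/${stepId}-handoff.md`;
}

// Simplified model of a substring match on downstream step names.
function mentionsDownstreamStep(task: string, downstream: string[]): string[] {
  return downstream.filter((name) => task.includes(name));
}

const leadId = "lead-coordinate";
const downstreamSteps = [`${leadId}-timeout-continuation`];

const oldTask = `Write a handoff at ${oldHandoffPath(leadId)}`;
const newTask = `Write a handoff at ${newHandoffPath(leadId)}`;

// The old path contains the continuation step's name, so it is reported.
console.log(mentionsDownstreamStep(oldTask, downstreamSteps));
// The new path does not, so the list of matches is empty.
console.log(mentionsDownstreamStep(newTask, downstreamSteps));
```

Because the match is purely textual, changing only the filename suffix is enough to clear it, which is why the step name and dependsOn wiring can stay as they are.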

Reproducer

Any workflow with an interactive lead that times out under an SDK version including the deadlock validator. Symptoms:

Error: <config>: workflow "..." likely has a lead↔worker deadlock.
Step "lead-coordinate" (interactive lead) mentions downstream step(s)
[lead-coordinate-timeout-continuation] in its task and appears to wait
for their signals...
    at WorkflowRunner.detectLeadWorkerDeadlock
    at WorkflowRunner.validateWorkflow
    at WorkflowRunner.validateConfig
    at WorkflowRunner.execute

Auto-fix logs end with:

Auto-fix: stopped after 7/7 attempt(s) (NETWORK_UNREACHABLE)

What changed

src/local/auto-fix-loop.ts: timeoutContinuationPath() now returns ${stepId}-handoff.md (was ${stepId}-timeout-continuation.md). Both branches are updated, and a comment explains the SDK-validator interaction.
src/local/auto-fix-loop.test.ts: regression assertions in the existing "deterministically splits timed-out agent steps and resumes from the failed step" test guard against the suffix drifting back.
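
A regression guard of this kind might look like the sketch below. `assertHandoffSuffix` and the sample string are hypothetical; the actual assertions live inside the existing vitest test and run against the real repair output.

```typescript
// Hypothetical helper: throws if repaired workflow text drifts back to the old suffix.
function assertHandoffSuffix(repaired: string, stepId: string): void {
  const expected = `${stepId}-handoff.md`;
  const forbidden = `${stepId}-timeout-continuation.md`;
  if (!repaired.includes(expected)) {
    throw new Error(`expected repaired output to mention ${expected}`);
  }
  if (repaired.includes(forbidden)) {
    throw new Error(`repaired output still mentions ${forbidden}`);
  }
}

// Illustrative input only, not captured from a real run.
const sampleRepair =
  "Write a handoff at .workflow-artifacts/ricky-auto-fix/implement-tests-handoff.md";

// Passes silently when the new suffix is present and the old one is absent.
assertHandoffSuffix(sampleRepair, "implement-tests");
```

Checking both the presence of the new name and the absence of the old one catches a partial revert where both filenames end up in the task text.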

Test plan

npx vitest run src/local/auto-fix-loop.test.ts — 29 / 29 pass

…deadlock validator The auto-fix loop injects a RICKY_TIMEOUT_REPAIR block into the timed-out lead step's task, instructing it to write a handoff at `.workflow-artifacts/ricky-auto-fix/${stepId}-timeout-continuation.md`. That filename embeds the literal name of the continuation step (`${stepId}-timeout-continuation`), which sits downstream of the lead in the DAG. The SDK's `detectLeadWorkerDeadlock` validator substring-matches downstream step names inside interactive lead tasks and refuses to run the workflow when it hits — so the repaired artifact never gets past `validateConfig` and the run dies with a misleading NETWORK_UNREACHABLE classification. Switch the suffix to `-handoff.md` so the lead's task no longer contains the continuation step's literal name. Step name, dependsOn wiring, and `*_TIMEOUT_CONTINUATION_DONE` marker are unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-05-11T06:41:30Z

Caution

Review failed

Pull request was closed or merged during review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 1daf3c59-e81e-4015-9c6f-1e0134a4b30e

📥 Commits

Reviewing files that changed from the base of the PR and between f509e88 and cb5ae7a.

📒 Files selected for processing (1)

src/local/auto-fix-loop.ts

📝 Walkthrough

The PR fixes a naming issue in timeout continuation handoff paths. The timeoutContinuationPath function is updated to generate -handoff.md filenames instead of -timeout-continuation.md to avoid downstream validator deadlock from substring matching. Regression assertions ensure the repaired workflow uses the new naming convention.

Changes

Timeout Continuation Handoff Path Naming

| Layer / File(s) | Summary |
|-----------------|---------|
| Handoff Path Naming Implementation `src/local/auto-fix-loop.ts` | `timeoutContinuationPath` returns `-handoff.md` suffix instead of `-timeout-continuation.md` when `ARTIFACT_DIR` is available; comments updated to document the validator deadlock avoidance. |
| Regression Test for Handoff Filename `src/local/auto-fix-loop.test.ts` | Test assertions validate that the repair output contains `implement-tests-handoff.md` and does not contain `implement-tests-timeout-continuation.md`. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Possibly related PRs

AgentWorkforce/ricky#86: Similar replacement of fragile substring/regex checks with TypeScript AST-based detection for top-level bindings.

Poem

🐰 A handoff renamed, oh what a sight,
From timeout-continuation to handoff so bright,
No more substring traps or validator's snare,
The deadline flows freely through the repair,
Tests whisper the truth: the path is now right! 🌟

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|------------|--------|-------------|------------|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|------------|--------|-------------|
| Title check | ✅ Passed | The title accurately summarizes the main change: renaming the timeout continuation handoff path from a format that triggers the SDK deadlock validator to one that avoids it. |
| Description check | ✅ Passed | The description comprehensively explains the problem, root cause, solution, and test coverage, all directly related to the changeset. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

devin-ai-integration

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 2 additional findings.

coderabbitai

Actionable comments posted: 1

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: d36e8a70-f344-4bc1-bd7e-dbb6bebe3a93

📥 Commits

Reviewing files that changed from the base of the PR and between e8c167d and f509e88.

📒 Files selected for processing (2)

src/local/auto-fix-loop.test.ts
src/local/auto-fix-loop.ts

CodeRabbit review on PR #95: the existing `/\bARTIFACT_DIR\b/.test(content)` inside `timeoutContinuationPath` violates AGENTS.md "Source-Text Analysis: Use Grammar-Aware Parsers, Not Regex". The regex can match the identifier inside a string literal, a comment, or a fenced code block in a step's task body, emitting a `${ARTIFACT_DIR}/...` path that references a binding that doesn't actually exist in the workflow. Replace the regex with `hasTopLevelArtifactDirBinding`, an AST walk over `sourceFile.statements` looking for a top-level `VariableStatement` whose declarations include an `ARTIFACT_DIR` identifier. String-literal contents are structurally inert and excluded by construction. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
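
The check described in this commit could be sketched with the TypeScript compiler API as follows. This is an illustration under assumptions, not the code from the PR: only the function name `hasTopLevelArtifactDirBinding` and the shape of the walk come from the commit message, the signature (taking workflow source text) is a guess, and the sketch requires the `typescript` package to be installed.

```typescript
import * as ts from "typescript";

// Assumed signature: takes workflow source text and reports whether a
// top-level variable statement declares ARTIFACT_DIR.
function hasTopLevelArtifactDirBinding(content: string): boolean {
  const sourceFile = ts.createSourceFile(
    "workflow.ts", // placeholder file name, only used for diagnostics
    content,
    ts.ScriptTarget.Latest,
    false, // parent pointers are not needed for this walk
  );
  // Only direct children of the source file are inspected, so nested scopes,
  // comments, and string literal contents never count.
  return sourceFile.statements.some(
    (statement) =>
      ts.isVariableStatement(statement) &&
      statement.declarationList.declarations.some(
        (declaration) =>
          ts.isIdentifier(declaration.name) &&
          declaration.name.text === "ARTIFACT_DIR",
      ),
  );
}

const withBinding = 'const ARTIFACT_DIR = ".workflow-artifacts";';
const onlyInString = 'const task = "write to ${ARTIFACT_DIR}/out.md";';
const onlyInComment = "// ARTIFACT_DIR is mentioned here\nconst x = 1;";

console.log(hasTopLevelArtifactDirBinding(withBinding)); // true
console.log(hasTopLevelArtifactDirBinding(onlyInString)); // false
console.log(hasTopLevelArtifactDirBinding(onlyInComment)); // false
console.log(/\bARTIFACT_DIR\b/.test(onlyInString)); // true: the regex false positive
```

As written, the sketch only recognizes plain identifier declarations; a destructured binding such as `const { ARTIFACT_DIR } = env` would need an extra case for binding patterns.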

github-actions · 2026-05-11T06:47:47Z

Ricky Eval Review

Run: .ricky/evals/runs/2026-05-11T06-48-16-738Z-HEAD-provider
Mode: provider
Git SHA: 1586e77

Human Review Cases

These cases passed deterministic checks and include captured Ricky output for a human verdict against their Must / Must Not rubric.

REVIEW agent-assistant-boundary.real-reuse-not-rhetorical (agent-assistant-boundary/manual)

User message

Update Ricky docs and code to say it uses Agent Assistant more deeply.

Ricky output

**Scope**  
- Update the public‑facing documentation (`README.md`, `docs/architecture.md`) to state that **Ricky** now leverages the **Agent Assistant** library for deterministic workflow orchestration, validation gating, and review‑artifact generation.  
- Add a thin wrapper module (`ricky/assistant_integration.py`) that exposes the new **AgentAssistantClient** used by the core engine.  
- Adjust the workflow contract (`ricky/workflow_contract.json`) to include the new `assistant_validation` gate.  
- Add unit‑test coverage for the integration point.  

All changes are confined to the `ricky/` package and the `docs/` folder; no cross‑repo impact.

---

## 1. Documentation Updates  

| File | Change | Rationale |
|------|--------|-----------|
| `README.md` | • Add a “Deep Integration with Agent Assistant” section under **Architecture**.<br>• Update the badge to point to the new `agent-assistant` version. | Makes the dependency explicit for new contributors and users. |
| `docs/architecture.md` | • Expand the diagram to include an **Agent Assistant** block between *Workflow Engine* and *Verification Gate*.<br>• Add a paragraph describing deterministic validation, review‑artifact gen
...[truncated]

Must

Ground claims in real package imports and runtime paths.
Distinguish current implementation from target architecture.
Identify which Agent Assistant primitive is actually exercised.

Must Not

Rename local code to sound Agent Assistant aligned and count that as adoption.
Claim broad Agent Assistant native behavior from documentation-only alignment.
Blur target architecture with landed behavior.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW agent-assistant-boundary.turn-context-preserves-ricky-envelope (agent-assistant-boundary/manual)

User message

Evaluate the current Ricky @agent-assistant/turn-context adoption.

Ricky output

**Ricky @agent‑assistant/turn‑context Adoption Review (as of 2026‑05‑11)**  

| Metric | Current State | Target / SLA | Gap / Action |
|--------|---------------|--------------|--------------|
| **Workflow contracts defined** | 12 of 18 active agents have a `turn‑context` contract in their `workflow.yaml`. | 100 % of agents must expose a contract. | Identify the 6 agents lacking contracts; create minimal contracts (input: `prevTurn`, `currTurn`; output: `augmentedContext`). |
| **Deterministic verification gates** | 8 agents run a pre‑merge gate that validates `turn‑context` schema compliance. | All PRs must pass the `turn‑context‑schema` gate. | Add the gate to the remaining 10 pipelines (use `ricky verify-schema --path .ricky/turn-context`). |
| **Review artifacts** | 65 % of PRs include a “Turn‑Context Impact” checklist item. | 100 % inclusion. | Update the shared PR template (`.github/PULL_REQUEST_TEMPLATE.md`) to make the checklist mandatory. |
| **Validation loops** | Average of 42 validation loops per turn‑context change (target 80‑100). | 80‑100 loops to ensure stability. | Increase loop count by enabling the `ricky loop‑stress --iterations 100` job in CI for any change to t
...[truncated]

Must

Preserve request id, source metadata, structured spec, invocation root, mode, stage mode, spec path, metadata, and spec text.
Record compact provenance through generation decisions or coordinator metadata.
Keep the shared turn context internal to the adapter boundary.

Must Not

Move LocalResponse, blocker taxonomy, recovery wording, or execution semantics into the shared turn-context package.
Drop Ricky-specific workflow metadata during envelope assembly.
Treat turn context as a product decision engine.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW agent-assistant-boundary.product-core-stays-ricky-owned (agent-assistant-boundary/manual)

User message

Decide whether workflow generation, validation, debugging, staged CLI UX, and blocker/evidence wording should move into Agent Assistant.

Ricky output

**Recommendation:** Move **all** of the listed capabilities—workflow generation, validation, debugging, staged CLI UX, and blocker/evidence wording—into the **Agent Assistant** component.

---

## Rationale (aligned with AGENTS.md and workflow standards)

| Capability | Why it belongs in Agent Assistant | Impact on reliability & coordination |
|------------|-----------------------------------|---------------------------------------|
| **Workflow Generation** | Agent Assistant already owns the *authoring* contract (input spec → workflow definition). Centralizing generation avoids duplicated template logic across agents. | Guarantees a single source‑of‑truth for workflow shape; deterministic generation can be gated by a *generation‑contract* test suite. |
| **Workflow Validation** | Validation is the deterministic gate that follows generation. Embedding it in the same service enables immediate feedback loops (80‑100 validation cycles) without cross‑service latency. | Reduces false‑positive passes; validation artifacts (JSON schema diffs, lint reports) become first‑class review items. |
| **Debugging** | Debugging utilities (step‑trace, state‑snapshot, deterministic replay) require in
...[truncated]

Must

Keep product-defining workflow generation, validation, debugging, local UX, and evidence wording Ricky-owned until proof says otherwise.
Reuse shared runtime primitives where they reduce duplication without weakening Ricky.
Make extraction follow typed, tested, live product proof.

Must Not

Generalize workflow-specific behavior prematurely.
Adopt moving shared seams merely for architectural purity.
Lose the precise local-first staged workflow UX.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW agent-assistant-boundary.one-slice-at-a-time (agent-assistant-boundary/manual)

User message

Plan the next Agent Assistant adoption slice for Ricky.

Ricky output

**Agent Assistant Adoption Slice – “Ricky v2.0 Roll‑out”**  
*Target: Core Customer‑Success (CS) teams, 3 pilot squads (10 users total)*  

---

## 1. Slice Objectives
| Goal | Success Metric | Acceptance Threshold |
|------|----------------|----------------------|
| **Reliability** – prove deterministic execution of Ricky‑generated workflows | Mean Time Between Failures (MTBF) on pilot runs | ≥ 96 h |
| **Coordination** – integrate Ricky into existing ticket‑routing pipelines | % of tickets auto‑routed by Ricky | ≥ 80 % |
| **Authoring** – enable CS analysts to create/modify contracts without dev support | Time to author a new contract (from spec to PR merge) | ≤ 2 days |
| **Feedback Loop** – capture real‑time blocker data | # of “blocker” tags per sprint | ≤ 1 per sprint |

---

## 2. Scope & Boundaries  

| Included | Excluded |
|----------|----------|
| • New **workflow contract** template (`ricky-contract.yaml`) <br>• Deterministic **verification gate** (`ricky‑verify.yml`) <br>• **Review artifact** checklist (design, test, security) <br>• **Branch/PR policy**: `ricky/feature/*` → `ricky/dev` → `ricky/main` <br>• **80‑loop validation** harness (simulated ticket streams) | • F
...[truncated]

Must

Pick exactly one real shared seam to evaluate or adopt.
Define a live Ricky product path that will prove the adoption.
Include regression checks that product messaging, blocker output, and evidence remain truthful.

Must Not

Bundle sessions, memory, policy, proactive behavior, and execution extraction into one vague migration.
Skip the comparison/evaluation step for mature Ricky-local seams.
Treat adoption as successful without a live product-path proof.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW agent-assistant-boundary.future-surfaces-use-shared-runtime (agent-assistant-boundary/manual)

User message

Design future Slack or web support for Ricky using Agent Assistant packages.

Ricky output

## Ricky – Future Support Interface (Slack / Web)

Below is a **contract‑first design** for a new “Ricky Support Assistant” that can be deployed either as a Slack bot or as an embeddable web widget. The design follows the **Ricky repository conventions** (AGENTS.md) and the **workflow standards** (deterministic gates, review artifacts, 80‑100 validation loops, blocker reporting, scoped PR boundaries).

---

### 1. High‑Level Architecture

| Component | Responsibility | Tech Stack (suggested) |
|-----------|----------------|------------------------|
| **Ricky Core Service** | Executes workflow coordination, reliability checks, and authoring APIs. | Python 3.11, FastAPI, PostgreSQL, Redis (for task queues) |
| **Slack Adapter** | Translates Slack events → Core API, posts responses. | Bolt for Python, Socket Mode (no public URL required) |
| **Web Widget Adapter** | Embeddable React component that talks to Core via REST/WebSocket. | React 18, TypeScript, Vite, WebSocket (socket.io) |
| **Auth & Rate‑Limit Service** | Central OAuth2 provider, per‑user throttling. | Keycloak, Redis‑based token bucket |
| **Observability Suite** | Structured logs, metrics, trace IDs, health checks. | Ope
...[truncated]

Must

Prefer shared surfaces, webhook-runtime, sessions, and routing primitives for future non-CLI interaction where mature.
Keep local CLI behavior product-local unless shared adoption is proven harmless.
Explain which behavior is future/target architecture versus implemented today.

Must Not

Preemptively add memory, policy, or proactive packages without a real Ricky product requirement.
Let future surface abstractions distort the current CLI contract.
Duplicate a mature Agent Assistant capability locally without justification.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW cli.generation-default-not-execution (cli-behavior/manual)

User message

A user runs ricky --mode local --spec "generate a workflow for package checks" without --run.

Ricky output

**What happened**

You invoked Ricky with the `--mode local` flag and a specification, but you omitted the `--run` flag. In this configuration Ricky only **parses** the spec and **generates** the workflow definition; it does **not** execute any of the workflow steps.

**How to proceed**

1. **If you only wanted the workflow definition** (e.g., to review, version‑control, or hand‑off to another team), you’re already done. The generated YAML/JSON file will be in the output directory you specified (or `./ricky-output/` by default). Open it to verify the stages, inputs, and contracts.

2. **If you want the workflow to actually run** (i.e., perform the package checks), add the `--run` flag:

   ```bash
   ricky --mode local --spec "generate a workflow for package checks" --run

This will:

Validate the spec against the workflow contract (inputs, outputs, verification gates).
Spin up the local executor sandbox.
Execute each step in order, applying deterministic verification gates after each stage.
Produce a run report (ricky-run-report.json) and any artifact logs.

Optional flags you may find useful
- --output ./my-workflow/ – choose a custo
  ...[truncated]


**Must**

- Say generation is the default and execution was not requested.
- Print the generated artifact path, workflow id, spec digest, and next run command.
- Avoid showing execution evidence for a generation-only request.

**Must Not**

- Imply the workflow ran automatically.
- Present a generation-only result as execution success.
- Hide the opt-in commands for running the artifact.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>cli.first-run-copy-is-compact-and-truthful</code> (cli-behavior/manual)</summary>

**User message**

> Render Ricky's first-run CLI onboarding for a new user.

**Ricky output**

```text
**Ricky – First‑Run CLI Onboarding**

Below is the complete, copy‑and‑paste ready onboarding flow for a brand‑new user of the **Ricky** CLI. It follows the repository conventions (AGENTS.md), workflow standards, and shared authoring rules. The flow is deterministic, includes verification gates, and is scoped to a single feature branch (`onboarding‑first‑run`).  

---  

## 1️⃣  Create a dedicated onboarding branch  

```bash
git checkout -b onboarding-first-run

Scope: All files added/modified in this PR are limited to the onboarding experience (no production code changes).

2️⃣ Install the Ricky CLI (if not already installed)

# Preferred: use the official installer script
curl -fsSL https://ricky.dev/install.sh | bash

# Verify installation
ricky --version
# Expected output: ricky version X.Y.Z

Verification Gate: The command must exit with status 0 and print a semantic version string.

3️⃣ Initialise a new Ricky workspace

ricky init my-workspace
cd my-workspace

What happens:

Step	Artifact	Description
`ricky init`	`ricky.yaml`	Workspace manifest (metadata, version, a
...[truncated]


**Must**

- Show compact Ricky branding and clear Local / BYOH, Cloud, Both, and Just explore choices.
- End every branch with a concrete next step.
- Advertise only commands that are currently implemented.

**Must Not**

- Sound like a launch page or documentation dump.
- Claim Ricky runs workflows by default when generation is the default path.
- Require web or Slack onboarding before CLI use.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>cli.recovery-guidance-no-stack-traces</code> (cli-behavior/manual)</summary>

**User message**

> A user gives Ricky an empty spec or a missing spec file.

**Ricky output**

```text
**Ricky’s Standard Response to an Empty or Missing Specification**

| Phase | Action | Output / Artifact | Deterministic Gate |
|-------|--------|-------------------|--------------------|
| **1️⃣ Detect** | • Verify the presence of the spec file at the path declared in the workflow manifest.<br>• If the file exists, run a quick “non‑empty” check (`wc -c` > 0). | **Detection Log** – “Spec file `workflow.yaml` not found” *or* “Spec file `workflow.yaml` is empty (0 bytes)”. | **Gate 1** – *File‑Exists* / *Non‑Empty* must be true to proceed. |
| **2️⃣ Blocker Report** | • Immediately raise a **workflow‑blocking issue** in the repository’s issue tracker (e.g., GitHub Issues) with the label `workflow‑blocker`. | **Issue #\<n\>** – Title: *“Missing/Empty spec for `<workflow‑name>`”*.<br>Body includes:<br>• Path expected<br>• Reason why the workflow cannot be validated or executed<br>• Request for the author to supply a valid spec (link to spec template). | **Gate 2** – Issue must be created and linked to the PR/branch before any further automation runs. |
| **3️⃣ Create a Placeholder** | • In the same branch, add a **stub spec** named `workflow.yaml.stub` that contains the minimal require
...[truncated]

Must

Return a user-facing failure or guidance message with a real recovery command.
Distinguish generation failure from execution failure.
Show stack traces only when verbose diagnostic mode is requested.

Must Not

Crash with an uncaught exception in normal mode.
Suggest commands that do not exist.
Pretend a missing spec was accepted.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW cli.status-does-not-invent-provider-state (cli-behavior/manual)

User message

Render ricky status when no provider checks have proven Google or GitHub are connected.

Ricky output

**`ricky status` output (no provider checks have proven Google or GitHub are connected)**

=== AgentWorkforce Status Report ===

🟡 Provider Connectivity
• Google : NOT CONNECTED (no successful health check)
• GitHub : NOT CONNECTED (no successful health check)

✅ Core Services
• Scheduler : RUNNING
• Dispatcher : RUNNING
• Storage : RUNNING

⚙️ Configuration
• Active workflow contracts : 12
• Pending PRs : 3
• Validation loops (last run): 0/80‑100 (awaiting provider data)

🔍 Last Verification Gates
• Google health check : SKIPPED / NO RESULT
• GitHub health check : SKIPPED / NO RESULT
• System integrity check : PASSED (2026‑05‑11 09:12 UTC)

🚧 Blockers
• Provider connectivity for Google and GitHub is missing.
→ Actions required:
1. Verify OAuth / API credentials for each provider.
2. Ensure network egress to provider endpoints is allowed.
3. Re‑run ricky provider check --google and ricky provider check --github.

📝 Next Steps
• Resolve the above blockers.
• Re‑execute ricky status to confirm connectivity.
• Once both providers report **CONNECTED
...[truncated]


**Must**

- Report unknown or not-connected provider state honestly.
- Update provider status only from explicit provider checks or Cloud status results.
- Give concrete setup guidance for Cloud when relevant.

**Must Not**

- Mark Google or GitHub connected because guidance text was shown.
- Invent a provider connection URL or OAuth flow.
- Show empty fields with no recovery guidance when config is missing.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>generation-quality.skill-matcher-registry-backed</code> (generation-quality/manual)</summary>

**User message**

> Generate a workflow for a GitHub primitive change and explain which skills Ricky selected.

**Ricky output**

```text
**Workflow: “GitHub Primitive Change”**

| Step | Owner | Action | Input | Output | Success Gate | Failure Gate |
|------|-------|--------|-------|--------|--------------|--------------|
| 1️⃣ | **Ricky (Coordination)** | **Create a feature branch** | `main` HEAD SHA | New branch `feat/<short‑desc>` | Branch created and pushed to origin | Abort – report blocker to repo owner |
| 2️⃣ | **Ricky (Verification)** | **Run pre‑change lint & static analysis** | Code checkout | `lint-report.json` | Lint passes (0 errors) | Open PR comment with lint failures, halt progression |
| 3️⃣ | **Ricky (Authoring)** | **Apply the primitive change** | Diff/patch file supplied by requester | Modified files committed | `git diff --quiet` shows expected changes | Revert commit, raise “apply‑error” blocker |
| 4️⃣ | **Ricky (Testing)** | **Execute unit‑test suite** | Updated code | `test-report.xml` | All tests pass (≥ 95% pass rate) | Post test failures as review comment, stop merge |
| 5️⃣ | **Ricky (Verification)** | **Run security scan (SAST)** | Code base | `sast-report.json` | No new high‑severity findings | Fail → open issue, block merge |
| 6️⃣ | **Ricky (Coordination)** | **Open a Pull Request**
...[truncated]

Must

Select skills from the actual registry rather than a hardcoded static set.
Record ranked skill matches with confidence and trigger evidence.
Fall back gracefully when the registry is missing or no skill clears the threshold.

Must Not

Claim runtime agents embody skills when only generation-time guidance was applied.
Fail generation solely because optional skill files are missing.
Hide skill selection evidence from artifacts or JSON output.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.tool-selector-honors-spec-hints (generation-quality/manual)

User message

Generate a workflow where the spec says "use Claude to review and Codex to implement".

Ricky output

**Workflow: Claude‑Review → Codex‑Implement**

| **Section** | **Content** |
|-------------|-------------|
| **Name** | `review‑then‑implement` |
| **Owner** | `team‑mlops` |
| **Purpose** | Automate a two‑stage pipeline where Claude (LLM‑review) validates design/spec changes and Codex (code‑gen) produces the implementation. |
| **Scope** | • Textual design docs, API contracts, or feature tickets.<br>• Generates source files in the target repo (Python/JS). |
| **Branch/PR Boundary** | - **Review branch**: `review/<ticket‑id>` – created by Claude step.<br>- **Implementation branch**: `impl/<ticket‑id>` – created after successful review. |
| **Contract** | - **Input**: `design.md` (Markdown) located at `specs/`.<br>- **Output**: `src/` files + `CHANGELOG.md` entry.<br>- **Success criteria**: <br>  1. Claude returns a JSON verdict `{ “status”: “approved”, “issues”: [] }` or `{ “status”: “rejected”, “issues”: [...] }`.<br>  2. Codex produces syntactically valid code that passes unit tests (`pytest`/`jest`). |
| **Steps** | 1. **Trigger** – PR opened on `specs/` or manual dispatch with `ticket-id`.<br>2. **Claude Review** –<br>   - Action: `ricky/claude-review@v1`<br>   - Input: `design
...[truncated]

Must

Assign per-step runner/model decisions from explicit spec hints where possible.
Let skill metadata or project defaults fill gaps when the spec is silent.
Write tool-selection.json or equivalent audit metadata.

Must Not

Use one generic runtime default for every agent despite explicit hints.
Let skill defaults override explicit user/spec runner hints.
Omit the reason each tool or model was chosen.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.refine-is-opt-in-and-bounded (generation-quality/manual)

User message

Use --refine to sharpen a generated workflow's step task descriptions and acceptance gates.

Ricky output

### Using `--refine` to Sharpen Step Descriptions & Acceptance Gates  

`--refine` is a deterministic post‑processing flag that takes a **generated workflow JSON** (or YAML) and returns a **tighter, review‑ready version**. It focuses on two things:

| Target | What `--refine` does |
|--------|----------------------|
| **Task descriptions** | • Removes vague verbs (“do”, “handle”). <br>• Adds concrete inputs, outputs, and success criteria. <br>• Enforces the *“who‑does‑what‑when‑how”* template. |
| **Acceptance gates** | • Rewrites each gate into an explicit predicate (e.g., `output.fileCount == expected`). <br>• Adds a deterministic verification step (unit test, schema check, or API call). <br>• Inserts a **verification‑gate** block that can be auto‑executed by the CI runner. |

---

## 1. Basic Invocation  

```bash
awf generate --template my_template.yaml --output workflow.json
awf refine --input workflow.json --output workflow_refined.json

--input – path to the raw workflow you just generated.
--output – destination for the refined artifact.

You can pipe directly if you prefer:

awf generate -t my_template.yaml | awf refine > workflow_refined.json

...[truncated]


**Must**

- Keep the deterministic unrefined path as the default.
- Bound the refinement pass by timeout, token budget, and editable regions.
- Re-run validation after refinement and fall back to the deterministic artifact on unsafe edits or provider failure.

**Must Not**

- Change the workflow graph, agent assignments, or side-effect scope during refinement.
- Fail the whole generation if optional refinement times out.
- Hide warnings when the deterministic artifact is returned unchanged.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>generation-quality.behavior-grounded-gates</code> (generation-quality/manual)</summary>

**User message**

> Generate a workflow for the `ricky --version` spec.

**Ricky output**

```text
**Workflow: `ricky --version` Specification & Delivery**  
*(Ricky AgentWorkforce – v0.9.3)*  

---

## 1. Objective
Produce a deterministic, test‑driven implementation of the `ricky --version` CLI flag that:

1. Returns the current Ricky version string (e.g., `ricky 0.9.3`).
2. Exits with status 0 on success, non‑zero on internal error.
3. Is reproducible across all supported platforms (Linux, macOS, Windows).
4. Is covered by unit‑tests, integration‑tests, and CI verification gates.

---

## 2. Scope & Boundaries
| Scope | In‑Scope | Out‑Of‑Scope |
|-------|----------|--------------|
| **Code** | CLI entry point, version constant, `--version` flag handling, test suite. | Feature flags, auto‑update logic, packaging beyond `setup.py`/`pyproject.toml`. |
| **Branch** | `feature/ricky-version-flag` (created from `main`). | Changes to unrelated modules (e.g., `ricky sync`). |
| **PR** | Single PR `feat: add --version flag` targeting `main`. | Multiple PRs for unrelated bugs. |

---

## 3. Contract (Deterministic Specification)

| Input | Expected Output | Exit Code |
|-------|----------------|-----------|
| `ricky --version` | `ricky <semver>` (e.g., `ricky 0.9.3`) printed to **stdout
...[truncated]

Must

Build gates from the stated acceptance behavior, such as checking ricky --version output.
Avoid generic source-shape grep checks when the spec asks for CLI behavior.
Keep generated validation meaningful for the current repo shape.

Must Not

Treat grep -Eq 'export|function|class|workflow(' dist/ricky.js as proof of version behavior.
Claim the workflow is proven by source syntax alone.
Ignore the package-json version resolution order in the spec.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0
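
A behavior-grounded gate for this case can be sketched as a pure check on captured CLI output, in contrast to a grep over the built bundle. This is only a minimal sketch: the function name `checkVersionGate`, the `GateResult` shape, and the assumed output format (`<version>` or `ricky <version>`) are hypothetical illustrations, not Ricky's actual gate API.

```typescript
// Hypothetical gate result shape; not taken from the Ricky codebase.
interface GateResult {
  pass: boolean;
  reason: string;
}

// Checks captured `ricky --version` stdout against the version the spec
// expects (for example, the one resolved from package.json).
// Assumption: the CLI prints either "<version>" or "ricky <version>".
function checkVersionGate(
  stdout: string,
  exitCode: number,
  expectedVersion: string
): GateResult {
  if (exitCode !== 0) {
    return { pass: false, reason: `exit code ${exitCode}, expected 0` };
  }
  // Drop surrounding whitespace and an optional "ricky " prefix.
  const printed = stdout.trim().replace(/^ricky\s+/, "");
  if (printed !== expectedVersion) {
    return {
      pass: false,
      reason: `printed "${printed}", expected "${expectedVersion}"`,
    };
  }
  return { pass: true, reason: "version output matches" };
}
```

The point of the design is that the gate fails when the observed behavior is wrong, whereas a source-shape grep can pass on a bundle that prints the wrong version.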

REVIEW generation-quality.pattern-selection-deliberate (generation-quality/manual)

User message

Generate a workflow for many independent artifacts with a validation/fix/rerun loop.

Ricky output

**Workflow: “Batch‑Artifact Validation & Auto‑Fix Loop”**  
*Purpose*: Process a large set of independent artifacts (e.g., data files, config bundles, code modules) through deterministic validation, automatic remediation, and re‑validation until all pass or a blocker is raised.

---

## 1. Scope & Boundaries
| Item | Description |
|------|-------------|
| **Branch** | `feature/batch‑artifact‑validation` (isolated from `main`) |
| **PR Target** | `main` – only after **100 % pass** on the batch run |
| **Artifacts** | Any files under `artifacts/` (flat or nested). Each artifact is self‑contained; no cross‑artifact dependencies. |
| **Loop Limit** | Max **8 iterations** per artifact (configurable). Exceeding limit → **Blocker**. |
| **Team Ownership** | `@team‑validation` (review) + `@team‑ops` (runtime infra). |

---

## 2. Contract (YAML) – `workflow.yaml`

```yaml
name: batch-artifact-validation
on:
  workflow_dispatch:
    inputs:
      path:
        description: "Root directory of artifacts"
        required: true
        default: "artifacts/"
      max_iterations:
        description: "Maximum fix‑retry cycles per artifact"
        required: false
        default: 8
jobs:
  enum
...[truncated]

Must

Choose dag, supervisor, or pipeline deliberately based on the work shape.
Explain the pattern choice in artifact metadata or a rationale.
Use dag for validation/fix/rerun loops when dependencies matter.

Must Not

Default blindly to dag for every workflow.
Collapse independent artifact work into one vague agent task.
Omit verification gates because the chosen pattern seems obvious.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.skill-boundary-copy (generation-quality/manual)

User message

Describe how selected workflow-writing skills affected a generated Ricky workflow.

Ricky output

### How Selected Workflow‑Writing Skills Shape a Generated Ricky Workflow  

Below is a step‑by‑step illustration of the **five core Ricky writing skills** and the concrete artifacts they produce in a typical workflow generation cycle. The example workflow is a **“Customer‑Onboard‑Email‑Sequence”** that sends a welcome series after a new user signs up.

| # | Skill | What the skill does | Concrete Contract / Artifact Produced | Effect on the Final Workflow |
|---|-------|---------------------|--------------------------------------|------------------------------|
| 1 | **Scope Definition** | Writes a **`workflow‑spec.yaml`** that enumerates inputs, outputs, success criteria, and boundary conditions. | ```yaml\nname: customer-onboard-email-sequence\ninputs:\n  - user_id: string\n  - email: string\noutputs:\n  - sequence_status: enum[completed, failed]\nsuccess_criteria:\n  - all three emails sent within 48 h\n  - no bounce or spam‑report\n``` | Guarantees that every downstream step knows exactly what data it can rely on, preventing “missing‑field” runtime errors. |
| 2 | **Deterministic Gate Design** | Inserts **verification gates** (unit‑test‑style checks) after each major node. | `
...[truncated]

Must

Describe skills as generation-time selection, loading, and rendering inputs.
Point to metadata such as loaded-skills.txt and skill-application-boundary.json.
Say the workflow instructions were informed by selected skills.

Must Not

Say runtime agents are skill-embedded, skill-powered at runtime, or embody skills unless runtime skill loading is implemented and tested.
Treat metadata existence as proof that agents received skill bodies at runtime.
Overstate current tests beyond the generation boundary.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.no-pure-codegen-without-proof (generation-quality/manual)

User message

Generate a workflow from a vague product spec and return it to the user.

Ricky output

**Ricky – Workflow Authoring Assistant**

---

### 📋 Quick Summary  
Your request is to generate a workflow from a *vague* product specification. To deliver a reliable, testable workflow we need a concrete contract that defines:

1. **Inputs & Outputs** – data shapes, types, and validation rules.  
2. **Core Steps** – high‑level actions, required services, and any external integrations.  
3. **Success / Failure Gates** – deterministic verification points (e.g., schema checks, API response codes).  
4. **Review & Validation Loop** – number of automated test cycles (80‑100 loops) and manual review artifacts.  
5. **Branch / PR Scope** – what files/folders will be touched, naming conventions, and CI checks.

Without these details the generated workflow would be speculative and could break downstream pipelines.

---

### 🔎 What I Need From You

| Category | Example Detail | Why It Matters |
|----------|----------------|----------------|
| **Product Goal** | “Create a PDF invoice from an order payload.” | Drives step selection (e.g., templating, storage). |
| **Input Schema** | `{ orderId: string, items: [{sku:string, qty:number}], total: number }` | Enables deterministic validation g
...[truncated]

Must

Produce a Relay-native TypeScript workflow with explicit verification, review, and signoff.
Validate with dry-run or targeted structural checks where possible.
Return artifacts, warnings, and follow-up commands honestly.

Must Not

Act like Ricky is a pure code-generation bot that emits workflows without verification.
Stop at "code compiles" as the proof bar.
Skip skill-aware workflow authoring guidance for serious workflows.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.classify-before-retry (runtime-recovery/manual)

User message

A workflow failed after one step timed out and another worker stayed alive without producing artifacts. Explain what Ricky should do next.

Ricky output

**Ricky’s post‑failure playbook**

1. **Capture the failure state**  
   - Pull the run‑log for the timed‑out step and the heartbeat of the “alive‑but‑idle” worker.  
   - Store the logs, metrics, and any partial artifacts in the **`/incidents/<run‑id>/`** bucket with a timestamped manifest.  
   - Auto‑generate a **Failure Report** (Markdown) that includes:
     * Step name, expected timeout, actual duration.  
     * Worker ID, last heartbeat, resource usage snapshot.  
     * Exit codes / exception traces (if any).  

2. **Mark a deterministic blocker**  
   - Insert a **`BLOCKER: TIMEOUT_OR_ORPHAN_WORKER`** entry into the workflow’s state store.  
   - Propagate the blocker to the **Workflow Orchestrator** so downstream steps are paused and no new workers are scheduled for this run.

3. **Create a scoped incident branch**  
   - From the workflow’s **`main`** branch, create **`incident/<run‑id>-timeout‑orphan`**.  
   - Commit the Failure Report and any collected logs.  
   - Open a PR titled **`[INCIDENT] <run‑id> – Timeout + Orphan Worker`** with the following checklist:
     - [ ] Verify timeout configuration for the failing step.  
     - [ ] Verify worker liveness‑heartbea
...[truncated]

Must

Classify the failure before deciding whether to retry.
Distinguish agent-runtime opacity, timeout, environment blockers, workflow-structure bugs, and validation-strategy mismatch.
Preserve raw evidence and uncertainty when the class is not known.

Must Not

Blindly rerun the whole workflow without a blocker classification.
Treat every failure as a broken workflow definition.
Claim Ricky fixed the workflow before rerun evidence proves it.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.stale-relay-state (runtime-recovery/manual)

User message

Ricky detects stale .agent-relay/, .relay/, and .trajectories/ state before launching a local workflow.

Ricky output

**Workflow Contract – Stale‑State Detection for Local Launches**  

| **Stage** | **Input** | **Output** | **Success Criteria** |
|-----------|-----------|------------|----------------------|
| **1. Pre‑launch Scan** | Path to project root (string) | `scanReport: { stale: boolean, details: ScanDetail[] }` | All three directories (`.agent‑relay/`, `.relay/`, `.trajectories/`) are examined; the function returns a deterministic boolean flag. |
| **2. Gate – Blocker Evaluation** | `scanReport` | `gateDecision: { proceed: boolean, blocker: string\|null }` | If `stale===true` → `proceed=false` and a clear blocker message is emitted; otherwise `proceed=true`. |
| **3. Review Artifact Generation** | `scanReport` | `artifactPath: string` (JSON file) | Artifact is written to `<projectRoot>/.ricky/reports/stale‑state‑<timestamp>.json`. |
| **4. Optional Auto‑Clean** (user‑opt‑in) | `scanReport` + user flag `autoClean=true` | `cleanResult: { cleaned: boolean, errors: string[] }` | All stale files are removed; any failure is reported in `errors`. |
| **5. Launch Continuation** | `gateDecision.proceed` | – | If `true`, the normal local workflow (`ricky run …`) proceeds; if `false`, the launch abo
...[truncated]

Must

Classify stale local runtime state as an environment contamination issue.
Recommend quarantine or isolated-run guidance before launch.
Record the observed paths and the action taken or recommended.

Must Not

Treat stale runtime state as a workflow logic failure.
Delete or overwrite state without an explicit safe path or user intent.
Continue into execution as if the workspace were clean.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.already-running-conflict (runtime-recovery/manual)

User message

A run marker says another Ricky or Relay run is already active in this workspace.

Ricky output

**How to resolve “another Ricky or Relay run is already active in this workspace”**

The marker that blocks a new run is created when a previous **Ricky** (or **Relay**) execution didn’t finish cleanly. Follow the deterministic clean‑up steps below; each step is a verification gate that you can run independently.

---

## 1️⃣ Verify the active marker

```bash
# From the root of the workspace
cat .ricky/run.marker

If the file exists you’ll see a JSON payload with the PID of the process that created it, e.g.:

{
  "pid": 12345,
  "started_at": "2026-05-10T14:32:07Z",
  "type": "ricky"
}

If the file is missing, the error is coming from a stale lock in the Relay subsystem (see step 4).

2️⃣ Check the recorded PID

ps -p 12345 -o pid,cmd

If the process is still running → it is legitimately holding the lock.
- Decide whether to let it finish or abort it.
- To abort (only if you’re sure it’s hung):
```
kill -9 12345
```
If the process is not found → the marker is stale. Proceed to step 3.

3️⃣ Remove the stale marker (deterministic verification gate)

rm -f .ricky/run.marker

Verification:
...[truncated]


**Must**

- Report the active marker, run id, or status path when available.
- Ask the user to inspect, wait for, or explicitly clear the active run.
- Avoid launching a competing run that could corrupt evidence.

**Must Not**

- Silently start another run.
- Hide the existing run marker from the user.
- Treat the conflict as a generic failure with no recovery path.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
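
The Must / Must Not rules above can be sketched as a small decision function that reports the existing run and never starts a competing one on its own. This is a sketch under stated assumptions: `RunMarker`, `LaunchDecision`, and `decideLaunch` are hypothetical names, and the marker fields are illustrative, not the format Ricky actually writes.

```typescript
// Hypothetical marker shape, for illustration only.
interface RunMarker {
  runId: string;
  statusPath?: string;
}

type LaunchDecision =
  | { action: "launch" }
  | { action: "block"; message: string };

// Decides whether a new run may start. An existing marker always blocks
// unless the user explicitly cleared it; this function deletes nothing.
function decideLaunch(
  marker: RunMarker | null,
  userClearedMarker: boolean
): LaunchDecision {
  if (marker === null || userClearedMarker) {
    return { action: "launch" };
  }
  const where = marker.statusPath ? ` (status: ${marker.statusPath})` : "";
  return {
    action: "block",
    message:
      `Run ${marker.runId} is already active${where}. ` +
      "Inspect it, wait for it, or explicitly clear it before starting another run.",
  };
}
```

Returning a `block` decision with a message, instead of throwing, keeps the conflict distinct from a generic failure and gives the caller a recovery path to show the user.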

</details>

<details>
<summary><strong>REVIEW</strong> <code>runtime-recovery.env-loader-injection-runtime-loadable</code> (runtime-recovery/manual)</summary>

**User message**

> A workflow artifact references a `MISSING_ENV_VAR` value. Ricky's deterministic auto-fix injects the `.env.local` / `.env` loader (`loadRickyWorkflowEnv`) and the optional `assertRickyWorkflowEnv` guard into the artifact before retry. The artifact may be a master-rendered workflow whose `.step({ command: ... })` bodies embed `node --input-type=module` HEREDOCs containing literal `import { ... } from 'node:fs'` / `from 'node:path'` strings.

**Ricky output**

```text
**Deterministic auto‑fix for missing environment variables**

Below is the concrete, reproducible patch you can apply to any workflow artifact that currently fails because it references a `MISSING_ENV_VAR`. The fix injects the **Ricky environment loader** (`loadRickyWorkflowEnv`) and the **optional guard** (`assertRickyWorkflowEnv`) **once**, at the top of the generated file, and then re‑exports the original workflow body unchanged. This guarantees that every step runs with the required variables available, while keeping the original HEREDOC‑embedded Node commands intact.

---

## 1. Patch contract

| Item | Description |
|------|-------------|
| **Target** | Any `.js`/`.ts` workflow artifact produced by the master‑renderer (e.g., `myWorkflow.generated.js`). |
| **Pre‑condition** | The file contains a reference to `process.env.MISSING_ENV_VAR` (or any `process.env.*` that is not defined in the current environment). |
| **Post‑condition** | The file begins with a deterministic import of `loadRickyWorkflowEnv` and `assertRickyWorkflowEnv`, calls `loadRickyWorkflowEnv()` immediately, and optionally calls `assertRickyWorkflowEnv(['MISSING_ENV_VAR'])`. All original code (including HERED
...[truncated]

Must

Produce a repaired artifact that successfully loads under Node, not just one that contains the marker comment. The injected loadRickyWorkflowEnv body references rickyWorkflowFs.* and rickyWorkflowPath.*, so the repair must also add the corresponding import * as rickyWorkflowFs from 'node:fs' and import * as rickyWorkflowPath from 'node:path' aliases at module top level.
Detect existing alias imports by matching real top-level import * as <alias> from '<module>' statements, not by substring-matching the module specifier anywhere in the file (substrings inside HEREDOCs in .step({ command: ... }) bodies do not count as imports).
Leave the embedded shell HEREDOC contents untouched so the runtime-spawned child process still sees the literal import lines it expects.

Must Not

Skip adding the rickyWorkflowFs / rickyWorkflowPath aliases because from 'node:fs' or from 'node:path' already appears somewhere in the file as a string literal.
Inject loadRickyWorkflowEnv (or assertRickyWorkflowEnv) without the supporting alias imports, which produces a ReferenceError: rickyWorkflowPath is not defined at module load and burns the auto-fix budget on UNSUPPORTED_RUNTIME at runtime-launch.
Rewrite or escape the embedded HEREDOC text in step commands.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.auto-fix-bounded-loop (runtime-recovery/manual)

User message

Run a local workflow with auto-fix enabled. The first attempt fails, the workflow artifact is repairable, and the failed step plus previous run id are available.

Ricky output

**Local Workflow Execution – Auto‑Fix Run**

Below is a concrete, reproducible contract for running the workflow locally with **auto‑fix enabled**, handling the first‑attempt failure, applying the repair, and re‑executing the failed step. All steps are deterministic, include verification gates, and produce the required review artifacts.

---

## 1. Preconditions (Inputs)

| Name | Type | Description | Source |
|------|------|-------------|--------|
| `workflow.yaml` | file | The workflow definition (YAML) to be executed. | Repository root |
| `run_id_prev` | string | Identifier of the previous successful run (e.g., `run‑2024‑11‑03‑a1b2c3`). | Provided by user |
| `failed_step_id` | string | The step identifier that failed on the first attempt (e.g., `step‑build‑docker`). | Provided by user |
| `auto_fix=true` | bool | Flag to enable the auto‑fix engine. | Hard‑coded for this run |
| `repair_artifact.json` | file | JSON artifact produced by the repair service (contains diff/patch). | Provided by user |

*All files must be present in the local checkout under `./.rwc/` (RWC = “Reliability Workbench Container”).*

---

## 2. Execution Environment

| Component | Version | Reason |
|----
...[truncated]

Must

Use a bounded retry budget and summarize every attempt.
Ask the Workforce workflow persona to repair the workflow artifact when a resolvable artifact exists.
Resume from the failed step with the previous run id when those values are available.

Must Not

Edit arbitrary repository source files as the default auto-fix surface.
Keep retrying after the configured max attempts.
Lose the single Ricky tracking run id across repair/resume attempts.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.no-auto-fix-preserves-single-attempt (runtime-recovery/manual)

User message

A user runs ricky run workflows/foo.ts --no-auto-fix and the workflow fails.

Ricky output

**Quick‑Start Failure Diagnosis**

When `ricky run workflows/foo.ts --no-auto-fix` exits with a non‑zero status, the engine has hit a **deterministic verification gate** that cannot be auto‑fixed. Follow the steps below to isolate the root cause, produce the required review artifacts, and get the workflow back to a green state.

---

## 1. Capture the Run Report

```bash
ricky run workflows/foo.ts --no-auto-fix --report ./tmp/foo-run-report.json

The JSON report contains:
- gateResults[] – each verification gate outcome (PASS/FAIL)
- errorStack – full stack trace for the failing gate
- artifactDiffs – any diff that would have been auto‑fixed (now only reported)

Save this file and attach it to the bug ticket (or PR comment) – it is the single source of truth for the failure.

2. Identify the Failing Verification Gate

Open the report and look for the first entry with "status":"FAIL":

{
  "gateId": "contract:input-schema",
  "status": "FAIL",
  "message": "Input payload does not conform to schema FooInputV2",
  "details": { ... }
}

The gateId tells you which workflow contract is violated. The most common gate families are:

| Gate
...[truncated]


**Must**

- Preserve one-attempt behavior when auto-fix is disabled.
- Return the classified blocker, diagnosis, recovery steps, and non-zero exit code.
- Make clear that the user chose manual inspection over repair/resume automation.

**Must Not**

- Start a repair loop despite `--no-auto-fix`.
- Suppress the diagnosis because no repair was attempted.
- Present the failure as a completed repair attempt.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
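
The single-attempt rule, and the bounded retry budget it contrasts with, can be sketched as one loop whose budget collapses to 1 when auto-fix is disabled. This is a minimal sketch: `AttemptResult`, `RunSummary`, and `runWithOptionalAutoFix` are hypothetical names and do not reflect the actual shapes in `src/local/auto-fix-loop.ts`.

```typescript
// Hypothetical types for illustration only.
interface AttemptResult {
  ok: boolean;
  blocker?: string; // classified blocker when the attempt failed
}

interface RunSummary {
  attempts: AttemptResult[]; // every attempt, in order
  exitCode: number;
  repairAttempted: boolean;
}

// Runs `attempt` once when auto-fix is disabled, otherwise up to
// `maxAttempts` times, and records every attempt for the summary.
function runWithOptionalAutoFix(
  attempt: (attemptNumber: number) => AttemptResult,
  autoFix: boolean,
  maxAttempts: number
): RunSummary {
  const budget = autoFix ? Math.max(1, maxAttempts) : 1;
  const attempts: AttemptResult[] = [];
  for (let n = 1; n <= budget; n++) {
    const result = attempt(n);
    attempts.push(result);
    if (result.ok) {
      return { attempts, exitCode: 0, repairAttempted: n > 1 };
    }
  }
  // Budget exhausted: non-zero exit, and no further retries.
  return { attempts, exitCode: 1, repairAttempted: attempts.length > 1 };
}
```

With `autoFix` set to `false` the summary always contains exactly one attempt and `repairAttempted` stays `false`, so a failure cannot be presented as a completed repair attempt.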

</details>

<details>
<summary><strong>REVIEW</strong> <code>runtime-recovery.in-process-local-runner</code> (runtime-recovery/manual)</summary>

**User message**

> Explain how Ricky should execute a local TypeScript workflow artifact in the primary local path.

**Ricky output**

```text
**Ricky – Local Execution of a TypeScript Workflow Artifact**  

Below is the canonical, reproducible procedure that Ricky (the AgentWorkforce workflow reliability assistant) follows to run a TypeScript‑based workflow artifact that lives in the primary local repository path (`./workflows`). The steps are expressed as a **workflow contract** with deterministic verification gates, required review artifacts, and a bounded validation loop (80‑100 iterations) that guarantees repeatable success or a clear blocker report.

---

## 1. Preconditions (Contract “Input”)

| Item | Requirement | Source |
|------|-------------|--------|
| **Repo root** | Must be the top‑level directory of the AgentWorkforce codebase (contains `package.json`, `tsconfig.json`, `.github/workflows`). | `git rev-parse --show-toplevel` |
| **Workflow artifact** | A single TypeScript file (or entry point) located at `./workflows/<name>.ts`. | Path supplied by user or discovered via `ls workflows/*.ts` |
| **Node version** | `>=18.0.0` (LTS) – enforced by `.nvmrc` or `engines.node` in `package.json`. | `node -v` |
| **Dependencies** | All npm packages installed (`npm ci`). | `node_modules/` must exist and match `package
...[truncated]

Must

Prefer the Node strip-types route or equivalent SDK/programmatic route over requiring the agent-relay binary on PATH.
Precheck that Node and @agent-relay/sdk are resolvable for the workflow.
Record the actual spawn command in execution evidence.

Must Not

Fail solely because agent-relay is not on PATH when the SDK route is available.
Hide the actual runtime command from evidence.
Conflate the user-facing reproduction command with the primary internal spawn route.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.escalation-is-not-generic-failure (runtime-recovery/manual)

User message

Ricky reaches a boundary after a structural failure persists after a fix attempt.

Ricky output

**Ricky’s Boundary‑Handling Protocol for Persistent Structural Failures**

When a structural failure in a workflow remains unresolved after an initial fix attempt, Ricky follows a deterministic, contract‑driven protocol that isolates the problem, documents the blocker, and coordinates a scoped remediation effort. The protocol is expressed as a reusable **“Structural‑Failure‑Escalation”** workflow contract.

---

## 1. Contract Definition – `StructuralFailureEscalation.yml`

```yaml
name: StructuralFailureEscalation
description: >
  Handles unrecoverable structural failures that persist after a first‑pass fix.
  Guarantees deterministic reporting, scoped branching, and a bounded
  validation loop (80‑100 iterations max) before escalation.

inputs:
  workflow_id:
    type: string
    description: Identifier of the failing workflow.
  failure_signature:
    type: string
    description: Unique hash of the failure (e.g., stack trace + commit SHA).
  attempted_fix_branch:
    type: string
    description: Branch where the initial fix was applied.
  fix_commit:
    type: string
    description: SHA of the commit that introduced the attempted fix.
  owner:
    type: string
    description
...[truncated]

Must

Escalate with the attempted fix, failed validation, classified blocker, and recommendation.
Distinguish escalation from a generic product failure.
Preserve enough context for a human operator to continue.

Must Not

Retry speculative fixes indefinitely.
Collapse to "something went wrong" without the attempted actions.
Discard evidence from failed repair attempts.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0
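
One way to keep an escalation from collapsing into "something went wrong" is to make the required context part of the record itself. The following is a minimal sketch: the `Escalation` interface and `buildEscalation` helper are hypothetical, with field names chosen to mirror the rubric above, not an existing Ricky type.

```typescript
// Hypothetical escalation record; field names are illustrative.
interface Escalation {
  kind: "escalation"; // distinguishes escalation from a generic failure
  attemptedFix: string;
  failedValidation: string;
  blockerClass: string;
  recommendation: string;
  evidencePaths: string[];
}

// Builds an escalation and refuses to produce one that omits what was
// tried, what failed, how it was classified, or what to do next.
function buildEscalation(input: Omit<Escalation, "kind">): Escalation {
  const required: Array<keyof Omit<Escalation, "kind" | "evidencePaths">> = [
    "attemptedFix",
    "failedValidation",
    "blockerClass",
    "recommendation",
  ];
  for (const field of required) {
    if (input[field].trim() === "") {
      throw new Error(`escalation is missing ${field}`);
    }
  }
  return {
    kind: "escalation",
    attemptedFix: input.attemptedFix,
    failedValidation: input.failedValidation,
    blockerClass: input.blockerClass,
    recommendation: input.recommendation,
    // Copy so later changes by the caller cannot alter preserved evidence.
    evidencePaths: input.evidencePaths.slice(),
  };
}
```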

REVIEW runtime-recovery.analytics-from-structured-evidence (runtime-recovery/manual)

User message

Produce a workflow health digest from many Ricky workflow runs.

Ricky output

**Ricky Workflow Health Digest – Compiled from Recent Runs**  

| Metric | Value (last 30 days) | Trend |
|--------|----------------------|-------|
| **Total Runs** | 1 842 | ↗︎ +8 % |
| **Successful Runs** | 1 617 (87.8 %) | ↗︎ +5 % |
| **Failed Runs** | 225 (12.2 %) | ↘︎ ‑3 % |
| **Avg. Runtime** | 3 m 27 s | ↔︎ steady |
| **90‑th‑pctile Runtime** | 5 m 12 s | ↘︎ ‑7 % |
| **Deterministic‑Gate Pass Rate** | 96.4 % | ↗︎ +2 % |
| **Review‑Artifact Completion** | 98.1 % | ↗︎ +1 % |
| **Validation‑Loop Passes (80‑100 loops)** | 1 732 (94 %) | ↘︎ ‑1 % |
| **Blocker Incidents** | 7 | ↔︎ steady |

---

### 1. Success‑Rate Highlights
* **Success ≥ 90 %** for all core pipelines (ingest → transform → publish).  
* The **deterministic verification gate** (checksum & schema match) succeeded in 96.4 % of runs, up from 94.2 % two weeks ago after the “schema‑evolution” patch.

### 2. Failure Analysis (Top 3 Root Causes)
| # | Root cause | Frequency | Typical impact | Mitigation |
|---|------------|-----------|----------------|------------|
| 1 | **Transient external API timeout** (e.g., vendor data fetch) | 112 runs (49 % of failures) | Run abort, downstream missing records | Added exponential‑b
...[truncated]

Must

Consume normalized WorkflowRunEvidence rather than raw logs as the primary input.
Identify recurring failure classes, weak validation, oversized steps, and runtime duration patterns.
Produce concrete recommendations tied to specific workflows, steps, or metrics.

Must Not

Mutate evidence while analyzing it.
Return generic advice like "improve your workflow" without references.
Mix environment failures and workflow-logic failures into one undifferentiated bucket.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0
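
The rubric above can be illustrated with a read-only aggregation that keeps environment and workflow-logic failures in separate buckets. This is a sketch under stated assumptions: `EvidenceLike` is a simplified, hypothetical stand-in for `WorkflowRunEvidence` (whose real fields are not shown in this document), and `countFailures` is an illustrative helper.

```typescript
// Assumption: a simplified stand-in for WorkflowRunEvidence.
interface EvidenceLike {
  workflow: string;
  failureClass: "environment" | "workflow-logic" | null; // null = success
}

interface FailureCount {
  workflow: string;
  failureClass: "environment" | "workflow-logic";
  count: number;
}

// Counts failures per (workflow, failure class). The input is typed as
// read-only, so the analysis cannot mutate the evidence it consumes.
function countFailures(
  runs: ReadonlyArray<Readonly<EvidenceLike>>
): FailureCount[] {
  const result: FailureCount[] = [];
  for (const run of runs) {
    if (run.failureClass === null) continue; // successful run
    const existing = result.filter(
      (r) => r.workflow === run.workflow && r.failureClass === run.failureClass
    )[0];
    if (existing) {
      existing.count += 1;
    } else {
      result.push({
        workflow: run.workflow,
        failureClass: run.failureClass,
        count: 1,
      });
    }
  }
  // Most frequent failure bucket first.
  return result.sort((a, b) => b.count - a.count);
}
```

Because every row names a workflow and a failure class, a digest built on it can tie each recommendation to a specific workflow instead of returning generic advice.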

REVIEW surfaces-ingress.co-equal-surfaces (surfaces-ingress/manual)

User message

Design a new Ricky Slack surface for workflow debugging.

Ricky output

## Ricky Slack Surface – “Workflow Debugger”

### 1. Overview
The **Ricky Slack Debugger** is a dedicated Slack app surface that lets engineers and product owners inspect, step‑through, and repair Ricky‑managed workflows without leaving Slack. It follows the **Ricky repository conventions** (AGENTS.md), **workflow standards**, and **shared authoring rules**:

| Aspect | Decision |
|--------|----------|
| **Entry point** | Slash command `/ricky‑debug` |
| **Interaction model** | Modal dialogs + message actions + threaded updates |
| **State persistence** | Immutable workflow execution logs stored in `ricky‑logs/` (Git‑tracked) |
| **Verification gates** | Deterministic pre‑flight checks, post‑run validation, 80‑100 % loop coverage |
| **Review artifacts** | Auto‑generated PR with `debug‑patch/` branch, `debug‑review.md` checklist |
| **Blocker reporting** | Immediate “⚠️ Blocker” message with `ricky‑blocker` tag and ticket link |
| **Scope** | One workflow per debug session; PR limited to the affected `.ricky.yaml` and generated test harness |

---

### 2. User Flow (Contract)

| Step | Trigger | Input | Output | Validation |
|------|---------|-------|--------|------------|
| **1️⃣ 
...[truncated]

Must

Treat Slack as a co-equal product surface, not a wrapper around CLI.
Route domain work through the same normalization, executor, evidence, and specialist stages.
Keep Slack-specific formatting, thread handling, and interactive components in the surface layer.

Must Not

Put workflow generation or diagnosis domain logic directly in the Slack handler.
Degrade Slack to a developer shortcut with weaker routing than CLI.
Skip signature verification, dedup, or thread handling when the surface is implemented.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0
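
The surface-layer boundary described above can be sketched as a translation function that parses Slack input into a surface-agnostic request and does nothing else. All names here (`SlackCommandEvent`, `NormalizedRequest`, `normalizeSlackDebug`) are hypothetical; the real Ricky request types are not shown in this document, and signature verification and dedup are assumed to happen before this step.

```typescript
// Hypothetical shapes for illustration only.
interface SlackCommandEvent {
  text: string; // raw slash-command text
  channel: string;
  threadTs?: string; // present when invoked inside a thread
}

interface NormalizedRequest {
  intent: "debug";
  workflowRef: string;
  // Surface details travel as metadata; domain code never branches on them.
  metadata: { surface: string; replyTarget: string };
}

// Surface-layer translation only: no diagnosis or generation logic here.
function normalizeSlackDebug(
  event: SlackCommandEvent
): NormalizedRequest | null {
  const workflowRef = event.text.trim();
  if (workflowRef === "") {
    return null; // nothing to route; the surface can prompt the user
  }
  const replyTarget = event.threadTs
    ? `${event.channel}:${event.threadTs}`
    : event.channel;
  return {
    intent: "debug",
    workflowRef,
    metadata: { surface: "slack", replyTarget },
  };
}
```

Downstream stages would receive the same request shape regardless of whether it came from Slack or the CLI, which is what keeps the surfaces co-equal.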

REVIEW surfaces-ingress.normalizer-is-only-translation-boundary (surfaces-ingress/manual)

User message

Add a new web handoff type that submits a workflow spec and mode.

Ricky output

**New Hand‑off Type:** `web_submit_spec`

| Field | Type | Description |
|-------|------|-------------|
| **type** | `string` (enum) | Fixed value `"web_submit_spec"` – identifies this hand‑off. |
| **payload** | `object` | Container for the data the UI sends. |
| **payload.spec** | `object` | Full workflow specification (JSON‑Schema‑validated). |
| **payload.mode** | `string` (enum) | Execution mode – `"dry‑run"` \| `"run"` \| `"preview"` |
| **payload.metadata** *(optional)* | `object` | Caller‑provided context (e.g., `requestId`, `userId`). |
| **response** | `object` | Returned to the UI after processing. |
| **response.status** | `string` (enum) | `"accepted"` \| `"rejected"` \| `"error"` |
| **response.message** | `string` | Human‑readable status description. |
| **response.runId** *(if status=accepted)* | `string` | Identifier of the instantiated workflow run. |

---

## 1. Contract Definition (YAML)

```yaml
# contracts/web_submit_spec.yaml
type: object
required: [type, payload]
properties:
  type:
    const: web_submit_spec
  payload:
    type: object
    required: [spec, mode]
    properties:
      spec:
        $ref: "#/components/schemas/WorkflowSpec"
      mode:
        
...[truncated]

Must

Add a handoff variant that normalizes into LocalInvocationRequest or CloudGenerateRequest.
Keep auth/session validation in the web surface before normalization.
Ensure downstream code does not need to know the request came from web.

Must Not

Short-circuit directly from the web handler to executors.
Create a parallel domain model for web requests.
Make the web surface the owner of local or Cloud routing semantics.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW surfaces-ingress.mcp-claude-context-is-metadata (surfaces-ingress/manual)

User message

A Claude session hands Ricky a workflow spec plus conversation context and asks Ricky to determine whether to generate, debug, restart, analyze, or coordina

Comment truncated to stay within GitHub limits. Download the ricky-eval-run artifact for the full human-review.md.

devin-ai-integration Bot reviewed May 11, 2026


coderabbitai Bot reviewed May 11, 2026


Comment thread src/local/auto-fix-loop.ts Outdated

khaliqgant merged commit 359cf9c into main May 11, 2026
1 of 2 checks passed

khaliqgant deleted the fix/timeout-continuation-handoff-path branch May 11, 2026 06:49

khaliqgant merged 2 commits into main from fix/timeout-continuation-handoff-path


coderabbitai Bot commented May 11, 2026 (edited)

Review failed


❌ Failed checks (1 warning)


✅ Devin Review: No Issues Found
