
fix(local): avoid call-stack overflow when summarizing noisy coordinator runs#97

Merged
kjgbot merged 1 commit into main from fix/coordinator-log-stack-overflow on May 11, 2026


@kjgbot
Contributor

@kjgbot kjgbot commented May 11, 2026

Summary

  • logs.push(...mapCoordinatorLogs(runResult)) in src/local/entrypoint.ts passes every log line as a separate call argument, which exceeds V8's argument-stack limit (roughly 100k arguments) on noisy failed multi-agent runs and crashes the CLI with RangeError: Maximum call stack size exceeded while the workflow is shutting down.
  • Replaced mapCoordinatorLogs with appendCoordinatorLogs(target, result) that pushes lines directly via a for-of loop, eliminating both the array-literal spread inside the helper and the unsafe push(...) spread at the call site.
  • Added a regression test that builds a 200k-line stdout + 200k-line stderr and asserts the helper does not throw.
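The PR text does not include the helper's source. Below is a minimal sketch of the described approach, assuming a result object with `status`, `command`, `stdout`, and `stderr` string fields and assuming the line prefixes shown. Both the type and the prefixes are hypothetical, not the actual definitions in src/local/entrypoint.ts.

```typescript
// Hypothetical result shape; field names are assumptions, not the real type
// in src/local/entrypoint.ts.
interface CoordinatorRunResult {
  status: string;
  command: string;
  stdout: string;
  stderr: string;
}

// Append each line with an individual push() call. Unlike
// target.push(...lines), this never passes the whole batch as call
// arguments, so stack usage stays constant regardless of output size.
export function appendCoordinatorLogs(
  target: string[],
  result: CoordinatorRunResult,
): void {
  target.push(`coordinator status: ${result.status}`);
  target.push(`$ ${result.command}`);
  for (const line of result.stdout.split("\n")) {
    if (line !== "") target.push(`[stdout] ${line}`);
  }
  for (const line of result.stderr.split("\n")) {
    if (line !== "") target.push(`[stderr] ${line}`);
  }
}
```

At the call site this replaces logs.push(...mapCoordinatorLogs(runResult)) with appendCoordinatorLogs(logs, runResult), so no intermediate array is built and nothing is spread.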

Repro of the original crash

A consumer running ricky --mode local --spec-file ... --run against a workflow whose validation step fails saw:

file:///.../@agentworkforce/ricky/dist/ricky.js:10637
      logs.push(...mapCoordinatorLogs(runResult));
           ^
RangeError: Maximum call stack size exceeded
    at Object.execute (.../ricky.js:10637:12)
    at async runWithAutoFix (.../ricky.js:3163:22)
    at async executeLocalPath (.../ricky.js:14021:21)

The workflow itself had already finished (Shutting down broker...); the crash came from the post-run log summarization and hid the real validation failure from the user.

Quick confirmation on Node 22:

$ node -e 'const big=Array.from({length:200000},(_,i)=>i); const out=[]; try{out.push(...big)}catch(e){console.log(e.message)}'
Maximum call stack size exceeded
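The one-liner above can be restated in TypeScript to contrast the failing pattern with the per-element loop the PR switched to. Whether the spread call actually throws depends on the engine's stack size, so this sketch reports the outcome instead of assuming it; `pushWithSpread` and `pushWithLoop` are illustrative names, not code from the repository.

```typescript
// Append with spread syntax and report whether the engine rejected the call.
// Every element becomes a call argument, so very large arrays can exceed the
// stack and throw RangeError before push() runs.
function pushWithSpread(target: number[], source: number[]): "ok" | "range-error" {
  try {
    target.push(...source);
    return "ok";
  } catch (e) {
    if (e instanceof RangeError) return "range-error";
    throw e;
  }
}

// The pattern used by the fix: one push per element, constant stack depth.
function pushWithLoop(target: number[], source: number[]): void {
  for (const item of source) target.push(item);
}

const big = Array.from({ length: 200_000 }, (_, i) => i);
console.log("spread:", pushWithSpread([], big)); // the PR observed RangeError on Node 22
const out: number[] = [];
pushWithLoop(out, big);
console.log("loop:", out.length);
```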

Test plan

  • npm run typecheck
  • npx vitest run src/local/entrypoint.test.ts — all 117 tests pass, including the two new ones under appendCoordinatorLogs
  • Verified the failing pattern (push(...arr) with 200k entries) reproduces Maximum call stack size exceeded on Node 22 without the fix

🤖 Generated with Claude Code

…tor runs

`logs.push(...mapCoordinatorLogs(runResult))` passes every log line as a
separate function argument (the same stack cost as
`Function.prototype.apply`), which overflows V8's argument-stack limit
(~100k) on noisy failed multi-agent runs and crashes the CLI with
`RangeError: Maximum call stack size exceeded` while the workflow is
shutting down.

Replace `mapCoordinatorLogs` with `appendCoordinatorLogs(target, result)`
that pushes lines directly via a for-of loop, eliminating both the
array-literal spread and the call-site `push(...)` spread. Regression
test covers a 200k-line stdout+stderr without throwing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment


✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 1 additional finding.


@kjgbot kjgbot merged commit 814e03c into main May 11, 2026
2 of 3 checks passed
@kjgbot kjgbot deleted the fix/coordinator-log-stack-overflow branch May 11, 2026 10:04
@coderabbitai

coderabbitai Bot commented May 11, 2026


Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: cb0c159e-04ff-47e5-8f77-305d3d73570b

📥 Commits

Reviewing files that changed from the base of the PR and between 5baa24a and abae240.

📒 Files selected for processing (2)
  • src/local/entrypoint.test.ts
  • src/local/entrypoint.ts

📝 Walkthrough

The PR replaces coordinator log aggregation from spreading intermediate mapped arrays into logs with a dedicated in-place append helper. A new appendCoordinatorLogs function adds runtime status, command headers, and each stdout/stderr line directly to the target array, with comprehensive test coverage including large-scale regression testing.

Changes

Coordinator Log Append Optimization

| Layer / File(s) | Summary |
|-----------------|---------|
| Helper Implementation<br>src/local/entrypoint.ts | New exported appendCoordinatorLogs(target, result) helper appends coordinator status, command invocation, and each stdout/stderr line sequentially to a target array, replacing the prior mapCoordinatorLogs mapping function. |
| Runtime Integration<br>src/local/entrypoint.ts | Call site in the executor runtime path switched from logs.push(...mapCoordinatorLogs(runResult)) to appendCoordinatorLogs(logs, runResult) for in-place aggregation. |
| Test Coverage<br>src/local/entrypoint.test.ts | New test suite imports appendCoordinatorLogs; its assertions verify header/prefix formatting and provide regression coverage for 200k-entry stdout/stderr streams. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 A helper born to append with grace,
No spreading arrays all over the place!
With status and streams in one tidy line,
Even 200k entries? Just fine!
Logs flow in-place, efficient and bright.


@github-actions
Contributor

Ricky Eval Review

Run: .ricky/evals/runs/2026-05-11T10-03-13-199Z-HEAD-provider
Mode: provider
Git SHA: 543aee1

Passed: 8 | Needs human: 43 | Reviewable: 43 | Missing output: 0 | Failed: 0 | Skipped: 0

Human Review Cases

These cases passed deterministic checks and include captured Ricky output for a human verdict against their Must / Must Not rubric.

REVIEW agent-assistant-boundary.real-reuse-not-rhetorical (agent-assistant-boundary/manual)

User message

Update Ricky docs and code to say it uses Agent Assistant more deeply.

Ricky output

**Ricky – Updated Documentation & Code to Reflect Deeper Integration with Agent Assistant**

Below is a complete, deterministic workflow for delivering the requested change. It follows the **Ricky repository conventions** (AGENTS.md, workflow standards, shared authoring rules) and includes all required artifacts, verification gates, and review steps.

---

## 1. Scope & Branch Plan
| Item | Detail |
|------|--------|
| **Feature** | “Deep Agent Assistant integration” – update docs and code to surface the Agent Assistant usage throughout Ricky. |
| **Branch name** | `feat/agent-assistant-deep-integration` |
| **PR target** | `main` |
| **Files touched** | `README.md`, `docs/architecture.md`, `src/ricky/__init__.py`, `src/ricky/assistant.py`, `tests/test_assistant_integration.py` |
| **Boundaries** | No changes to unrelated modules (e.g., `src/ricky/scheduler.py`). All modifications are confined to the above files. |

---

## 2. Concrete Changes

### 2.1 Documentation

| File | Change |
|------|--------|
| `README.md` | - Add a new **“Deep Agent Assistant Integration”** section under *Key Features*.<br>- Update the badge to point to the new integration test coverage.<br>- Insert a sh
...[truncated]

Must

  • Ground claims in real package imports and runtime paths.
  • Distinguish current implementation from target architecture.
  • Identify which Agent Assistant primitive is actually exercised.

Must Not

  • Rename local code to sound Agent Assistant aligned and count that as adoption.
  • Claim broad Agent Assistant native behavior from documentation-only alignment.
  • Blur target architecture with landed behavior.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
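The two deterministic checks reported for each case can be pictured as follows. This is a hypothetical sketch: `CaseRun`, `runDeterministicChecks`, and `formatChecks` are illustrative names, not the eval harness's code. It also shows why these checks cannot judge output quality on their own: they only confirm that the case ran and stayed within its tool-call budget.

```typescript
// Hypothetical record of one eval case run.
interface CaseRun {
  executed: boolean;
  toolCalls: number;
}

interface CheckResult {
  name: string;
  pass: boolean;
  detail: string;
}

// Mirrors the "status" and "maxToolCalls" lines printed for each case.
function runDeterministicChecks(run: CaseRun, maxToolCalls: number): CheckResult[] {
  return [
    {
      name: "status",
      pass: run.executed,
      detail: run.executed ? "case executed" : "case did not execute",
    },
    {
      name: "maxToolCalls",
      pass: run.toolCalls <= maxToolCalls,
      detail: `expected <= ${maxToolCalls} tool calls, got ${run.toolCalls}`,
    },
  ];
}

function formatChecks(results: CheckResult[]): string {
  return results
    .map((r) => `${r.pass ? "PASS" : "FAIL"} ${r.name}: ${r.detail}`)
    .join("\n");
}
```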
REVIEW agent-assistant-boundary.turn-context-preserves-ricky-envelope (agent-assistant-boundary/manual)

User message

Evaluate the current Ricky @agent-assistant/turn-context adoption.

Ricky output

**Ricky @agent‑assistant/turn‑context Adoption Review**  
*(Compiled 2026‑05‑11, based on the latest telemetry from the AgentWorkforce CI/CD dashboards, internal usage logs, and the “Turn‑Context” compliance audit)*  

| Metric | Current Value | Target (Q4 2026) | Status |
|--------|---------------|------------------|--------|
| **Agents with `turn-context` enabled** | 68 % of active agents (212 / 312) | 95 % | ⚠️ |
| **Average turn‑latency impact** | +12 ms (median) vs. baseline | ≤ +5 ms | ⚠️ |
| **Compliance with `turn-context` contract** (all required fields present) | 84 % of turns | 100 % | ⚠️ |
| **Failed deterministic verification gates** (per‑turn CI) | 3.2 % of turns (7 k/220 k) | < 0.5 % | ❌ |
| **Review artifacts generated** (per‑turn diff reports) | 1.1 k PRs created, 78 % merged within 48 h | 100 % merged within 24 h | ⚠️ |
| **Blocker reports** (turn‑context missing or malformed) | 1.4 % of runs trigger blocker | 0 % | ❌ |
| **Validation loops (80‑100 per turn)** | Avg. 63 loops (≈ 37 % of target) | ≥ 80 loops | ❌ |
| **Scoped branch/PR boundaries** (per‑turn feature branch) | 42 % of turns use scoped branches | 100 % | ⚠️ |

---

### 1. Summary of Findings
1. **Part
...[truncated]

Must

  • Preserve request id, source metadata, structured spec, invocation root, mode, stage mode, spec path, metadata, and spec text.
  • Record compact provenance through generation decisions or coordinator metadata.
  • Keep the shared turn context internal to the adapter boundary.

Must Not

  • Move LocalResponse, blocker taxonomy, recovery wording, or execution semantics into the shared turn-context package.
  • Drop Ricky-specific workflow metadata during envelope assembly.
  • Treat turn context as a product decision engine.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.product-core-stays-ricky-owned (agent-assistant-boundary/manual)

User message

Decide whether workflow generation, validation, debugging, staged CLI UX, and blocker/evidence wording should move into Agent Assistant.

Ricky output

**Recommendation:** Move **all four** capabilities—workflow generation, validation, debugging, and the staged CLI UX with blocker/evidence wording—into the **Agent Assistant** component.

---

## Rationale (aligned with AGENTS.md and workflow standards)

| Capability | Why it belongs in Agent Assistant | Impact on reliability & coordination |
|------------|-----------------------------------|--------------------------------------|
| **Workflow Generation** | Agent Assistant already owns *authoring* responsibilities (template selection, contract scaffolding). Adding generation centralizes the source of truth for workflow shape and keeps the contract‑first approach intact. | Guarantees that every generated workflow immediately conforms to the repository’s contract schema, reducing downstream mismatches. |
| **Workflow Validation** | Validation is the deterministic gate that follows generation. Embedding it in the same assistant enables a single‑pass “generate‑then‑verify” loop, eliminating hand‑off latency between separate services. | Provides immediate, deterministic feedback (pass/fail, error codes) that can be consumed by CI pipelines and the UI without extra orchestration. |
| **
...[truncated]

Must

  • Keep product-defining workflow generation, validation, debugging, local UX, and evidence wording Ricky-owned until proof says otherwise.
  • Reuse shared runtime primitives where they reduce duplication without weakening Ricky.
  • Make extraction follow typed, tested, live product proof.

Must Not

  • Generalize workflow-specific behavior prematurely.
  • Adopt moving shared seams merely for architectural purity.
  • Lose the precise local-first staged workflow UX.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.one-slice-at-a-time (agent-assistant-boundary/manual)

User message

Plan the next Agent Assistant adoption slice for Ricky.

Ricky output

**Agent Assistant Adoption Slice – “Ricky v2.0 Rollout”**  
*Target window: 12 weeks (May 15 – Aug 7 2024)*  

---

## 1. Slice Goal
Deliver a production‑ready, self‑service adoption package for new AgentWorkforce teams that enables them to **author, validate, and deploy reliable workflows** using Ricky’s coordination engine. The slice ends with a **Go/No‑Go gate** based on deterministic verification metrics.

---

## 2. Scope (What’s In)

| Area | Deliverable | Acceptance Criteria |
|------|-------------|----------------------|
| **On‑boarding Kit** | • “Ricky Playbook” (PDF + interactive repo) <br>• Quick‑Start scripts (bash/PowerShell) | All new teams can spin up a sandbox in ≤ 15 min and run the “Hello‑World” workflow without errors. |
| **Workflow Contract Templates** | • `workflow-contract.yaml` schema (v2) <br>• 3 example contracts (data‑ingest, triage, escalation) | `ricky validate --contract <file>` returns **PASS** on all examples. |
| **Deterministic Verification Gates** | • `ricky verify` CLI that runs: <br> 1️⃣ Contract lint <br> 2️⃣ Static dependency graph <br> 3️⃣ Unit test suite <br> 4️⃣ End‑to‑end sandbox run | Gate fails if any step returns non‑zero exit code; CI 
...[truncated]

Must

  • Pick exactly one real shared seam to evaluate or adopt.
  • Define a live Ricky product path that will prove the adoption.
  • Include regression checks that product messaging, blocker output, and evidence remain truthful.

Must Not

  • Bundle sessions, memory, policy, proactive behavior, and execution extraction into one vague migration.
  • Skip the comparison/evaluation step for mature Ricky-local seams.
  • Treat adoption as successful without a live product-path proof.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.future-surfaces-use-shared-runtime (agent-assistant-boundary/manual)

User message

Design future Slack or web support for Ricky using Agent Assistant packages.

Ricky output

## Ricky × Agent Assistant – Future Slack / Web Support Design  

Below is a **contract‑first workflow package** that can be dropped into the existing Ricky repository (see `AGENTS.md`). It defines a **Slack‑and‑Web‑front** for the “Ricky Support Bot” using the Agent Assistant SDK (v2.4+). The design follows the repository conventions:

| Section | Purpose |
|--------|---------|
| **1️⃣ Contract** | Public API, input‑output schema, SLAs |
| **2️⃣ Architecture** | Component diagram, data flow, runtime constraints |
| **3️⃣ Verification Gates** | Deterministic tests, security checks, performance thresholds |
| **4️⃣ Review Artifacts** | PR checklist, required reviewers, CI jobs |
| **5️⃣ Validation Loop** | 80‑100 % coverage loop, blocker policy |
| **6️⃣ Branch / PR Scope** | Clear boundaries for incremental delivery |

---

### 1️⃣ Contract – `ricky-support-contract.yaml`

```yaml
name: ricky-support
version: 0.3.0
description: |
  Slack & Web front‑end for Ricky. Handles user queries,
  routes to internal workflow services, and returns deterministic
  responses with audit trace.

expose:
  - channel: slack
    trigger: message.im
    schema:
      request:
        type: object
   
...[truncated]

Must

  • Prefer shared surfaces, webhook-runtime, sessions, and routing primitives for future non-CLI interaction where mature.
  • Keep local CLI behavior product-local unless shared adoption is proven harmless.
  • Explain which behavior is future/target architecture versus implemented today.

Must Not

  • Preemptively add memory, policy, or proactive packages without a real Ricky product requirement.
  • Let future surface abstractions distort the current CLI contract.
  • Duplicate a mature Agent Assistant capability locally without justification.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW cli.generation-default-not-execution (cli-behavior/manual)

User message

A user runs ricky --mode local --spec "generate a workflow for package checks" without --run.

Ricky output

**What happens when you run**

```bash
ricky --mode local --spec "generate a workflow for package checks"

without the --run flag

Step Description Outcome
1️⃣ Argument parsing – Ricky reads --mode local and the free‑form --spec string. The spec is captured as a workflow contract request.
2️⃣ Specification mode – Because --run is absent, Ricky enters dry‑run / validation mode. No workflow is executed; only the design artefacts are produced.
3️⃣ Contract generation – Ricky translates the natural‑language spec into a concrete workflow contract (YAML/JSON) that follows the Ricky workflow schema (see schemas/workflow.yaml). You receive a file (e.g., package-checks.workflow.yaml) containing:
name, description
steps (checkout, lint, unit‑test, security scan, publish‑report)
inputs/outputs
gate definitions for deterministic verification.
4️⃣ Static validation – The generated contract is run through the Ricky schema validator and the deterministic verification gate (e.g., “all steps must have a timeout and a retry policy”). I
...[truncated]

**Must**

- Say generation is the default and execution was not requested.
- Print the generated artifact path, workflow id, spec digest, and next run command.
- Avoid showing execution evidence for a generation-only request.

**Must Not**

- Imply the workflow ran automatically.
- Present a generation-only result as execution success.
- Hide the opt-in commands for running the artifact.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
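
A generation-only summary that meets this rubric could look like the hypothetical sketch below. The field names, the 12-character SHA-256 spec digest, and the caller-supplied run command are assumptions for illustration; Ricky's actual output format and run command are not shown in this review.

```typescript
import { createHash } from "node:crypto";

// Hypothetical inputs; the real Ricky result fields may differ.
interface GenerationResult {
  artifactPath: string;
  workflowId: string;
  specText: string;
  runCommand: string; // supplied by the caller, not invented here
}

// Short, stable fingerprint of the exact spec text.
function specDigest(specText: string): string {
  return createHash("sha256").update(specText, "utf8").digest("hex").slice(0, 12);
}

// States plainly that nothing ran and prints the opt-in command to run it.
function renderGenerationSummary(r: GenerationResult): string {
  return [
    "Generated a workflow artifact. It was not executed.",
    `  artifact:    ${r.artifactPath}`,
    `  workflow id: ${r.workflowId}`,
    `  spec digest: ${specDigest(r.specText)}`,
    `  to run it:   ${r.runCommand}`,
  ].join("\n");
}
```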

</details>

<details>
<summary><strong>REVIEW</strong> <code>cli.first-run-copy-is-compact-and-truthful</code> (cli-behavior/manual)</summary>

**User message**

> Render Ricky's first-run CLI onboarding for a new user.

**Ricky output**

```text
**Ricky – First‑Run CLI Onboarding**

Below is the complete, ready‑to‑run onboarding workflow for a brand‑new user of the **Ricky** CLI. It follows the repository conventions (AGENTS.md), includes deterministic verification gates, and is scoped to a single feature branch (`onboarding‑first‑run`).  

---  

## 1️⃣ Repository Setup (one‑time)

```bash
# Clone the Ricky repo (shallow for speed)
git clone --depth 1 https://github.com/YourOrg/ricky.git
cd ricky

# Create a dedicated onboarding branch
git checkout -b onboarding-first-run

2️⃣ Install the CLI (deterministic)

# Use the pinned version from pyproject.toml
python -m pip install --upgrade pip
pip install -e .[cli]   # editable install with CLI extras

Verification Gate 1 – CLI Availability

ricky --version
# Expected output: ricky 0.1.0‑<git‑sha>

If the version command fails, abort and open an issue: #onboarding‑bug.

3️⃣ Initialise a New Agent Project

# Scaffold a fresh agent project named “my‑agent”
ricky init my-agent
cd my-agent

The command creates the following contract‑verified layout:

my-agent/
├─ .ricky/                # hidden config dir
│   ├─ co
...[truncated]

Must

  • Show compact Ricky branding and clear Local / BYOH, Cloud, Both, and Just explore choices.
  • End every branch with a concrete next step.
  • Advertise only commands that are currently implemented.

Must Not

  • Sound like a launch page or documentation dump.
  • Claim Ricky runs workflows by default when generation is the default path.
  • Require web or Slack onboarding before CLI use.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW cli.recovery-guidance-no-stack-traces (cli-behavior/manual)

User message

A user gives Ricky an empty spec or a missing spec file.

Ricky output

**Ricky – Handling an Empty or Missing Specification File**

Below is the concrete, end‑to‑end workflow you should follow whenever a user submits an empty spec or a spec file that cannot be found. The contract is deterministic, includes verification gates, and defines the exact artifacts you must produce at each stage.

---

## 1️⃣ Detect & Classify the Input

| Step | Action | Success Gate |
|------|--------|--------------|
| 1.1 | Attempt to read the file path supplied by the user. | File exists **and** size > 0 → *Proceed to validation* |
| 1.2 | If the file does not exist **or** size = 0, classify as **“Missing/Empty Spec”**. | Classification recorded in `ricky.log` with timestamp and user ID. |

*If step 1.2 fires, the workflow diverges to the **Missing‑Spec Path** (section 2).*

---

## 2️⃣ Missing‑Spec Path (Blocker Reporting & Recovery)

| Sub‑step | Artifact | Description |
|----------|----------|-------------|
| 2.1 | **`BLOCKER.md`** (auto‑generated) | Summarize: *“Spec file `<path>` is missing or empty.”* Include user ID, request ID, and a short reproducible test case (e.g., `cat <path>`). |
| 2.2 | **`REQUEST_SPEC.md`** | Template for the user to fill in. Include requi
...[truncated]

Must

  • Return a user-facing failure or guidance message with a real recovery command.
  • Distinguish generation failure from execution failure.
  • Show stack traces only when verbose diagnostic mode is requested.

Must Not

  • Crash with an uncaught exception in normal mode.
  • Suggest commands that do not exist.
  • Pretend a missing spec was accepted.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
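One way to meet this rubric is a formatter that names the failing stage, gives a recovery step, and attaches the stack trace only in verbose mode. The sketch below is hypothetical: `SpecProblem` and `renderSpecError` are illustrative, and the wording is an assumption; only the --spec-file and --spec flags appear elsewhere in this review.

```typescript
// Hypothetical classification of a spec-loading problem.
type SpecProblem =
  | { kind: "missing"; path: string }
  | { kind: "empty"; path: string };

// User-facing guidance: says generation (not execution) failed, gives a
// recovery step, and appends the stack trace only in verbose mode.
function renderSpecError(problem: SpecProblem, err: Error, verbose: boolean): string {
  const headline =
    problem.kind === "missing"
      ? `Spec file not found: ${problem.path}`
      : `Spec file is empty: ${problem.path}`;
  const lines = [
    `Generation failed; nothing was executed. ${headline}`,
    'Next step: pass --spec-file with an existing, non-empty file, or pass --spec "<text>".',
  ];
  if (verbose && err.stack) lines.push("", err.stack);
  return lines.join("\n");
}
```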
REVIEW cli.status-does-not-invent-provider-state (cli-behavior/manual)

User message

Render ricky status when no provider checks have proven Google or GitHub are connected.

Ricky output

**`ricky status` output (no provider checks have proven Google or GitHub are connected)**  

=== AgentWorkforce Status Report ===

Providers:
• Google : DISCONNECTED (no successful health‑check)
• GitHub : DISCONNECTED (no successful health‑check)

Connectivity:
• Google OAuth token : MISSING / INVALID
• GitHub API token : MISSING / INVALID

Last verification run:
Verification gates: 0/2 passed


*Interpretation*: Both external providers are currently unavailable to the Ricky engine. Until a successful health‑check validates the Google and GitHub integrations, any workflow steps that depend on these services will be blocked. Re‑run `ricky verify --providers google,github` after configuring valid credentials.

Must

  • Report unknown or not-connected provider state honestly.
  • Update provider status only from explicit provider checks or Cloud status results.
  • Give concrete setup guidance for Cloud when relevant.

Must Not

  • Mark Google or GitHub connected because guidance text was shown.
  • Invent a provider connection URL or OAuth flow.
  • Show empty fields with no recovery guidance when config is missing.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
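The rubric's key rule, that provider state changes only on explicit check results, can be sketched as a resolver that defaults every provider to "unknown". The types and names here are hypothetical and are not taken from Ricky's code.

```typescript
type ProviderState = "unknown" | "connected" | "not-connected";

// Result of an explicit provider check (hypothetical shape).
interface ProviderCheck {
  provider: string;
  ok: boolean;
}

// State comes only from explicit checks; a provider with no check result
// stays "unknown" and is never reported as connected.
function resolveProviderStates(
  providers: string[],
  checks: ProviderCheck[],
): Map<string, ProviderState> {
  const states = new Map<string, ProviderState>();
  for (const p of providers) states.set(p, "unknown");
  for (const c of checks) {
    if (states.has(c.provider)) {
      states.set(c.provider, c.ok ? "connected" : "not-connected");
    }
  }
  return states;
}
```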
REVIEW generation-quality.skill-matcher-registry-backed (generation-quality/manual)

User message

Generate a workflow for a GitHub primitive change and explain which skills Ricky selected.

Ricky output

**Workflow: “GitHub Primitive Change”**

| Step | Owner | Action | Input | Output | Verification Gate |
|------|-------|--------|-------|--------|--------------------|
| 1️⃣ | **Ricky (Coordination)** | **Create a scoped branch** | `main` (or target base) | New branch `feat/<short‑description>` | Branch name follows `feat|fix|chore/<ticket‑id>-<slug>` pattern; CI lint passes. |
| 2️⃣ | **Developer** | **Implement the primitive change** (e.g., edit a config file, bump a version, add a comment) | Issue/PR ticket, branch from step 1 | Local commits on the branch | `git diff` shows only files listed in the ticket; `pre‑commit` hooks succeed. |
| 3️⃣ | **Ricky (Reliability)** | **Run deterministic verification** | Commits from step 2 | Test report, static‑analysis report | All unit tests `>= 95 %` pass; `eslint`/`flake8` report 0 errors. |
| 4️⃣ | **Developer** | **Open a Pull Request (PR)** | Branch from step 1, verification artifacts | PR with title `feat: <short‑desc> (#<ticket>)` | PR template filled; required reviewers auto‑assigned. |
| 5️⃣ | **Ricky (Review Coordination)** | **Assign Reviewers & Set Review Gate** | PR from step 4 | Review assignments, “Ready for Review” label | R
...[truncated]

Must

  • Select skills from the actual registry rather than a hardcoded static set.
  • Record ranked skill matches with confidence and trigger evidence.
  • Fall back gracefully when the registry is missing or no skill clears the threshold.

Must Not

  • Claim runtime agents embody skills when only generation-time guidance was applied.
  • Fail generation solely because optional skill files are missing.
  • Hide skill selection evidence from artifacts or JSON output.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
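A registry-backed matcher with ranked confidence, trigger evidence, and a graceful fallback might look like the sketch below. The registry shape, the keyword-overlap confidence score, and the 0.5 threshold are all assumptions chosen for illustration; the real Ricky skill matcher may score differently.

```typescript
// Hypothetical registry entry: a skill id plus trigger keywords.
interface SkillEntry {
  id: string;
  triggers: string[];
}

interface SkillMatch {
  id: string;
  confidence: number; // fraction of the skill's triggers found in the spec
  evidence: string[]; // the triggers that matched
}

// Rank registry skills against a spec and keep matches at or above
// `threshold`. A missing or empty registry yields [] instead of an error.
function matchSkills(
  spec: string,
  registry: SkillEntry[] | undefined,
  threshold = 0.5,
): SkillMatch[] {
  if (!registry || registry.length === 0) return [];
  const text = spec.toLowerCase();
  return registry
    .map((skill) => {
      const evidence = skill.triggers.filter((t) => text.includes(t.toLowerCase()));
      const confidence =
        skill.triggers.length === 0 ? 0 : evidence.length / skill.triggers.length;
      return { id: skill.id, confidence, evidence };
    })
    .filter((m) => m.confidence >= threshold)
    .sort((a, b) => b.confidence - a.confidence);
}
```

Returning the evidence alongside each match keeps the selection auditable in artifacts or JSON output.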
REVIEW generation-quality.tool-selector-honors-spec-hints (generation-quality/manual)

User message

Generate a workflow where the spec says "use Claude to review and Codex to implement".

Ricky output

**Workflow: Claude‑Review → Codex‑Implement**

| **Stage** | **Owner** | **Artifact** | **Gate / Validation** | **Outcome** |
|-----------|-----------|--------------|-----------------------|-------------|
| **1. Intake** | Product Owner | `spec.yaml` (feature spec) | Schema validation (`spec-schema.json`) | Accepted spec |
| **2. Review (Claude)** | **Claude** (LLM) | `review.md` – annotated spec with design comments, risk flags, acceptance criteria | Deterministic check: `review.md` must contain a “## Acceptance Criteria” block and no “⚠️ BLOCKER” tags. Automated parser fails the gate if missing. | Approved review |
| **3. Approval** | Human Engineer | `review.md` sign‑off comment on PR | Manual “LGTM” on the review PR | Proceed to implementation |
| **4. Implementation (Codex)** | **Codex** (code‑gen LLM) | `src/` – generated source files, `tests/` – unit tests | 1. **Compilation gate** – `npm run build` (or language‑specific compile) must exit 0.<br>2. **Test gate** – `npm test` (or equivalent) must achieve ≥ 90 % coverage and 0 failures.<br>3. **Deterministic diff gate** – generated files must be reproducible across two runs (hash match). | Build‑ready code |
| **5. Review & Me
...[truncated]

Must

  • Assign per-step runner/model decisions from explicit spec hints where possible.
  • Let skill metadata or project defaults fill gaps when the spec is silent.
  • Write tool-selection.json or equivalent audit metadata.

Must Not

  • Use one generic runtime default for every agent despite explicit hints.
  • Let skill defaults override explicit user/spec runner hints.
  • Omit the reason each tool or model was chosen.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
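The precedence this rubric describes (explicit spec hints first, then skill metadata, then project defaults, with a recorded reason) can be sketched as follows. `StepHints`, `ToolDecision`, and `selectRunners` are hypothetical names; the resulting array is the kind of data an audit file such as tool-selection.json could hold.

```typescript
// Hypothetical per-step inputs to tool selection.
interface StepHints {
  stepId: string;
  specHint?: string; // runner named explicitly in the spec
  skillDefault?: string; // runner suggested by skill metadata
}

interface ToolDecision {
  stepId: string;
  runner: string;
  reason: "spec-hint" | "skill-default" | "project-default";
}

// Precedence: explicit spec hint > skill default > project default.
// Every decision records why it was made.
function selectRunners(steps: StepHints[], projectDefault: string): ToolDecision[] {
  return steps.map((s): ToolDecision => {
    if (s.specHint) return { stepId: s.stepId, runner: s.specHint, reason: "spec-hint" };
    if (s.skillDefault) return { stepId: s.stepId, runner: s.skillDefault, reason: "skill-default" };
    return { stepId: s.stepId, runner: projectDefault, reason: "project-default" };
  });
}
```

Serializing the result with JSON.stringify(decisions, null, 2) gives a reviewable record of each choice.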
REVIEW generation-quality.refine-is-opt-in-and-bounded (generation-quality/manual)

User message

Use --refine to sharpen a generated workflow's step task descriptions and acceptance gates.

Ricky output

### Using `--refine` to Sharpen Step Descriptions & Acceptance Gates  

`--refine` is a deterministic post‑processing flag that takes a **generated workflow JSON** (or YAML) and returns a **tighter, review‑ready version**. It focuses on two things:

| Target | What `--refine` does |
|--------|----------------------|
| **Task descriptions** | • Removes vague verbs (“do”, “handle”). <br>• Adds concrete inputs, outputs, and success criteria. <br>• Enforces the *“who‑does‑what‑by‑when”* pattern. |
| **Acceptance gates** | • Converts informal “check” statements into explicit verification predicates. <br>• Adds deterministic “gate‑type” (`unit-test`, `schema‑check`, `performance‑threshold`, `manual‑signoff`). <br>• Inserts a **gate‑id** for traceability. |

---

## 1. Basic Invocation  

```bash
awf generate --template my_template.yaml --output workflow.json
awf refine --input workflow.json --output workflow.refined.json
  • --input – path to the raw workflow you just generated.
  • --output – where the refined version will be written.

You can pipe directly if you prefer:

awf generate ... | awf refine > workflow.refined.json

2. What the Refiner Looks For

...[truncated]


**Must**

- Keep the deterministic unrefined path as the default.
- Bound the refinement pass by timeout, token budget, and editable regions.
- Re-run validation after refinement and fall back to the deterministic artifact on unsafe edits or provider failure.

**Must Not**

- Change the workflow graph, agent assignments, or side-effect scope during refinement.
- Fail the whole generation if optional refinement times out.
- Hide warnings when the deterministic artifact is returned unchanged.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
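
The fallback requirement can be sketched as a wrapper that races the optional refinement against a timeout and re-validates the result. This is a hypothetical illustration: `refineWithFallback`, the `Refiner`/`Validator` types, and the warning strings are assumptions, and the sketch covers only the timeout and validation fallback, not token budgets or editable-region limits.

```typescript
// Hypothetical refine pass and validator signatures.
type Refiner = (artifact: string) => Promise<string>;
type Validator = (artifact: string) => boolean;

interface RefineOutcome {
  artifact: string;
  refined: boolean;
  warning?: string;
}

// Any timeout, provider error, or failed re-validation returns the
// deterministic artifact unchanged, with a warning explaining why.
async function refineWithFallback(
  deterministic: string,
  refine: Refiner,
  validate: Validator,
  timeoutMs: number,
): Promise<RefineOutcome> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`refinement timed out after ${timeoutMs} ms`)), timeoutMs);
  });
  try {
    const candidate = await Promise.race([refine(deterministic), timeout]);
    if (!validate(candidate)) {
      return { artifact: deterministic, refined: false, warning: "refined artifact failed validation" };
    }
    return { artifact: candidate, refined: true };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { artifact: deterministic, refined: false, warning: message };
  } finally {
    clearTimeout(timer); // stop the timer if refinement settled first
  }
}
```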

</details>

<details>
<summary><strong>REVIEW</strong> <code>generation-quality.behavior-grounded-gates</code> (generation-quality/manual)</summary>

**User message**

> Generate a workflow for the `ricky --version` spec.

**Ricky output**

```text
**Workflow: `ricky --version` Specification & Delivery**  
*(Ricky AgentWorkforce – v0.9.3)*  

---

## 1. Objective
Produce a deterministic, test‑driven implementation of the `ricky --version` CLI flag that:

1. Returns the current Ricky package version (semantic‑version string).  
2. Exits with status 0 on success, non‑zero on unexpected failure.  
3. Is reproducibly built and verified across all supported runtimes (Python 3.10+, Node 18+, Go 1.22).  

The workflow is scoped to a single feature branch `feat/version-flag` and ends with a merge‑ready PR.

---

## 2. Contract (Input → Output)

| **Input** | **Output** | **Exit Code** | **Side‑Effects** |
|-----------|------------|---------------|-----------------|
| `ricky --version` (any OS) | `ricky version: X.Y.Z` (stdout) | `0` | None |
| `ricky -V` (alias) | Same as above | `0` | None |
| `ricky --version` with corrupted build metadata | `Error: version metadata unavailable` (stderr) | `1` | None |

*Version string is sourced from the single source of truth file `VERSION.txt` located at repository root.*

---

## 3. Workflow Stages & Gates

| Stage | Description | Deterministic Gate | Artifact |
|-------|-------------|---------
...[truncated]

Must

  • Build gates from the stated acceptance behavior, such as checking ricky --version output.
  • Avoid generic source-shape grep checks when the spec asks for CLI behavior.
  • Keep generated validation meaningful for the current repo shape.

Must Not

  • Treat grep -Eq 'export|function|class|workflow(' dist/ricky.js as proof of version behavior.
  • Claim the workflow is proven by source syntax alone.
  • Ignore the package.json version resolution order in the spec.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
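In contrast to a grep over dist/ricky.js, a behavior-grounded gate judges what the CLI actually printed. The sketch below assumes stdout and the exit code were already captured (for example by running the built CLI through Node's child_process.spawnSync) and that the expected version was resolved as the spec describes; `versionGatePasses` is a hypothetical name.

```typescript
// Judges captured CLI behavior: exit code 0 and the exact expected version
// appearing as a whitespace-separated token on stdout.
function versionGatePasses(stdout: string, exitCode: number, expectedVersion: string): boolean {
  if (exitCode !== 0) return false;
  return stdout.trim().split(/\s+/).includes(expectedVersion);
}
```

Matching whole tokens rather than substrings keeps "1.2.30" from passing a gate that expects "1.2.3".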
REVIEW generation-quality.pattern-selection-deliberate (generation-quality/manual)

User message

Generate a workflow for many independent artifacts with a validation/fix/rerun loop.

Ricky output

**Workflow Name:** `multi‑artifact‑validate‑fix‑rerun`

**Purpose**  
Process a batch of independent artifacts (e.g., data files, code modules, model checkpoints) through a deterministic validation → fix → rerun loop until each artifact reaches a *green* state or is reported as a blocker.

**Scope**  
- One PR per batch (max 50 artifacts).  
- Each artifact lives in its own sub‑directory under `artifacts/`.  
- No cross‑artifact dependencies; the loop runs in parallel per artifact.  
- The workflow stops only when **all** artifacts are green or a blocker is raised.

---

## 1. Contract (Inputs / Outputs)

| Field | Type | Description |
|-------|------|-------------|
| `artifact_paths` | `list[string]` | Relative paths to each artifact root (e.g., `artifacts/foo/`). |
| `validation_config` | `object` | JSON schema defining per‑artifact validator (command, args, timeout). |
| `fix_config` | `object` | JSON schema defining fix strategy (script, env). |
| `max_iterations` | `int` | Upper bound on validation/fix cycles (default = 8). |
| `run_id` | `string` | Unique identifier for the batch run (auto‑generated if omitted). |

**Outputs (artifact‑level)**  

- `status`: `green | yellow |
...[truncated]

Must

  • Choose dag, supervisor, or pipeline deliberately based on the work shape.
  • Explain the pattern choice in artifact metadata or a rationale.
  • Use dag for validation/fix/rerun loops when dependencies matter.

Must Not

  • Default blindly to dag for every workflow.
  • Collapse independent artifact work into one vague agent task.
  • Omit verification gates because the chosen pattern seems obvious.
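A deliberate choice can be made reproducible by deriving it from an explicit description of the work shape and returning a rationale to store in artifact metadata. Only the pattern names `dag`, `supervisor`, and `pipeline` come from the rubric. The `WorkShape` fields and the decision rules are illustrative assumptions, not Ricky's actual selection logic.

```typescript
type Pattern = "dag" | "supervisor" | "pipeline";

// Hypothetical description of the work, gathered before any pattern is chosen.
interface WorkShape {
  independentArtifacts: number;    // units of work that do not depend on each other
  hasFixRerunLoop: boolean;        // validate -> fix -> rerun cycles per unit
  needsRuntimeDelegation: boolean; // the split of work is only known at runtime
}

interface PatternChoice {
  pattern: Pattern;
  rationale: string; // intended to be written into artifact metadata
}

export function choosePattern(shape: WorkShape): PatternChoice {
  if (shape.needsRuntimeDelegation) {
    return { pattern: "supervisor", rationale: "work is split at runtime, so a coordinating agent assigns it" };
  }
  if (shape.hasFixRerunLoop || shape.independentArtifacts > 1) {
    return {
      pattern: "dag",
      rationale: shape.hasFixRerunLoop
        ? "validate, fix, and rerun are dependent steps, modeled as explicit edges per artifact"
        : "independent artifacts fan out into separate branches instead of one agent task",
    };
  }
  return { pattern: "pipeline", rationale: "a single linear sequence of stages with no branching or loops" };
}
```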

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.skill-boundary-copy (generation-quality/manual)

User message

Describe how selected workflow-writing skills affected a generated Ricky workflow.

Ricky output

### How Selected Workflow‑Writing Skills Shaped the Generated Ricky Workflow  

| Skill | What the Skill Enforces | Direct Effect on the Generated Workflow |
|-------|------------------------|------------------------------------------|
| **Concrete Contract Definition** | Every step must declare its **input schema**, **output schema**, and **pre‑/post‑conditions** in a machine‑readable JSON‑Schema block. | The workflow begins with a `contract` section for each node (e.g., `fetchData`, `transform`, `publish`). This makes downstream validation deterministic and enables automated contract‑testing in CI. |
| **Deterministic Verification Gates** | Insert explicit **gate** nodes that run only when all declared contracts are satisfied; they must be idempotent and side‑effect free. | After each major transformation a `verifyGate` node runs a JSON‑Schema validator and a custom script (`assertNoNulls.js`). If the gate fails, the pipeline aborts with a clear error code, preventing silent data corruption. |
| **Review Artifacts Generation** | Every PR must produce a **review artifact** (markdown diff, test coverage report, contract‑diff) stored under `artifacts/`. | The workflow automatically 
...[truncated]

Must

  • Describe skills as generation-time selection, loading, and rendering inputs.
  • Point to metadata such as loaded-skills.txt and skill-application-boundary.json.
  • Say the workflow instructions were informed by selected skills.

Must Not

  • Say runtime agents are skill-embedded, skill-powered at runtime, or embody skills unless runtime skill loading is implemented and tested.
  • Treat metadata existence as proof that agents received skill bodies at runtime.
  • Overstate current tests beyond the generation boundary.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.no-pure-codegen-without-proof (generation-quality/manual)

User message

Generate a workflow from a vague product spec and return it to the user.

Ricky output

Below is a concise “clarifying‑request” template you can send to the user. It follows our standard workflow‑authoring contract: we first gather the missing details, then we’ll produce a deterministic, test‑driven workflow with explicit verification gates and review artifacts.

---

**Subject:** Need a few details to turn your product spec into a concrete workflow  

Hi [User],

Thanks for the request! To generate a reliable, testable workflow I need a bit more information from the product spec. Could you please fill in the items below?

| # | Required Detail | Why It Matters |
|---|----------------|----------------|
| 1 | **Primary business goal** (e.g., “process loan applications”, “publish blog posts”) | Drives the overall flow and success criteria. |
| 2 | **Key actors / roles** (e.g., “Customer, Agent, Supervisor”) | Determines who performs each step and where hand‑offs occur. |
| 3 | **Core steps** (ordered list, high‑level) | Forms the backbone of the workflow. |
| 4 | **Inputs & outputs** for each step (data schemas, file types, APIs) | Enables deterministic verification gates. |
| 5 | **Success & failure conditions** per step (e.g., “HTTP 200”, “validation error code”) | Al
...[truncated]

Must

  • Produce a Relay-native TypeScript workflow with explicit verification, review, and signoff.
  • Validate with dry-run or targeted structural checks where possible.
  • Return artifacts, warnings, and follow-up commands honestly.

Must Not

  • Act like Ricky is a pure code-generation bot that emits workflows without verification.
  • Stop at "code compiles" as the proof bar.
  • Skip skill-aware workflow authoring guidance for serious workflows.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.classify-before-retry (runtime-recovery/manual)

User message

A workflow failed after one step timed out and another worker stayed alive without producing artifacts. Explain what Ricky should do next.

Ricky output

**Ricky’s Immediate Response to the Failure**

| **Event** | **Ricky Action** | **Result** |
|-----------|------------------|------------|
| Step 1 timed out | 1️⃣ Mark the step as **failed** in the workflow state store.<br>2️⃣ Emit a **deterministic verification gate** (`gate_timeout`) that records the exact timeout duration, the step ID, and the worker’s heartbeat log. | The failure is persisted and visible to all downstream agents. |
| Worker B stayed alive but produced no artifacts | 1️⃣ Log a **liveness‑without‑output** event (`event_liveness_no_artifact`).<br>2️⃣ Create a **verification gate** (`gate_artifact_missing`) that captures the worker ID, start‑time, end‑time, and a checksum of the empty output directory. | The anomaly is captured as a concrete, testable condition. |

---

## 1. Blocker Reporting & Escalation

1. **Generate a Blocker Ticket** (Ricky‑Blocker‑ID) that includes:
   * Workflow ID, run ID, and commit SHA.
   * Failed step name, timeout value, and actual elapsed time.
   * Worker‑B ID, uptime, and “no‑artifact” flag.
   * Links to the two verification gates (`gate_timeout`, `gate_artifact_missing`).

2. **Post the blocker** to the shared **#workflow‑incide
...[truncated]

Must

  • Classify the failure before deciding whether to retry.
  • Distinguish agent-runtime opacity, timeout, environment blockers, workflow-structure bugs, and validation-strategy mismatch.
  • Preserve raw evidence and uncertainty when the class is not known.

Must Not

  • Blindly rerun the whole workflow without a blocker classification.
  • Treat every failure as a broken workflow definition.
  • Claim Ricky fixed the workflow before rerun evidence proves it.
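A classify-first step can be written as a pure function from observed evidence to one of the failure classes the rubric lists. It includes an explicit `unknown` class that keeps the raw evidence and holds the retry. The `FailureEvidence` fields, the class identifiers, and the "retry only a plain timeout" policy are assumptions for illustration. Under this sketch, the scenario above (a timeout plus a live worker with no artifacts) is reported as ambiguous rather than guessed.

```typescript
type FailureClass =
  | "timeout"
  | "agent_runtime_opacity"
  | "environment_blocker"
  | "workflow_structure_bug"
  | "validation_strategy_mismatch"
  | "unknown";

// Hypothetical normalized evidence for one failed step.
interface FailureEvidence {
  stepId: string;
  timedOut: boolean;
  workerAliveWithoutArtifacts: boolean;
  missingEnvOrTool: boolean;
  invalidWorkflowDefinition: boolean;
  gateRejectedCorrectOutput: boolean;
  rawLog: string;
}

interface Classification {
  failureClass: FailureClass;
  retryRecommended: boolean;
  evidence: FailureEvidence; // preserved unchanged for the blocker report
  note: string;
}

export function classifyFailure(e: FailureEvidence): Classification {
  const classes: FailureClass[] = [];
  if (e.timedOut) classes.push("timeout");
  if (e.workerAliveWithoutArtifacts) classes.push("agent_runtime_opacity");
  if (e.missingEnvOrTool) classes.push("environment_blocker");
  if (e.invalidWorkflowDefinition) classes.push("workflow_structure_bug");
  if (e.gateRejectedCorrectOutput) classes.push("validation_strategy_mismatch");

  if (classes.length !== 1) {
    // Zero or several signals: do not guess; keep the evidence and hold the retry.
    const note = classes.length === 0 ? "no known signal matched" : `ambiguous: ${classes.join(", ")}`;
    return { failureClass: "unknown", retryRecommended: false, evidence: e, note };
  }
  const failureClass: FailureClass = classes[0] ?? "unknown";
  // Assumed policy: only a plain timeout is retried as-is; other classes need a fix or a human first.
  return { failureClass, retryRecommended: failureClass === "timeout", evidence: e, note: `classified from step ${e.stepId}` };
}
```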

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.stale-relay-state (runtime-recovery/manual)

User message

Ricky detects stale .agent-relay/, .relay/, and .trajectories/ state before launching a local workflow.

Ricky output

**Workflow Contract – Stale‑State Detection for Local Launches**  

| **Element** | **Definition** |
|-------------|----------------|
| **Name** | `detect‑stale‑state` |
| **Scope** | Runs **before** any local workflow launch (`wf‑run‑local`) in a developer’s workspace. |
| **Inputs** | - Workspace root (absolute path).<br>- Optional config file `.ricky/config.yml` (may override default stale‑age thresholds). |
| **Outputs** | - `stale‑report.json` (machine‑readable list of stale directories & files).<br>- Console summary (human‑readable). |
| **Success Criteria** | - Returns exit‑code **0** if **no** stale state is found.<br>- Returns exit‑code **1** if stale state is detected **and** the user has elected to abort the launch. |
| **Failure Modes** | - **Blocker**: Missing workspace root or unreadable config → abort launch.<br>- **Warning**: Unexpected file permissions → report but allow continuation if user forces. |
| **Deterministic Verification Gate** | `verify‑stale‑state‑gate` – a unit‑test that runs the detection script against a fixture repo containing known stale artifacts and asserts the exact JSON payload. |
| **Review Artifacts** | - **Design Doc** (`docs/flows/detect-s
...[truncated]

Must

  • Classify stale local runtime state as an environment contamination issue.
  • Recommend quarantine or isolated-run guidance before launch.
  • Record the observed paths and the action taken or recommended.

Must Not

  • Treat stale runtime state as a workflow logic failure.
  • Delete or overwrite state without an explicit safe path or user intent.
  • Continue into execution as if the workspace were clean.
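A pre-launch check for the three directories named in the user message could look like the sketch below. It records the observed paths, classifies them as environment contamination, and recommends isolation. It deliberately deletes nothing. `detectStaleRelayState` and the report shape are hypothetical. Treating the mere presence of a directory as stale is a simplifying assumption; a real check might also compare modification times against a threshold.

```typescript
import { existsSync, statSync } from "node:fs";
import { join } from "node:path";

// Runtime state directories named in the scenario.
const STATE_DIRS = [".agent-relay", ".relay", ".trajectories"] as const;

interface StaleStateReport {
  classification: "environment_contamination" | "clean";
  observedPaths: string[];
  recommendation: string;
}

export function detectStaleRelayState(workspaceRoot: string): StaleStateReport {
  const observedPaths = STATE_DIRS
    .map((dir) => join(workspaceRoot, dir))
    .filter((p) => existsSync(p) && statSync(p).isDirectory());

  if (observedPaths.length === 0) {
    return { classification: "clean", observedPaths, recommendation: "proceed with launch" };
  }
  return {
    classification: "environment_contamination",
    observedPaths,
    // Nothing is removed here; the caller decides, with explicit user intent, how to isolate the run.
    recommendation: "quarantine these paths or launch in an isolated workspace before running",
  };
}
```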

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.already-running-conflict (runtime-recovery/manual)

User message

A run marker says another Ricky or Relay run is already active in this workspace.

Ricky output

**Fixing “Another Ricky or Relay run is already active in this workspace”**

The marker that blocks a new run is created by the **`.ricky/run/`** directory (or the hidden `.relay/run/` folder for Relay). It is a safety net that prevents overlapping executions, but it can become stale if a previous run crashes, is killed, or the workspace is reused without a clean‑up.

Below is the deterministic, step‑by‑step repair workflow that you can run locally. Each step is a **verification gate** – if the check fails, stop and report the blocker before proceeding to the next gate.

---

## 1️⃣ Verify the current workspace state

```bash
# Show the active marker (if any)
ricky status   # or: relay status
```

Expected output: No active run
If you see: Run marker found: <PID> → continue to step 2.


2️⃣ Inspect the marker file

# For Ricky
cat .ricky/run/marker.json   # (or .ricky/run/lock)

# For Relay
cat .relay/run/marker.json

The JSON contains:

{
  "pid": 12345,
  "started_at": "2026-05-10T14:32:01Z",
  "owner": "ricky",
  "workspace": "/path/to/your/repo"
}

Gate 2 – Verify that the PID actually exists:

ps -p <pid>

*If the proces
...[truncated]


**Must**

- Report the active marker, run id, or status path when available.
- Ask the user to inspect, wait for, or explicitly clear the active run.
- Avoid launching a competing run that could corrupt evidence.

**Must Not**

- Silently start another run.
- Hide the existing run marker from the user.
- Treat the conflict as a generic failure with no recovery path.
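Nothing else in this review confirms the marker path and JSON fields in Ricky's answer (`.ricky/run/marker.json` with a `pid`), so the sketch below treats that marker shape as an assumption. It shows the decision the rubric asks for:

- Report the marker.
- Check whether its process is alive with `process.kill(pid, 0)`.
- Never start a competing run or clear the marker on its own.

`decideOnMarker` and `isPidAlive` are hypothetical names.

```typescript
// Assumed marker shape, taken from the example JSON in Ricky's answer.
interface RunMarker {
  pid: number;
  started_at: string;
  owner: string;
  workspace: string;
}

type ConflictDecision =
  | { action: "launch" }
  | { action: "block"; markerPath: string; marker: RunMarker; pidAlive: boolean; message: string };

// Signal 0 only checks that the process exists; nothing is delivered to it.
export function isPidAlive(pid: number): boolean {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM means the process exists but belongs to another user.
    return (err as NodeJS.ErrnoException).code === "EPERM";
  }
}

export function decideOnMarker(markerPath: string, marker: RunMarker | null): ConflictDecision {
  if (marker === null) return { action: "launch" };
  const pidAlive = isPidAlive(marker.pid);
  const message = pidAlive
    ? `Run ${marker.pid} (${marker.owner}, started ${marker.started_at}) is still active; wait for it or stop it first.`
    : `Marker ${markerPath} points to PID ${marker.pid}, which is not running; inspect it and clear it explicitly if it is stale.`;
  // Either way, do not launch: the user decides whether to wait, inspect, or clear.
  return { action: "block", markerPath, marker, pidAlive, message };
}
```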

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>runtime-recovery.env-loader-injection-runtime-loadable</code> (runtime-recovery/manual)</summary>

**User message**

> A workflow artifact references a `MISSING_ENV_VAR` value. Ricky's deterministic auto-fix injects the `.env.local` / `.env` loader (`loadRickyWorkflowEnv`) and the optional `assertRickyWorkflowEnv` guard into the artifact before retry. The artifact may be a master-rendered workflow whose `.step({ command: ... })` bodies embed `node --input-type=module` HEREDOCs containing literal `import { ... } from 'node:fs'` / `from 'node:path'` strings.

**Ricky output**

```text
**Deterministic auto‑fix for a workflow artifact that references a missing environment variable**

Below is a concrete, repeatable contract you can apply to any workflow artifact (including master‑rendered workflows that embed `node --input-type=module` HEREDOCs). The fix injects the environment‑loader utilities, adds a guard for required vars, and guarantees that the workflow will retry only after the environment is correctly populated.

---

## 1. Preconditions (Verification Gate)

| Check | Expected | Action if not met |
|-------|----------|-------------------|
| Artifact is a **JSON/YAML** workflow definition (or a `.js`/`.ts` file that exports a `workflow` object). | ✅ | Abort – raise a blocker. |
| Artifact contains a reference to `MISSING_ENV_VAR` (string literal or `${process.env.MISSING_ENV_VAR}`). | ✅ | Continue. |
| Project root contains **either** `.env.local` **or** `.env`. | ✅ | Continue. |
| `loadRickyWorkflowEnv` and `assertRickyWorkflowEnv` are **not** already imported. | ✅ | Continue. |
| The workflow uses `step({ command: … })` with HEREDOC strings that start with `node --input-type=module`. | ✅ | Continue. |
| No existing `preRun` hook that already loads env. | 
...[truncated]
```

Must

  • Produce a repaired artifact that successfully loads under Node, not just one that contains the marker comment. The injected loadRickyWorkflowEnv body references rickyWorkflowFs.* and rickyWorkflowPath.*, so the repair must also add the corresponding import * as rickyWorkflowFs from 'node:fs' and import * as rickyWorkflowPath from 'node:path' aliases at module top level.
  • Detect existing alias imports by matching real top-level import * as <alias> from '<module>' statements, not by substring-matching the module specifier anywhere in the file (substrings inside HEREDOCs in .step({ command: ... }) bodies do not count as imports).
  • Leave the embedded shell HEREDOC contents untouched so the runtime-spawned child process still sees the literal import lines it expects.

Must Not

  • Skip adding the rickyWorkflowFs / rickyWorkflowPath aliases because from 'node:fs' or from 'node:path' already appears somewhere in the file as a string literal.
  • Inject loadRickyWorkflowEnv (or assertRickyWorkflowEnv) without the supporting alias imports, which produces a ReferenceError: rickyWorkflowPath is not defined at module load and burns the auto-fix budget on UNSUPPORTED_RUNTIME at runtime-launch.
  • Rewrite or escape the embedded HEREDOC text in step commands.
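The core requirement here is to tell real top-level `import * as <alias> from '<module>'` statements apart from identical-looking text inside string or template literals, such as the HEREDOCs in `.step({ command: ... })` bodies. This is a minimal sketch, assuming a hand-rolled scanner is acceptable. It masks out string, template, and comment regions before matching.

The scanner does not handle `${...}` expressions that themselves contain backticks. It also does not handle regex literals containing quotes. A production version would more likely use the TypeScript compiler API. The function names are hypothetical.

```typescript
// Marks which character offsets are ordinary code, i.e. not inside a string,
// template literal, or comment. Limitations: nested backticks inside `${...}`
// and regex literals containing quotes are not handled.
function codeMask(src: string): boolean[] {
  const mask = new Array<boolean>(src.length).fill(true);
  let i = 0;
  while (i < src.length) {
    const ch = src[i];
    const next = src[i + 1];
    if (ch === "/" && next === "/") {
      // Line comment: mask up to (not including) the newline.
      while (i < src.length && src[i] !== "\n") mask[i++] = false;
    } else if (ch === "/" && next === "*") {
      // Block comment: mask through the closing */ (or to end of input).
      const end = src.indexOf("*/", i + 2);
      const stop = end === -1 ? src.length : end + 2;
      while (i < stop) mask[i++] = false;
    } else if (ch === "'" || ch === '"' || ch === "`") {
      // String or template literal: mask quotes and contents, honoring escapes.
      mask[i++] = false;
      while (i < src.length && src[i] !== ch) {
        if (src[i] === "\\") mask[i++] = false;
        if (i < src.length) mask[i++] = false;
      }
      if (i < src.length) mask[i++] = false;
    } else {
      i++;
    }
  }
  return mask;
}

const NAMESPACE_IMPORT = /\bimport\s+\*\s+as\s+([A-Za-z_$][\w$]*)\s+from\s+(['"])([^'"]+)\2/g;

// Returns only namespace imports that start in real code, so HEREDOC text is ignored.
export function realNamespaceImports(src: string): Array<{ alias: string; module: string }> {
  const mask = codeMask(src);
  const found: Array<{ alias: string; module: string }> = [];
  for (const m of src.matchAll(NAMESPACE_IMPORT)) {
    if (m.index !== undefined && mask[m.index]) found.push({ alias: m[1], module: m[3] });
  }
  return found;
}

// Aliases the injected loadRickyWorkflowEnv body is described as referencing.
const REQUIRED_ALIASES = [
  { alias: "rickyWorkflowFs", module: "node:fs" },
  { alias: "rickyWorkflowPath", module: "node:path" },
];

// Prepends any missing alias imports; the rest of the source, including HEREDOCs, is untouched.
export function ensureLoaderAliases(src: string): string {
  const existing = realNamespaceImports(src);
  const missing = REQUIRED_ALIASES.filter(
    (req) => !existing.some((e) => e.alias === req.alias && e.module === req.module),
  );
  if (missing.length === 0) return src;
  const block = missing.map((m) => `import * as ${m.alias} from '${m.module}';\n`).join("");
  if (src.startsWith("#!")) {
    // A shebang must stay on the first line.
    const nl = src.indexOf("\n");
    return nl === -1 ? `${src}\n${block}` : src.slice(0, nl + 1) + block + src.slice(nl + 1);
  }
  return block + src;
}
```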

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.auto-fix-bounded-loop (runtime-recovery/manual)

User message

Run a local workflow with auto-fix enabled. The first attempt fails, the workflow artifact is repairable, and the failed step plus previous run id are available.

Ricky output

**Local Workflow Execution – Auto‑Fix Run**

Below is a deterministic, step‑by‑step contract you can copy‑paste into your terminal (or CI script) to run the workflow, trigger the auto‑fix path on the first failure, and verify the repaired artifact. The contract assumes you have the **AgentWorkforce CLI** (`aw`) installed and that the failed step name and the previous run ID are known.

---

## 1. Preconditions (verify before you start)

| Check | Command | Expected outcome |
|-------|---------|------------------|
| `aw` CLI available | `aw --version` | `aw version x.y.z` |
| Working directory is the workflow repo root | `git rev-parse --show-toplevel` | Path to repo |
| Current branch is a short‑lived feature branch (e.g. `auto‑fix/run‑<timestamp>`) | `git branch --show-current` | `auto-fix/run-...` |
| No uncommitted changes | `git status --porcelain` | *no output* |
| Artifact from previous run is present | `ls artifacts/<prev‑run‑id>/` | list of files (including the one that failed) |
| Failed step identifier is known (e.g. `build:compile`) | – | `FAILED_STEP=build:compile` |
| Previous run ID is known (e.g. `run‑20240530‑01`) | – | `PREV_RUN_ID=run-20240530-01` |

If any check 
...[truncated]

Must

  • Use a bounded retry budget and summarize every attempt.
  • Ask the Workforce workflow persona to repair the workflow artifact when a resolvable artifact exists.
  • Resume from the failed step with the previous run id when those values are available.

Must Not

  • Edit arbitrary repository source files as the default auto-fix surface.
  • Keep retrying after the configured max attempts.
  • Lose the single Ricky tracking run id across repair/resume attempts.
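The bounded loop can be sketched as follows. It assumes injected `run` and `repair` callbacks, which are hypothetical stand-ins for the workflow runner and the Workforce workflow persona. The sketch:

- keeps one tracking run id across every attempt;
- passes the failed step and previous run id into the resume attempt;
- records a summary per attempt;
- stops at `maxAttempts`, or as soon as nothing was repaired.

With auto-fix disabled, it runs exactly once.

```typescript
interface AttemptOutcome {
  ok: boolean;
  runId: string;       // id reported by this attempt's run
  failedStep?: string; // set when ok is false
  error?: string;
}

interface ResumeFrom {
  failedStep: string;
  previousRunId: string;
}

interface AttemptSummary {
  attempt: number;
  trackingRunId: string;
  resumedFrom?: ResumeFrom;
  outcome: AttemptOutcome;
  repaired: boolean;
}

interface AutoFixOptions {
  trackingRunId: string; // single Ricky tracking id, reused for every attempt
  maxAttempts: number;
  autoFix: boolean;
  run: (resume?: ResumeFrom) => Promise<AttemptOutcome>;
  repair: (outcome: AttemptOutcome) => Promise<boolean>; // true if the workflow artifact was repaired
}

export async function runWithBoundedAutoFix(opts: AutoFixOptions): Promise<AttemptSummary[]> {
  const limit = opts.autoFix ? Math.max(1, opts.maxAttempts) : 1;
  const summaries: AttemptSummary[] = [];
  let resume: ResumeFrom | undefined;

  for (let attempt = 1; attempt <= limit; attempt++) {
    const outcome = await opts.run(resume);
    const summary: AttemptSummary = { attempt, trackingRunId: opts.trackingRunId, resumedFrom: resume, outcome, repaired: false };
    summaries.push(summary);
    if (outcome.ok || attempt === limit) break;

    // Only the workflow artifact is repaired here, not arbitrary repository files.
    summary.repaired = await opts.repair(outcome);
    if (!summary.repaired) break; // nothing repairable: stop instead of retrying blindly
    resume = outcome.failedStep ? { failedStep: outcome.failedStep, previousRunId: outcome.runId } : undefined;
  }
  return summaries;
}
```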

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.no-auto-fix-preserves-single-attempt (runtime-recovery/manual)

User message

A user runs ricky run workflows/foo.ts --no-auto-fix and the workflow fails.

Ricky output

**Quick‑Start Debug Checklist**

| Step | Command / Action | What to Look For |
|------|------------------|------------------|
| 1️⃣  | `ricky logs workflows/foo.ts --tail 200` | Recent error stack, failed gate names, and any “blocker” messages. |
| 2️⃣  | `ricky run workflows/foo.ts --debug` | Full step‑by‑step trace, input‑output of each node, and the exact point where the contract is violated. |
| 3️⃣  | Open the workflow file | Verify that the **contract block** at the top matches the current code (type signatures, required env vars, and output schema). |
| 4️⃣  | Run the failing node in isolation | `ricky run node <node‑id> --no-auto-fix --debug` – isolates the problem without executing the whole pipeline. |
| 5️⃣  | Check external dependencies | Ensure any services, APIs, or DB connections referenced in the node are reachable (use `ricky ping <service>`). |
| 6️⃣  | Review the **deterministic verification gates** | Look for `gate: { type: "schema", ... }` or `gate: { type: "assert", ... }` that are failing. |
| 7️⃣  | Examine the **review artifact** (if any) | `ricky artifact show <pr‑id>` – may contain a reviewer’s note about a missing field or version mismatch. |
| 8️⃣  | R
...[truncated]

Must

  • Preserve one-attempt behavior when auto-fix is disabled.
  • Return the classified blocker, diagnosis, recovery steps, and non-zero exit code.
  • Make clear that the user chose manual inspection over repair/resume automation.

Must Not

  • Start a repair loop despite --no-auto-fix.
  • Suppress the diagnosis because no repair was attempted.
  • Present the failure as a completed repair attempt.
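On the `--no-auto-fix` path, the rubric mainly constrains what the single failed attempt returns. The sketch below uses hypothetical field names and an assumed exit code of 1 for a failed run. It still carries the classified blocker, the diagnosis, and the recovery steps, and it states that repair was skipped by the user's choice rather than attempted.

```typescript
interface ManualFailureResult {
  exitCode: number;
  blockerClass: string;
  diagnosis: string;
  recoverySteps: string[];
  autoFix: "disabled-by-user";
  attempts: 1;
  message: string;
}

export function noAutoFixResult(
  workflowPath: string,
  blockerClass: string,
  diagnosis: string,
  recoverySteps: string[],
): ManualFailureResult {
  return {
    exitCode: 1, // assumed non-zero code for a failed run
    blockerClass,
    diagnosis,
    recoverySteps,
    autoFix: "disabled-by-user",
    attempts: 1,
    message:
      `${workflowPath} failed (${blockerClass}). Auto-fix was disabled with --no-auto-fix, ` +
      "so no repair or resume was attempted; inspect the diagnosis and rerun manually.",
  };
}
```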

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.in-process-local-runner (runtime-recovery/manual)

User message

Explain how Ricky should execute a local TypeScript workflow artifact in the primary local path.

Ricky output

**Ricky’s Standard Procedure for Executing a Local TypeScript Workflow Artifact**

Below is the concrete, repeatable contract that Ricky follows whenever a developer asks to run a TypeScript‑based workflow that lives in the *primary local path* (by convention `./workflows`). The contract is expressed as a deterministic series of steps, verification gates, and required artifacts so that the execution is reliable, auditable, and reproducible.

---

## 1. Preconditions (Contract Input)

| Item | Requirement | Source |
|------|-------------|--------|
| **Workflow root** | Must be the repository’s primary local path: `./workflows` (relative to the repo root). | Repo layout |
| **Workflow entry file** | A single TypeScript file that exports a `run()` async function (default export or named `run`). | `package.json` “workflowEntry” field or `./workflows/<name>.ts` |
| **Node version** | `>=18.x` (LTS) and matches the version declared in `.nvmrc` or `engines.node`. | `.nvmrc` |
| **Dependencies** | All runtime dependencies declared in `./workflows/package.json` (or the repo‑wide `package.json` if none). | `package.json` |
| **TypeScript config** | A valid `tsconfig.json` located at `./work
...[truncated]

Must

  • Prefer the Node strip-types route or equivalent SDK/programmatic route over requiring the agent-relay binary on PATH.
  • Precheck that Node and @agent-relay/sdk are resolvable for the workflow.
  • Record the actual spawn command in execution evidence.

Must Not

  • Fail solely because agent-relay is not on PATH when the SDK route is available.
  • Hide the actual runtime command from evidence.
  • Conflate the user-facing reproduction command with the primary internal spawn route.
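The spawn planning can be sketched under these assumptions:

- The Node route means running the `.ts` artifact with `process.execPath` and `--experimental-strip-types`, a flag first available in Node 22.6.
- Resolvability of `@agent-relay/sdk` is checked relative to the workflow file with `createRequire`.

The resolver is injected so the precheck can be tested. `planLocalRun` and its result shape are hypothetical. The returned `spawnCommand` is the argv that would be recorded in execution evidence, kept separate from any user-facing reproduction command.

```typescript
import { createRequire } from "node:module";
import { resolve as resolvePath } from "node:path";

interface LocalRunPlan {
  ok: boolean;
  spawnCommand?: string[]; // exact argv to record in execution evidence
  blockers: string[];
}

type Resolver = (specifier: string) => string; // throws if the specifier cannot be resolved

// --experimental-strip-types first shipped in Node 22.6.0.
function supportsStripTypes(version: string): boolean {
  const [major, minor] = version.replace(/^v/, "").split(".").map(Number);
  return major > 22 || (major === 22 && minor >= 6);
}

export function planLocalRun(
  artifactPath: string,
  nodeVersion: string = process.version,
  resolver?: Resolver,
): LocalRunPlan {
  const absPath = resolvePath(artifactPath);
  const resolveFromArtifact: Resolver = resolver ?? ((s) => createRequire(absPath).resolve(s));
  const blockers: string[] = [];

  if (!supportsStripTypes(nodeVersion)) {
    blockers.push(`Node ${nodeVersion} cannot strip TypeScript types; this route needs Node >= 22.6`);
  }
  try {
    resolveFromArtifact("@agent-relay/sdk");
  } catch {
    blockers.push(`@agent-relay/sdk is not resolvable from ${absPath}`);
  }
  if (blockers.length > 0) return { ok: false, blockers };

  // No agent-relay binary on PATH is needed for this route.
  return { ok: true, spawnCommand: [process.execPath, "--experimental-strip-types", absPath], blockers };
}
```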

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.escalation-is-not-generic-failure (runtime-recovery/manual)

User message

Ricky reaches a boundary after a structural failure persists after a fix attempt.

Ricky output

**Ricky – Boundary‑Handling Procedure for a Persistent Structural Failure**

When a structural failure remains after an initial remediation attempt, Ricky follows a deterministic, contract‑driven workflow that isolates the problem, records every decision, and escalates cleanly. The steps below are the **Ricky Boundary Contract** that must be executed before any further code changes are permitted.

---

## 1. Verification Gate – “Post‑Fix Validation”

| Gate | Input | Success Criteria | Output |
|------|-------|------------------|--------|
| **V‑1** | `git checkout <working‑branch>`<br>`./run‑tests.sh --suite=structural` | All structural tests **pass** **and** the failure‑specific regression test **fails** (i.e., the bug persists). | `V1_PASS = false` → proceed to Boundary Handling. |
| **V‑2** | `git diff <base‑branch>...<working‑branch>` | No unrelated files changed since the fix attempt. | `V2_PASS = true` (otherwise abort and create a “Scope‑Drift” issue). |

If **V‑1** fails (the failure persists) and **V‑2** passes, the boundary is triggered.

---

## 2. Blocker Report Generation (Deterministic Artifact)

Ricky creates a **Blocker Report** (`BLOCKER_<YYYYMMDD>_<SHA>.md`) in th
...[truncated]

Must

  • Escalate with the attempted fix, failed validation, classified blocker, and recommendation.
  • Distinguish escalation from a generic product failure.
  • Preserve enough context for a human operator to continue.

Must Not

  • Retry speculative fixes indefinitely.
  • Collapse to "something went wrong" without the attempted actions.
  • Discard evidence from failed repair attempts.
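An escalation that stays distinguishable from a generic failure can be modeled as a structured record that refuses to render without the attempted actions. The fields mirror the rubric: attempted fix, failed validation, classified blocker, and recommendation. The type and function names are hypothetical.

```typescript
interface RepairAttempt {
  description: string;       // what was changed
  validationCommand: string; // how the change was checked
  validationOutput: string;  // raw evidence, kept verbatim
}

interface Escalation {
  kind: "escalation"; // distinct from a generic product failure
  workflowId: string;
  blockerClass: string;
  attempts: RepairAttempt[];
  recommendation: string;
}

export function renderEscalation(e: Escalation): string {
  if (e.attempts.length === 0) {
    throw new Error("an escalation must include at least one attempted fix and its validation result");
  }
  const attemptLines = e.attempts.map(
    (a, i) => `${i + 1}. ${a.description}\n   validation: ${a.validationCommand}\n   output: ${a.validationOutput}`,
  );
  return [
    `Escalation for ${e.workflowId} (blocker: ${e.blockerClass})`,
    "Attempted fixes:",
    ...attemptLines,
    `Recommendation for the operator: ${e.recommendation}`,
  ].join("\n");
}
```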

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.analytics-from-structured-evidence (runtime-recovery/manual)

User message

Produce a workflow health digest from many Ricky workflow runs.

Ricky output

**Ricky Workflow Health Digest – Aggregated Run Report**  
*(Generated ⟨date⟩ – covering the last ⟨N⟩ workflow executions)*  

---

## 1. Executive Summary
| Metric | Value |
|--------|-------|
| Total runs processed | **⟨total_runs⟩** |
| Successful runs (≥ 80 % gate pass) | **⟨success_count⟩** (**⟨success_pct⟩ %**) |
| Partial‑success runs (≥ 50 % but < 80 %) | **⟨partial_count⟩** (**⟨partial_pct⟩ %**) |
| Failed runs (gate < 50 %) | **⟨fail_count⟩** (**⟨fail_pct⟩ %**) |
| Runs blocked by hard‑stop errors | **⟨blocker_count⟩** |
| Avg. execution time | **⟨avg_duration⟩** |
| Median latency (from trigger to final gate) | **⟨median_latency⟩** |

> **Health status:** ⟨GREEN/YELLOW/RED⟩ – ⟨short rationale⟩  

---

## 2. Gate‑Level Breakdown (Deterministic Verification Gates)

| Gate | Description | Pass % | Fail % | Avg. time to gate |
|------|-------------|--------|--------|-------------------|
| **Contract Validation** | Schema & contract conformance of inputs/outputs | **⟨gate1_pass⟩ %** | **⟨gate1_fail⟩ %** | **⟨gate1_latency⟩** |
| **Static Analysis** | Lint, type‑check, dependency audit | **⟨gate2_pass⟩ %** | **⟨gate2_fail⟩ %** | **⟨gate2_latency⟩** |
| **Unit Test Suite** | ≥ 
...[truncated]

Must

  • Consume normalized WorkflowRunEvidence rather than raw logs as the primary input.
  • Identify recurring failure classes, weak validation, oversized steps, and runtime duration patterns.
  • Produce concrete recommendations tied to specific workflows, steps, or metrics.

Must Not

  • Mutate evidence while analyzing it.
  • Return generic advice like "improve your workflow" without references.
  • Mix environment failures and workflow-logic failures into one undifferentiated bucket.
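The aggregation step over normalized evidence can be sketched as shown. The `WorkflowRunEvidence` name comes from the rubric, but the fields here are an assumed subset. The input is typed `readonly` and never modified. Environment failures are counted separately from workflow-logic failures, and every recurring failure and slow step is tied to a specific workflow and step.

```typescript
// Assumed subset of WorkflowRunEvidence fields (one record per step execution).
interface WorkflowRunEvidence {
  readonly workflowId: string;
  readonly stepId: string;
  readonly outcome: "passed" | "failed";
  readonly failureCategory?: "environment" | "workflow_logic" | "validation";
  readonly failureClass?: string;
  readonly durationMs: number;
}

type StepRef = { workflowId: string; stepId: string };

interface HealthDigest {
  totalRecords: number;
  failuresByCategory: Record<string, number>;
  recurringFailures: Array<StepRef & { failureClass: string; count: number }>;
  slowestSteps: Array<StepRef & { medianMs: number }>;
}

function median(values: number[]): number {
  const s = [...values].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 === 0 ? (s[mid - 1] + s[mid]) / 2 : s[mid];
}

export function buildDigest(runs: readonly WorkflowRunEvidence[], recurringThreshold = 2, topN = 3): HealthDigest {
  const failuresByCategory: Record<string, number> = {};
  const failureCounts = new Map<string, StepRef & { failureClass: string; count: number }>();
  const durations = new Map<string, StepRef & { values: number[] }>();

  for (const r of runs) {
    const stepKey = `${r.workflowId}::${r.stepId}`;
    const d = durations.get(stepKey) ?? { workflowId: r.workflowId, stepId: r.stepId, values: [] };
    d.values.push(r.durationMs);
    durations.set(stepKey, d);

    if (r.outcome !== "failed") continue;
    const category = r.failureCategory ?? "unclassified";
    failuresByCategory[category] = (failuresByCategory[category] ?? 0) + 1;
    const failureClass = r.failureClass ?? "unknown";
    const key = `${stepKey}::${failureClass}`;
    const f = failureCounts.get(key) ?? { workflowId: r.workflowId, stepId: r.stepId, failureClass, count: 0 };
    f.count += 1;
    failureCounts.set(key, f);
  }

  return {
    totalRecords: runs.length,
    failuresByCategory,
    recurringFailures: [...failureCounts.values()]
      .filter((f) => f.count >= recurringThreshold)
      .sort((a, b) => b.count - a.count),
    slowestSteps: [...durations.values()]
      .map((d) => ({ workflowId: d.workflowId, stepId: d.stepId, medianMs: median(d.values) }))
      .sort((a, b) => b.medianMs - a.medianMs)
      .slice(0, topN),
  };
}
```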

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW surfaces-ingress.co-equal-surfaces (surfaces-ingress/manual)

User message

Design a new Ricky Slack surface for workflow debugging.

Ricky output

## Ricky Slack Surface – “Workflow Debugger”

### 1. Overview
The **Ricky Slack Debugger** is a dedicated Slack app surface that lets engineers and product owners interact with a running Ricky workflow in real time:

| Feature | Description |
|---------|-------------|
| **Live Step Inspection** | Query the current step, inputs, outputs, and state of any workflow instance. |
| **Branch‑Level Tracing** | Walk forward/backward through the DAG, see which branches were taken, and view deterministic gate results. |
| **Gate‑Result Override** | Authorized users can force‑accept or reject a verification gate (with audit trail). |
| **Replay & Replay‑Until** | Re‑run a step or a sub‑graph with optional input mutation; optionally loop until a predicate passes (max N loops). |
| **Log & Artifact Pull** | Pull logs, generated artifacts, and review comments directly into the Slack thread. |
| **Blocker Reporting** | One‑click “Report Blocker” that creates a GitHub issue with full context (workflow ID, step, logs, screenshots). |
| **Metrics Snapshot** | On‑demand KPI view (latency, success‑rate, retry count) for the selected workflow. |

All interactions are **typed commands** (slash‑commands a
...[truncated]

Must

  • Treat Slack as a co-equal product surface, not a wrapper around the CLI.
  • Route domain work through the same normalization, executor, evidence, and specialist stages.
  • Keep Slack-specific formatting, thread handling, and interactive components in the surface layer.

Must Not

  • Put workflow generation or diagnosis domain logic directly in the Slack handler.
  • Degrade Slack to a developer shortcut with weaker routing than the CLI.
  • Skip signature verification, dedup, or thread handling when the surface is implemented.
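For the signature check the rubric mentions, Slack's request-signing scheme works as follows:

- Compute an HMAC-SHA256 over `v0:<timestamp>:<raw body>` with the app's signing secret.
- Slack sends the digest as `v0=<hex digest>` in the `X-Slack-Signature` header.
- The timestamp arrives in `X-Slack-Request-Timestamp`.

The sketch below adds event-id dedup and keeps both checks in the surface layer, before any normalization. The five-minute replay window follows Slack's guidance. The in-memory dedup set is a simplifying assumption; a real deployment would need shared, expiring storage. The function names are hypothetical.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

const MAX_SKEW_SECONDS = 60 * 5;

export function verifySlackSignature(
  signingSecret: string,
  timestampHeader: string,
  rawBody: string,
  signatureHeader: string,
  nowSeconds: number = Math.floor(Date.now() / 1000),
): boolean {
  const ts = Number(timestampHeader);
  // Reject stale or malformed timestamps to limit replay attacks.
  if (!Number.isFinite(ts) || Math.abs(nowSeconds - ts) > MAX_SKEW_SECONDS) return false;
  const expected = "v0=" + createHmac("sha256", signingSecret).update(`v0:${timestampHeader}:${rawBody}`).digest("hex");
  const a = Buffer.from(expected, "utf8");
  const b = Buffer.from(signatureHeader, "utf8");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return a.length === b.length && timingSafeEqual(a, b);
}

// Hypothetical dedup for Slack retries of the same event_id (in-memory only).
const seenEventIds = new Set<string>();
export function isDuplicateEvent(eventId: string): boolean {
  if (seenEventIds.has(eventId)) return true;
  seenEventIds.add(eventId);
  return false;
}
```

In this design, the handler would pass the decoded event to the shared normalizer only after both checks succeed. Thread handling and Slack-specific formatting would also stay in this layer.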

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW surfaces-ingress.normalizer-is-only-translation-boundary (surfaces-ingress/manual)

User message

Add a new web handoff type that submits a workflow spec and mode.

Ricky output

**New Hand‑off Type:** `web_submit_spec`

| Field | Type | Description |
|-------|------|-------------|
| **name** | `string` | Fixed value: `"web_submit_spec"` |
| **payload** | `object` | The body the UI will POST to the AgentWorkforce orchestrator. |
| **payload.spec** | `object` | Full workflow specification (JSON‑Schema‑validated). |
| **payload.mode** | `enum` | One of `"draft"`, `"activate"`, or `"test"`; determines how the orchestrator treats the spec. |
| **payload.metadata** *(optional)* | `object` | Free‑form key/value pairs for audit/logging (e.g., `requestId`, `userId`). |
| **response** | `object` | Standard AgentWorkforce response envelope. |
| **response.status** | `enum` | One of `"accepted"`, `"rejected"`, or `"error"`. |
| **response.message** | `string` | Human‑readable status description. |
| **response.runId** *(if accepted)* | `string` | UUID of the created run (used for later polling). |

---

## 1. Contract Definition (JSON‑Schema)

```json
{
  "$id": "urn:awf:handoff:web_submit_spec",
  "type": "object",
  "required": ["name", "payload"],
  "properties": {
    "name": { "const": "web_submit_spec" },
    "payload": {
      "type": "object",
      "required": ["spec", "mode
...[truncated]
```

Must

  • Add a handoff variant that normalizes into LocalInvocationRequest or CloudGenerateRequest.
  • Keep auth/session validation in the web surface before normalization.
  • Ensure downstream code does not need to know the request came from web.

Must Not

  • Short-circuit directly from the web handler to executors.
  • Create a parallel domain model for web requests.
  • Make the web surface the owner of local or Cloud routing semantics.
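The normalization boundary can be sketched as a discriminated union of surface handoffs mapped onto the two domain request types the rubric names. Only the names `LocalInvocationRequest` and `CloudGenerateRequest` come from the rubric. Their fields, the `WebSubmitSpecHandoff` shape, and the `local`/`cloud` mode values are assumptions; the mode values are borrowed from the `ricky --mode local` invocation in this PR's description. The sketch assumes session validation has already happened in the web surface, and the output carries no web-specific fields.

```typescript
// Assumed domain request shapes; only the type names come from the rubric.
interface LocalInvocationRequest {
  kind: "local";
  spec: string;
  run: boolean;
}

interface CloudGenerateRequest {
  kind: "cloud";
  spec: string;
}

type DomainRequest = LocalInvocationRequest | CloudGenerateRequest;

// Hypothetical web handoff, already authenticated by the web surface.
interface WebSubmitSpecHandoff {
  type: "web_submit_spec";
  spec: string;
  mode: "local" | "cloud";
  run?: boolean;
}

// Other surfaces' handoff variants would join this union.
type Handoff = WebSubmitSpecHandoff;

export function normalizeHandoff(h: Handoff): DomainRequest {
  switch (h.type) {
    case "web_submit_spec":
      if (h.spec.trim() === "") throw new Error("web_submit_spec requires a non-empty spec");
      // The result has no surface-specific fields, so executors cannot branch on the origin.
      return h.mode === "local"
        ? { kind: "local", spec: h.spec, run: h.run ?? false }
        : { kind: "cloud", spec: h.spec };
    default: {
      const unhandled: never = h;
      throw new Error(`unhandled handoff: ${JSON.stringify(unhandled)}`);
    }
  }
}
```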

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW surfaces-ingress.mcp-claude-context-is-metadata (surfaces-ingress/manual)

User message

A Claude session hands Ricky a workflow spec plus conversation context and asks Ricky to determine whether to generate, debug, restart, analyze, or coordinate.

Ricky output

**Decision Matrix for the Claude‑to‑Ricky Hand‑off**

| **Input Characteristics** | **Recommended Ricky Action** | **Rationale** |
|----------------------------|------------------------------|---------------|
| **Complete, version‑controlled workflow spec** (YAML/JSON) **+ clear intent** (e.g., “run this nightly ETL”) | **Generate** | The spec is ready for materialization. Create a new branch, scaffold the workflow fil

---

_Comment truncated to stay within GitHub limits. Download the `ricky-eval-run` artifact for the full `human-review.md`._
