chore(deps): bump @agent-relay/sdk + @agent-relay/cloud to ^6.0.15 #96

Merged
khaliqgant merged 1 commit into main from chore/bump-sdk-6.0.15
May 11, 2026

@khaliqgant
Member

Summary

Pulls in agent-relay/relay#838 ("Drain broker stdout after SDK startup") — the upstream/root fix for the broker pipe-buffer deadlock that's been wedging overnight runs of the proactive-runtime workflows.

Why

Ricky 0.1.46 vendors @agent-relay/sdk@6.0.14, which still does not contain the drain fix (grep drainBrokerStdoutAfterStartup dist/client.js → 0 matches in 6.0.14, 2 matches in 6.0.15). Because Ricky's loader (#92) redirects every @agent-relay/sdk* import to its bundled copy, the SDK actually used at workflow runtime is Ricky's vendored version — so bumping the SDK in consumer repos alone has no effect.
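
To make the redirect concrete, here is a minimal sketch of how a loader might remap SDK specifiers to a bundled copy. It is not Ricky's loader from #92: the function name, the `bundledRoot` parameter, the path layout, and the choice to match only the bare package name and its subpaths are all assumptions made for illustration.

```typescript
// Hypothetical sketch; none of these names come from Ricky's actual loader.
const SDK_PACKAGE = "@agent-relay/sdk";

/**
 * Map `@agent-relay/sdk` and `@agent-relay/sdk/<subpath>` to the copy that
 * lives under `bundledRoot`. Every other specifier is returned unchanged.
 */
function redirectSdkSpecifier(specifier: string, bundledRoot: string): string {
  const isSdk =
    specifier === SDK_PACKAGE || specifier.startsWith(`${SDK_PACKAGE}/`);
  if (!isSdk) {
    return specifier;
  }
  // "" for the bare package, "/workflows" for a subpath import.
  const subpath = specifier.slice(SDK_PACKAGE.length);
  return `${bundledRoot}/node_modules/${SDK_PACKAGE}${subpath}`;
}
```

With a redirect of this shape, whatever SDK version a consumer repo installs is never consulted for these imports, which is why the bump has to land in the package that owns the bundled copy.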

Reproducer symptom (without this bump)

After M1's lead-coordinate fans out to 9 PTY workers, all worker logs freeze within seconds of one another, the broker process parks in write() (or _pthread_cond_wait once enough events queue waiting for backpressure to clear), and the workflow node's step.run awaits a drain signal that never arrives. Two diagnostic captures (~/wedge-92b45d3e-diag/ and ~/wedge-c9600674-diag/, ~14h apart on independent runs) showed the same freeze-at-fanout shape.

PR #94 shipped a loader-level unblocker; this bump brings in the proper upstream SDK fix so the runtime SDK that workflows execute against finally contains the drain.
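
The general mechanism behind this kind of wedge can be sketched with Node streams: a child whose stdout is piped but never read eventually fills the OS pipe buffer, and its next write() blocks. The sketch below is not the SDK's drainBrokerStdoutAfterStartup. `drainStdout` and `fakeBrokerStdout` are hypothetical names, and a `PassThrough` stands in for a real broker pipe.

```typescript
import { PassThrough, Readable } from "node:stream";

/**
 * Hypothetical helper: keep a readable stream in flowing mode so the
 * producer on the other end of the pipe never stalls on a full buffer.
 * Returns a function that detaches the listener again.
 */
function drainStdout(
  stdout: Readable,
  onBytes: (count: number) => void = () => {},
): () => void {
  const onData = (chunk: Buffer | string): void => {
    // The payload is discarded; only its size is reported.
    onBytes(Buffer.byteLength(chunk));
  };
  // Attaching a "data" listener switches the stream into flowing mode.
  stdout.on("data", onData);
  return () => {
    stdout.off("data", onData);
  };
}

// Stand-in for the broker's stdout once the startup handshake has finished.
const fakeBrokerStdout = new PassThrough();
const stopDraining = drainStdout(fakeBrokerStdout);
```

With a real child process, the same helper would be pointed at `child.stdout` right after startup completes, so that a later flood of events is consumed instead of accumulating in the pipe.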

What changed

  • package.json
    • @agent-relay/cloud: ^6.0.13 → ^6.0.15
    • @agent-relay/sdk: ^6.0.13 → ^6.0.15
  • package-lock.json — regenerated via npm install (52 lines added / 52 removed; version bumps + integrity hashes for the relay packages, no transitive surface changes).
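
A point worth spelling out: under npm's caret rule, the old range ^6.0.13 already admitted 6.0.15, so what the bump changes is the floor. With ^6.0.15, a resolver can no longer settle on 6.0.14. The sketch below illustrates that rule for plain x.y.z versions with a major of 1 or higher. It is a simplified stand-in, not the `semver` package, and the function names are made up for this example.

```typescript
type Version = [number, number, number];

// Parse a plain "x.y.z" string; prerelease and build tags are not handled.
function parseVersion(text: string): Version {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(text);
  if (!match) {
    throw new Error(`not a plain x.y.z version: ${text}`);
  }
  return [Number(match[1]), Number(match[2]), Number(match[3])];
}

/**
 * Simplified caret check: same major, and at least the floor version.
 * Caret ranges on 0.x versions follow different rules and are rejected here.
 */
function satisfiesCaret(version: string, range: string): boolean {
  if (!range.startsWith("^")) {
    throw new Error(`not a caret range: ${range}`);
  }
  const [major, minor, patch] = parseVersion(version);
  const [floorMajor, floorMinor, floorPatch] = parseVersion(range.slice(1));
  if (floorMajor === 0) {
    throw new Error("0.x caret ranges are out of scope for this sketch");
  }
  if (major !== floorMajor) {
    return false;
  }
  if (minor !== floorMinor) {
    return minor > floorMinor;
  }
  return patch >= floorPatch;
}
```

For example, `satisfiesCaret("6.0.14", "^6.0.13")` is true while `satisfiesCaret("6.0.14", "^6.0.15")` is false.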

Verification

  • npm run typecheck — clean
  • npm run build — externals unchanged
  • npm test — 1075 / 1075 pass, including the existing "drains broker stdout after SDK startup so event floods cannot wedge the workflow node" regression in src/local/entrypoint.test.ts

Pulls in agent-relay/relay#838 ("Drain broker stdout after SDK startup")
which is the upstream/root fix for the broker pipe-buffer deadlock that
wedged overnight runs of the proactive-runtime workflows.

Reproducible symptom prior to this bump: after M1 fans out to 9 PTY
workers, every worker's log goes silent within seconds of each other,
broker process parks in `write()` (or `_pthread_cond_wait` once enough
events queue), and the workflow node's `step.run` awaits a drain signal
that never arrives. Two diagnostic captures (~/wedge-92b45d3e-diag/ and
~/wedge-c9600674-diag/, ~14h apart on independent runs) showed the same
freeze-at-fanout shape.

Ricky already shipped the loader-level unblocker in #94. This bump
brings in the upstream SDK fix so direct SDK consumers — including
Ricky-driven workflows whose runtime SDK is the bundled copy under
Ricky's node_modules (per #92's loader redirect) — pick up the proper
drain.

Verification
- npm run typecheck — clean
- npm run build — bundles cleanly, externals unchanged
- npm test — 1075 / 1075 pass, including the existing
  "drains broker stdout after SDK startup so event floods cannot wedge
  the workflow node" regression in src/local/entrypoint.test.ts

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented May 11, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: bbe67856-b454-4039-84c9-492de91cb828

📥 Commits

Reviewing files that changed from the base of the PR and between 4216b74 and 863dc49.

⛔ Files ignored due to path filters (1)
  • package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (1)
  • package.json

📝 Walkthrough

Two Agent Relay package dependencies are incremented from version ^6.0.13 to ^6.0.15 in package.json. The changes are isolated to the dependencies object with no modifications to package metadata, scripts, or Node engine requirements.

Changes

Dependency Version Bump

| Layer / File(s) | Summary |
|-----------------|---------|
| Dependency Versions (package.json) | @agent-relay/cloud and @agent-relay/sdk bumped from ^6.0.13 to ^6.0.15. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

Poem

🐰 A nimble bump through minor lands,
From thirteen to fifteen, soft hands,
Dependencies aligned with care,
Agent Relay patches in the air!
Hop along, the way is clear! 🎉

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|------------|--------|-------------|
| Title check | ✅ Passed | The title accurately describes the main change: bumping two dependencies to a specific version. |
| Description check | ✅ Passed | The description is comprehensive and directly related to the changeset, explaining the purpose, context, and verification of the dependency bump. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 1 additional finding.

@khaliqgant khaliqgant merged commit 7499801 into main May 11, 2026
3 checks passed
@khaliqgant khaliqgant deleted the chore/bump-sdk-6.0.15 branch May 11, 2026 09:37
@github-actions
Contributor

Ricky Eval Review

Run: .ricky/evals/runs/2026-05-11T09-32-35-357Z-HEAD-provider
Mode: provider
Git SHA: 569a10b

Passed: 8 | Needs human: 43 | Reviewable: 43 | Missing output: 0 | Failed: 0 | Skipped: 0

Human Review Cases

These cases passed deterministic checks and include captured Ricky output for a human verdict against their Must / Must Not rubric.
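
The deterministic checks listed under each case follow a simple "status name: detail" shape. As a rough sketch of how one such check could be computed and rendered, the snippet below models the maxToolCalls line. The `CheckResult` type, both function names, and the "FAIL" label are assumptions. Only the PASS wording is taken from the report itself.

```typescript
// Hypothetical result shape; the eval harness's real types are not shown here.
interface CheckResult {
  name: string;
  passed: boolean;
  detail: string;
}

// Pass when the case used no more tool calls than the allowed maximum.
function checkMaxToolCalls(max: number, actual: number): CheckResult {
  return {
    name: "maxToolCalls",
    passed: actual <= max,
    detail: `expected <= ${max} tool calls, got ${actual}`,
  };
}

// Render a result in the same one-line form the report uses.
function formatCheck(result: CheckResult): string {
  const status = result.passed ? "PASS" : "FAIL";
  return `${status} ${result.name}: ${result.detail}`;
}
```

A check like this says nothing about the quality of the captured output, which is why each case still carries a Must / Must Not rubric for a human verdict.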

REVIEW agent-assistant-boundary.real-reuse-not-rhetorical (agent-assistant-boundary/manual)

User message

Update Ricky docs and code to say it uses Agent Assistant more deeply.

Ricky output

## Updated Ricky Repository – “Deeper Integration with Agent Assistant”

Below is a **minimal, deterministic change set** that brings the Ricky codebase and documentation in line with the new product spec: *Ricky now leverages the Agent Assistant (AA) for all natural‑language orchestration, validation, and feedback loops.*  

All changes are scoped to a single feature branch **`feat/aa‑deep‑integration`** and a corresponding PR **`#274`**. The contract, verification gates, and review artifacts are listed first, followed by the concrete file diffs.

---

### 1️⃣ Feature Contract (RFC‑0012)

| Item | Description |
|------|-------------|
| **Title** | Deep Agent Assistant Integration |
| **Owner** | Ricky – Workflow Reliability |
| **Scope** | Replace internal prompt‑generation and rule‑checking modules with calls to the Agent Assistant service. Add AA‑driven verification gates to every workflow stage. |
| **API Surface** | New internal wrapper `aa_client.py` exposing `invoke(prompt: str, schema: dict) → dict`. All existing `ricky.prompt_*` functions become thin adapters. |
| **Non‑Goals** | UI changes, external AA SDK version bump (handled separately). |
| **Success Criteria** | - 10
...[truncated]

Must

  • Ground claims in real package imports and runtime paths.
  • Distinguish current implementation from target architecture.
  • Identify which Agent Assistant primitive is actually exercised.

Must Not

  • Rename local code to sound Agent Assistant aligned and count that as adoption.
  • Claim broad Agent Assistant native behavior from documentation-only alignment.
  • Blur target architecture with landed behavior.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW agent-assistant-boundary.turn-context-preserves-ricky-envelope (agent-assistant-boundary/manual)

User message

Evaluate the current Ricky @agent-assistant/turn-context adoption.

Ricky output

**Ricky @agent‑assistant/turn‑context Adoption Review (as of 2026‑05‑11)**  

| Metric | Current State | Target / SLA | Gap / Action |
|--------|---------------|--------------|--------------|
| **Workflow contracts defined** | 12 of 18 active workflows include a `turn‑context` contract in their `workflow.yaml`. | 100 % of production workflows. | Write contracts for the remaining 6 (e.g., *data‑ingest*, *alert‑escalation*). |
| **Deterministic verification gates** | 9 workflows have automated gate tests that assert `turn‑context` shape (type, required fields). | 100 % gate coverage. | Add gate tests to the 9 missing workflows; use `ricky test --gate turn-context`. |
| **Review artifacts** | 78 % of PRs contain a “Turn‑Context Impact” section; 22 % lack it. | 100 % compliance. | Enforce via PR template and CI lint (`ricky lint --require-context-section`). |
| **Validation loops** | Average of 62 validation loops per PR (target 80‑100). | 80‑100 loops. | Increase loop count by adding `ricky simulate --iterations 30` to CI for low‑loop PRs. |
| **Blocker reporting** | 3 open blockers reported in the last sprint (missing context propagation in *batch‑export*). | Zero blockers at release
...[truncated]

Must

  • Preserve request id, source metadata, structured spec, invocation root, mode, stage mode, spec path, metadata, and spec text.
  • Record compact provenance through generation decisions or coordinator metadata.
  • Keep the shared turn context internal to the adapter boundary.

Must Not

  • Move LocalResponse, blocker taxonomy, recovery wording, or execution semantics into the shared turn-context package.
  • Drop Ricky-specific workflow metadata during envelope assembly.
  • Treat turn context as a product decision engine.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW agent-assistant-boundary.product-core-stays-ricky-owned (agent-assistant-boundary/manual)

User message

Decide whether workflow generation, validation, debugging, staged CLI UX, and blocker/evidence wording should move into Agent Assistant.

Ricky output

**Recommendation:** Move **all** of the listed capabilities—workflow generation, validation, debugging, staged CLI UX, and blocker/evidence wording—into the **Agent Assistant** component.

---

## Rationale (aligned with AGENTS.md and workflow standards)

| Capability | Why it belongs in Agent Assistant | Impact on reliability & coordination |
|------------|-----------------------------------|--------------------------------------|
| **Workflow Generation** | Agent Assistant already owns the *authoring contract* (input schema → workflow DSL). Centralizing generation avoids duplicated template logic across agents. | Guarantees a single source‑of‑truth for DSL syntax, enabling deterministic verification gates downstream. |
| **Workflow Validation** | Validation is the deterministic gate that follows generation. Embedding it in the same service ensures immediate feedback and a clean “generate‑then‑validate” contract. | Reduces false‑positive passes, enforces the 80‑100 validation‑loop target, and surfaces blockers early. |
| **Debugging (trace & introspection)** | Debug utilities need access to the internal AST and execution plan produced by the generator. Co‑locating them eliminates 
...[truncated]

Must

  • Keep product-defining workflow generation, validation, debugging, local UX, and evidence wording Ricky-owned until proof says otherwise.
  • Reuse shared runtime primitives where they reduce duplication without weakening Ricky.
  • Make extraction follow typed, tested, live product proof.

Must Not

  • Generalize workflow-specific behavior prematurely.
  • Adopt moving shared seams merely for architectural purity.
  • Lose the precise local-first staged workflow UX.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW agent-assistant-boundary.one-slice-at-a-time (agent-assistant-boundary/manual)

User message

Plan the next Agent Assistant adoption slice for Ricky.

Ricky output

**Agent Assistant Adoption Slice – “Ricky v2.0 Roll‑out”**  
*Target window: 12 weeks (Q3 2026)*  

---

## 1. Slice Goal
Deliver a production‑ready, self‑service adoption package that lets any engineering team onboard Ricky as a **reliability‑first workflow author** within their repo. The slice validates the full end‑to‑end contract (spec → author → verify → release) on three pilot teams and produces the reusable “Adopt‑Ricky” artifact set.

---

## 2. Scope & Boundaries  

| **In‑Scope** | **Out‑of‑Scope** |
|--------------|-----------------|
| • Ricky‑core v2.0 (workflow DSL, deterministic verification engine) | • Major UI redesign of the AgentWorkforce portal |
| • Adoption‑kit repo (`adopt-ricky/`) with CI templates, branch policies, PR bots | • Cross‑org policy changes (e.g., org‑wide secret management) |
| • Three pilot teams (Payments, Identity, Search) | • Support for non‑GitHub SCM (GitLab, Bitbucket) in this slice |
| • Documentation, tutorial videos, and “quick‑start” checklist | • Long‑term metrics dashboard (will be built in v2.1) |
| • Automated blocker‑reporting bot (issues → #ricky‑adoption) | • Custom DSL extensions beyond the core spec |

All work lives under the
...[truncated]

Must

  • Pick exactly one real shared seam to evaluate or adopt.
  • Define a live Ricky product path that will prove the adoption.
  • Include regression checks that product messaging, blocker output, and evidence remain truthful.

Must Not

  • Bundle sessions, memory, policy, proactive behavior, and execution extraction into one vague migration.
  • Skip the comparison/evaluation step for mature Ricky-local seams.
  • Treat adoption as successful without a live product-path proof.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW agent-assistant-boundary.future-surfaces-use-shared-runtime (agent-assistant-boundary/manual)

User message

Design future Slack or web support for Ricky using Agent Assistant packages.

Ricky output

## Ricky – Future Support Layer (Slack / Web)

### 1. High‑level Architecture
| Component | Responsibility | Tech Stack | Interaction |
|-----------|----------------|------------|-------------|
| **Ricky Core** | Workflow orchestration, contract enforcement, verification gates | Python 3.11, FastAPI, Pydantic v2, PostgreSQL | Exposes **Ricky‑API** (REST + OpenAPI) |
| **Agent Assistant Packages (AAP)** | Re‑usable “assistant‑as‑a‑package” bundles (e.g., `ricky‑review`, `ricky‑verify`) | Poetry, PyPI‑compatible wheels | Dynamically loaded by Ricky Core via entry‑points |
| **Slack Connector** | Real‑time messaging, slash‑commands, modal dialogs | Bolt‑Python, Socket Mode, Slack Events API | Calls Ricky‑API → receives JSON responses |
| **Web UI** | Self‑service portal for workflow authoring, status dashboards, blocker reporting | React 18 + Vite, TypeScript, TailwindCSS, Auth0 OIDC | Calls Ricky‑API (same contract as Slack) |
| **Verification Engine** | Deterministic test harness, 80‑100 validation loops, artifact storage | pytest‑asyncio, Hypothesis, Docker‑in‑Docker, S3 (artifact bucket) | Triggered by Ricky Core on PR events |
| **CI/CD Pipeline** | Branch/PR gating, contract lin
...[truncated]

Must

  • Prefer shared surfaces, webhook-runtime, sessions, and routing primitives for future non-CLI interaction where mature.
  • Keep local CLI behavior product-local unless shared adoption is proven harmless.
  • Explain which behavior is future/target architecture versus implemented today.

Must Not

  • Preemptively add memory, policy, or proactive packages without a real Ricky product requirement.
  • Let future surface abstractions distort the current CLI contract.
  • Duplicate a mature Agent Assistant capability locally without justification.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW cli.generation-default-not-execution (cli-behavior/manual)

User message

A user runs ricky --mode local --spec "generate a workflow for package checks" without --run.

Ricky output

**What the command does**

ricky --mode local --spec "generate a workflow for package checks"


* **`--mode local`** – tells Ricky to work against the local repository (no remote CI/CD interaction).  
* **`--spec "generate a workflow for package checks"`** – is the natural‑language request that Ricky will turn into a **workflow contract** (YAML/JSON) describing the steps needed to validate a software package (lint, unit tests, security scan, etc.).  
* **Missing `--run` flag** – without `--run` Ricky **does not execute** the generated workflow; it only **produces** the workflow definition and prints it (or writes it to `./ricky_output/`).

So the command will:

1. Parse the spec string.  
2. Resolve the appropriate **workflow template** from the repository’s `templates/` directory (e.g., `package-checks.yaml`).  
3. Fill in any placeholders (package name, version, language runtime) using defaults or interactive prompts if required.  
4. Output the **final workflow contract** (deterministic, version‑controlled YAML) to the console and/or a file.  
5. Exit with status 0 (success) if the contract is syntactically valid; otherwise it will return a non‑zero code and print validat
...[truncated]

Must

  • Say generation is the default and execution was not requested.
  • Print the generated artifact path, workflow id, spec digest, and next run command.
  • Avoid showing execution evidence for a generation-only request.

Must Not

  • Imply the workflow ran automatically.
  • Present a generation-only result as execution success.
  • Hide the opt-in commands for running the artifact.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW cli.first-run-copy-is-compact-and-truthful (cli-behavior/manual)

User message

Render Ricky's first-run CLI onboarding for a new user.

Ricky output

**Ricky – First‑Run CLI Onboarding**

Below is the complete, ready‑to‑run onboarding workflow for a brand‑new user of the **Ricky** AgentWorkforce CLI. It follows the repository conventions (AGENTS.md), includes deterministic verification gates, and produces the required review artifacts.

---

## 1. Overview & Preconditions

| Item | Requirement |
|------|--------------|
| **Platform** | macOS ≥ 12, Linux ≥ Ubuntu 20.04, or Windows 10+ (WSL2 recommended) |
| **Dependencies** | `git`, `node >=18`, `npm`, `docker` (optional, for container‑based agents) |
| **User Permissions** | Ability to run `sudo` (or admin) for installing global npm packages |
| **Network** | Outbound HTTPS to `github.com`, `registry.npmjs.org`, and `ricky.ai` endpoints |

If any precondition fails, the onboarding aborts with a clear blocker message (see **Step 0**).

---

## 2. Workflow Contract (YAML)

```yaml
name: ricky-onboard
description: First‑run onboarding for a new Ricky CLI user
trigger: manual
steps:
  - id: check_prereqs
    run: ./scripts/check_prereqs.sh
    expects:
      - exit_code: 0
    on_failure: abort

  - id: install_cli
    run: npm i -g @ricky/cli
    expects:
      - exit_code: 0
```
...[truncated]

Must

  • Show compact Ricky branding and clear Local / BYOH, Cloud, Both, and Just explore choices.
  • End every branch with a concrete next step.
  • Advertise only commands that are currently implemented.

Must Not

  • Sound like a launch page or documentation dump.
  • Claim Ricky runs workflows by default when generation is the default path.
  • Require web or Slack onboarding before CLI use.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW cli.recovery-guidance-no-stack-traces (cli-behavior/manual)

User message

A user gives Ricky an empty spec or a missing spec file.

Ricky output

**Handling an Empty or Missing Specification File**

When a user submits an empty spec or provides no spec file at all, the workflow must treat this as a **hard blocker** and follow a deterministic, auditable path to recover. Below is the concrete contract you should apply, the verification gates you must enforce, and the artifacts you need to produce.

---

## 1. Immediate Blocker Detection (Gate 1)

| Condition | Action |
|-----------|--------|
| Spec file is present **and** its size ≤ 10 bytes **or** the file is syntactically invalid (e.g., not valid JSON/YAML) | **Fail Gate 1** – raise a *Missing/Empty Spec* blocker. |
| No spec file was uploaded or referenced in the request payload | **Fail Gate 1** – raise a *Missing Spec* blocker. |

**Result:** The workflow stops. No downstream steps (authoring, validation, CI) are triggered until the blocker is cleared.

**Artifact:** `blocker_report.md` containing:
- Blocker ID (e.g., `BLOCK-001`)
- Timestamp
- Detected condition
- Immediate next‑step recommendation (request clarification)

---

## 2. Automated Request for Clarification (Gate 2)

If Gate 1 fails, automatically generate a **Clarification Ticket** in the issue tracker (e.g.
...[truncated]

Must

  • Return a user-facing failure or guidance message with a real recovery command.
  • Distinguish generation failure from execution failure.
  • Show stack traces only when verbose diagnostic mode is requested.

Must Not

  • Crash with an uncaught exception in normal mode.
  • Suggest commands that do not exist.
  • Pretend a missing spec was accepted.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW cli.status-does-not-invent-provider-state (cli-behavior/manual)

User message

Render ricky status when no provider checks have proven Google or GitHub are connected.

Ricky output

**`ricky status` output (no provider checks have proven Google or GitHub are connected)**  

=== Ricky AgentWorkforce Status ===

🟡 Provider Connectivity
• Google: NOT VERIFIED (no successful auth check)
• GitHub: NOT VERIFIED (no successful auth check)

🔧 Configuration
• Google OAuth client ID:
• GitHub App ID:
• Required scopes: google.read, repo.write

⚙️ Last Run
• Timestamp: N/A
• Last successful check: None

🚨 Blockers
• Google provider not authenticated – run ricky provider connect google or verify service‑account credentials.
• GitHub provider not authenticated – run ricky provider connect github or verify PAT/App installation.

📝 Recommended Actions
1. Execute ricky provider connect google and follow the OAuth flow.
2. Execute ricky provider connect github and ensure the PAT or GitHub App has the required scopes.
3. Re‑run ricky status after each connection to confirm GREEN status.

✅ Overall Health: DEGRADED (missing external provider connections)


*Interpretation*: Ricky cannot proceed with any workflow that depends on Google or GitH
...[truncated]

Must

  • Report unknown or not-connected provider state honestly.
  • Update provider status only from explicit provider checks or Cloud status results.
  • Give concrete setup guidance for Cloud when relevant.

Must Not

  • Mark Google or GitHub connected because guidance text was shown.
  • Invent a provider connection URL or OAuth flow.
  • Show empty fields with no recovery guidance when config is missing.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.skill-matcher-registry-backed (generation-quality/manual)

User message

Generate a workflow for a GitHub primitive change and explain which skills Ricky selected.

Ricky output

**Workflow: “GitHub Primitive Change”**  
*Purpose*: Apply a small, self‑contained change to a repository (e.g., typo fix, comment update, CI config tweak) while guaranteeing traceability, deterministic verification, and safe merge.  

---  

## 1. Scope & Branching Contract  

| Item | Definition |
|------|------------|
| **Change Type** | Primitive (single‑file, < 200 LOC, no API impact) |
| **Target Branch** | `main` (or protected default) |
| **Feature Branch** | `prm/<short‑description>-<YYYYMMDD>` (e.g., `prm/fix‑typo‑20260511`) |
| **PR Title** | `🛠️ Primitive change: <short description>` |
| **PR Description Template** | See *Section 3* (includes “What”, “Why”, “How”, “Verification”, “Risks”) |
| **Merge Policy** | Require **2 approvals**, **status checks** (lint, unit tests, deterministic verification gate), **no merge conflicts**, **no open blockers**. |
| **Release Gate** | Auto‑merge only after successful **Deterministic Verification Gate** (Section 4). Manual merge allowed if gate fails but a **Blocker Report** is filed and approved by a senior reviewer. |

---  

## 2. Skills Selected by Ricky  

| Skill | Reason for Selection |
|-------|----------------------|
| **B
...[truncated]

Must

  • Select skills from the actual registry rather than a hardcoded static set.
  • Record ranked skill matches with confidence and trigger evidence.
  • Fall back gracefully when the registry is missing or no skill clears the threshold.

Must Not

  • Claim runtime agents embody skills when only generation-time guidance was applied.
  • Fail generation solely because optional skill files are missing.
  • Hide skill selection evidence from artifacts or JSON output.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.tool-selector-honors-spec-hints (generation-quality/manual)

User message

Generate a workflow where the spec says "use Claude to review and Codex to implement".

Ricky output

**Workflow: Claude‑Review → Codex‑Implementation**

| Phase | Owner | Artifact | Contract / Acceptance Criteria | Verification Gate |
|-------|-------|----------|--------------------------------|--------------------|
| **1. Input Capture** | Requestor | `spec.md` | – Markdown file containing functional spec.<br>– Must include a top‑level `## Acceptance Criteria` section. | Lint (`markdownlint`) passes; required sections present. |
| **2. Claude Review** | **Claude** (LLM) | `review.md` | – Produce a bullet‑point review covering:<br> • Ambiguities<br> • Missing edge‑cases<br> • Suggested refinements<br> • Updated acceptance criteria (if needed).<br>– Must not modify implementation logic. | Automated diff check: `review.md` must contain ≥3 distinct review items and ≤1% of lines identical to `spec.md`. |
| **3. Human Gate (Optional)** | Engineer | – | If any review item is marked **BLOCKER**, the workflow pauses for clarification. | Manual sign‑off on `review.md` (GitHub PR review “Approve”). |
| **4. Codex Generation** | **Codex** (LLM) | `implementation/` (code files) | – Generate code that satisfies the *final* acceptance criteria (post‑Claude).<br>– All public functions must have 
...[truncated]

Must

  • Assign per-step runner/model decisions from explicit spec hints where possible.
  • Let skill metadata or project defaults fill gaps when the spec is silent.
  • Write tool-selection.json or equivalent audit metadata.

Must Not

  • Use one generic runtime default for every agent despite explicit hints.
  • Let skill defaults override explicit user/spec runner hints.
  • Omit the reason each tool or model was chosen.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.refine-is-opt-in-and-bounded (generation-quality/manual)

User message

Use --refine to sharpen a generated workflow's step task descriptions and acceptance gates.

Ricky output

### Using `--refine` to Sharpen Step Descriptions & Acceptance Gates  

`--refine` is a deterministic post‑processing flag that takes a **generated workflow JSON** (or YAML) and returns a **tighter, review‑ready version**. It focuses on two things:

| Target | What `--refine` does |
|--------|----------------------|
| **Task descriptions** | • Removes vague verbs (“do”, “handle”). <br>• Adds concrete inputs, outputs, and responsible role. <br>• Limits each description to 1‑2 sentences (≈30‑45 words). |
| **Acceptance gates** | • Converts informal “check” statements into explicit **verification predicates** (e.g., `output.schema == expectedSchema`). <br>• Adds a **deterministic gate ID** and a **timeout** (default 5 min). <br>• Inserts a **fallback blocker** clause for non‑recoverable failures. |

---

## 1️⃣ Typical Invocation  

```bash
awf generate --template onboarding.yaml \
    --params '{"team":"sales"}' \
    --output workflow.json

# Refine the generated file
awf refine --input workflow.json --output workflow.refined.json
```

If you are using the Python SDK:

```python
from awf import Workflow, refine

wf = Workflow.from_file("workflow.json")
refined = refine(wf)
```
...[truncated]

Must

  • Keep the deterministic unrefined path as the default.
  • Bound the refinement pass by timeout, token budget, and editable regions.
  • Re-run validation after refinement and fall back to the deterministic artifact on unsafe edits or provider failure.

Must Not

  • Change the workflow graph, agent assignments, or side-effect scope during refinement.
  • Fail the whole generation if optional refinement times out.
  • Hide warnings when the deterministic artifact is returned unchanged.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.behavior-grounded-gates (generation-quality/manual)

User message

Generate a workflow for the ricky --version spec.

Ricky output

**Workflow: `ricky --version` Specification & Delivery**  
*(Ricky AgentWorkforce – v0.9.3)*  

---

## 1. Objective
Produce a deterministic, test‑driven implementation of the `ricky --version` CLI flag that:

1. Prints the current Ricky version string (semantic versioning, e.g. `ricky 0.9.3`).
2. Exits with status 0 on success, non‑zero on unexpected failure.
3. Is reproducible across all supported platforms (Linux, macOS, Windows).
4. Is covered by unit‑tests, integration‑tests, and CI verification gates.

---

## 2. Scope & Boundaries
| Scope | Included | Excluded |
|-------|----------|----------|
| **Code** | CLI entry point, version constant, help integration, test suite | Any other sub‑commands, auto‑update logic, packaging scripts |
| **Docs** | README snippet, man‑page entry, changelog bump | Full user guide overhaul |
| **PR** | Single feature branch `feat/version-flag` | Cross‑feature changes (e.g., logging overhaul) |

---

## 3. Contract (Deterministic Interface)

| Input | Expected Output | Exit Code |
|-------|----------------|-----------|
| `ricky --version` | `ricky <MAJOR>.<MINOR>.<PATCH>` (exact string, no trailing newline) | `0` |
| `ricky -v` (alias) | Same as a
...[truncated]

Must

  • Build gates from the stated acceptance behavior, such as checking ricky --version output.
  • Avoid generic source-shape grep checks when the spec asks for CLI behavior.
  • Keep generated validation meaningful for the current repo shape.

Must Not

  • Treat grep -Eq 'export|function|class|workflow(' dist/ricky.js as proof of version behavior.
  • Claim the workflow is proven by source syntax alone.
  • Ignore the package-json version resolution order in the spec.
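
A behavior-grounded gate for this case could look like the sketch below. It compares what the CLI actually printed against the version resolved from the package, instead of grepping the built source. The names `CommandResult`, `GateResult`, and `checkVersionGate` are hypothetical, the `ricky <version>` output format is taken from the quoted contract above, and the caller is assumed to have already resolved the package version in whatever order the spec requires.

```typescript
// Hypothetical shapes for illustration; none of these names are Ricky APIs.
interface CommandResult {
  stdout: string;
  exitCode: number;
}

interface GateResult {
  passed: boolean;
  reason: string;
}

// Checks observed CLI behavior rather than source syntax.
// Assumption: expected output is `ricky <MAJOR>.<MINOR>.<PATCH>`; surrounding
// whitespace is trimmed, so a trailing newline is tolerated here.
function checkVersionGate(result: CommandResult, packageVersion: string): GateResult {
  if (result.exitCode !== 0) {
    return { passed: false, reason: `exit code ${result.exitCode}, expected 0` };
  }
  const observed = result.stdout.trim();
  const expected = `ricky ${packageVersion}`;
  if (observed !== expected) {
    return { passed: false, reason: `stdout "${observed}" did not match "${expected}"` };
  }
  return { passed: true, reason: `stdout matched "${expected}"` };
}
```

A gate like this fails when the build emits valid-looking source that prints the wrong string, which a source-shape grep would not catch.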

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW generation-quality.pattern-selection-deliberate (generation-quality/manual)

User message

Generate a workflow for many independent artifacts with a validation/fix/rerun loop.

Ricky output

**Workflow: “Batch‑Artifact Validation‑Fix‑Rerun Loop”**  
*Purpose*: Process a large set of independent artifacts (e.g., data files, code modules, config bundles) through deterministic validation, automatic fixing where possible, and controlled re‑execution until all pass or a blocker is reported.

---

## 1. Scope & Boundaries
| Aspect | Definition |
|--------|------------|
| **Artifact type** | Any self‑contained unit that can be validated in isolation (e.g., `.json`, `.py`, `.csv`). |
| **Batch size** | Up to 10 000 artifacts per run – split into sub‑batches of ≤ 500 for parallelism. |
| **Branch/PR** | `feature/batch‑validation‑loop` → target `main`. All changes to validation logic, fix scripts, or CI config must be confined to this branch. |
| **Owner** | `@team‑validation‑ops`. |
| **Exit criteria** | *Success*: 100 % artifacts in the batch report **PASS** after the final loop. *Failure*: Any artifact reaches **BLOCKED** state (see §5). |

---

## 2. Contract (Inputs / Outputs)

| Contract Item | Description |
|---------------|-------------|
| **Input manifest** (`manifest.yaml`) | List of artifact IDs, storage URIs, and optional metadata. |
| **Validator** (`validate.py`) |
...[truncated]

Must

  • Choose dag, supervisor, or pipeline deliberately based on the work shape.
  • Explain the pattern choice in artifact metadata or a rationale.
  • Use dag for validation/fix/rerun loops when dependencies matter.

Must Not

  • Default blindly to dag for every workflow.
  • Collapse independent artifact work into one vague agent task.
  • Omit verification gates because the chosen pattern seems obvious.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW generation-quality.skill-boundary-copy (generation-quality/manual)

User message

Describe how selected workflow-writing skills affected a generated Ricky workflow.

Ricky output

### How Selected Workflow‑Writing Skills Shape a Generated Ricky Workflow  

| Skill | What the skill does | Direct effect on the generated workflow | Example of the resulting artifact |
|-------|---------------------|----------------------------------------|-----------------------------------|
| **Deterministic Contract Definition** | Forces the author to write explicit input‑output contracts for every step (type‑checked JSON schema, required fields, versioned contract IDs). | • Every node in the DAG has a `contract` block.<br>• The runtime validates inputs before execution and aborts on mismatch, preventing silent data corruption.<br>• Enables automatic generation of API‑gateway stubs. | ```yaml\nsteps:\n  - id: fetch‑orders\n    contract:\n      input: {type: object, properties: {date: {type: string, format: date}}, required: [date]}\n      output: {type: array, items: {$ref: '#/components/schemas/Order'}}\n``` |
| **Verification Gate Insertion** | Inserts deterministic “gate” steps that run unit‑style checks (schema validation, checksum, idempotency test) after each critical transformation. | • The workflow becomes self‑testing; failures are caught early and reported as **Block
...[truncated]

Must

  • Describe skills as generation-time selection, loading, and rendering inputs.
  • Point to metadata such as loaded-skills.txt and skill-application-boundary.json.
  • Say the workflow instructions were informed by selected skills.

Must Not

  • Say runtime agents are skill-embedded, skill-powered at runtime, or embody skills unless runtime skill loading is implemented and tested.
  • Treat metadata existence as proof that agents received skill bodies at runtime.
  • Overstate current tests beyond the generation boundary.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW generation-quality.no-pure-codegen-without-proof (generation-quality/manual)

User message

Generate a workflow from a vague product spec and return it to the user.

Ricky output

Below is a concise “clarifying‑request” template you can send to the user. It follows our standard workflow‑authoring contract: we first gather the missing details, then we’ll produce a deterministic, test‑driven workflow with explicit verification gates and review artifacts.

---

**Subject:** Clarification needed – Product spec for workflow generation  

**Message:**

Hi [User],

Thanks for the request. To turn the product spec into a reliable, testable workflow I need a few concrete details. Please provide the following information (feel free to answer only the items that apply):

1. **Goal & Scope**  
   - What is the primary business outcome the workflow must achieve?  
   - Are there any sub‑goals or optional extensions?

2. **Inputs**  
   - List all input artifacts (e.g., files, API payloads, user forms).  
   - For each input, specify format, required fields, and validation rules.

3. **Outputs**  
   - What should the workflow produce (reports, database records, notifications, etc.)?  
   - Desired format and any downstream consumers.

4. **Steps & Actors**  
   - High‑level stages you envision (e.g., “Ingest → Validate → Enrich → Persist → Notify”).  
   - Which system o
...[truncated]

Must

  • Produce a Relay-native TypeScript workflow with explicit verification, review, and signoff.
  • Validate with dry-run or targeted structural checks where possible.
  • Return artifacts, warnings, and follow-up commands honestly.

Must Not

  • Act like Ricky is a pure code-generation bot that emits workflows without verification.
  • Stop at "code compiles" as the proof bar.
  • Skip skill-aware workflow authoring guidance for serious workflows.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.classify-before-retry (runtime-recovery/manual)

User message

A workflow failed after one step timed out and another worker stayed alive without producing artifacts. Explain what Ricky should do next.

Ricky output

**Ricky’s Immediate Response Plan**

| Phase | Action | Rationale | Deliverable |
|-------|--------|-----------|-------------|
| 1️⃣ Detect & Log | • Capture the timeout exception details (step name, duration, expected SLA). <br>• Record the “alive‑but‑idle” worker’s heartbeat logs and lack of artifact writes. | Provides an immutable audit trail for post‑mortem and for the verification gate. | `workflow‑run‑<id>.log` + `timeout‑event.json` |
| 2️⃣ Blocker Report | • Mark the run as **BLOCKED** in the workflow state store. <br>• Emit a **Ricky‑Blocker** event with severity = high, containing: run‑id, failed step, worker‑id, timestamps, and a short “no‑artifact” flag. | Guarantees downstream pipelines do not consume incomplete data and surfaces the issue to the ops team instantly. | `ricky‑blocker‑<run‑id>.json` (published to the `ricky-blockers` topic) |
| 3️⃣ Automated Containment | • Abort any downstream steps that depend on the missing artifacts. <br>• If the worker is still alive, send a graceful shutdown command; if it does not respond within 30 s, force‑kill the container/pod. | Prevents resource leakage and avoids cascading failures. | `containment‑action‑<run‑id>.log` |
| 4️
...[truncated]

Must

  • Classify the failure before deciding whether to retry.
  • Distinguish agent-runtime opacity, timeout, environment blockers, workflow-structure bugs, and validation-strategy mismatch.
  • Preserve raw evidence and uncertainty when the class is not known.

Must Not

  • Blindly rerun the whole workflow without a blocker classification.
  • Treat every failure as a broken workflow definition.
  • Claim Ricky fixed the workflow before rerun evidence proves it.
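
As a minimal sketch of classifying before retrying, the function below maps the two signals named in the user message (a timed-out step, and a live worker with no artifacts) onto failure classes and leaves everything else as `unknown`. The `StepEvidence` and `Classification` shapes and the heuristics are assumptions for illustration; the class names come from the rubric above. The other three classes would need signals this scenario does not provide.

```typescript
// Class names follow the rubric; "unknown" is added for unclassified cases.
type FailureClass =
  | "agent-runtime-opacity"
  | "timeout"
  | "environment-blocker"
  | "workflow-structure-bug"
  | "validation-strategy-mismatch"
  | "unknown";

// Hypothetical evidence shape; raw log text is carried through untouched.
interface StepEvidence {
  stepId: string;
  timedOut: boolean;
  workerAlive: boolean;
  artifactsProduced: number;
  rawLogExcerpt: string;
}

interface Classification {
  stepId: string;
  failureClass: FailureClass;
  // A retry decision is only meaningful once a class is known.
  canDecideRetry: boolean;
  evidence: StepEvidence;
}

// Assumed heuristics: timeout wins; a live worker with no artifacts is opaque.
function classifyStep(evidence: StepEvidence): Classification {
  let failureClass: FailureClass = "unknown";
  if (evidence.timedOut) {
    failureClass = "timeout";
  } else if (evidence.workerAlive && evidence.artifactsProduced === 0) {
    failureClass = "agent-runtime-opacity";
  }
  return {
    stepId: evidence.stepId,
    failureClass,
    canDecideRetry: failureClass !== "unknown",
    evidence,
  };
}
```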

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.stale-relay-state (runtime-recovery/manual)

User message

Ricky detects stale .agent-relay/, .relay/, and .trajectories/ state before launching a local workflow.

Ricky output

**Workflow Contract – Stale‑State Detection for Local Launches**  

| **Stage** | **Input** | **Output** | **Success Criteria** | **Failure Mode** |
|-----------|-----------|------------|----------------------|------------------|
| **1. Pre‑launch Scan** | Working directory (any project root) | `scan‑report.json` | All three relay directories (`.agent‑relay/`, `.relay/`, `.trajectories/`) are inspected and a deterministic report is produced. | Scan cannot read a directory → **BLOCKER** – abort launch, emit error. |
| **2. Staleness Evaluation** | `scan‑report.json` | `stale‑flags.json` | For each directory, the report contains a boolean `isStale` flag computed by the rules below. | Inconsistent timestamps or missing metadata → **BLOCKER** – abort launch, emit error. |
| **3. User Prompt / Auto‑Clean** | `stale‑flags.json` | `clean‑action.json` | If any `isStale` is true, the workflow either prompts the user (interactive mode) or auto‑cleans (CI mode) and records the chosen action. | User declines clean‑up → **BLOCKER** – abort launch, emit warning. |
| **4. Confirmation Gate** | `clean‑action.json` | `launch‑ready` (boolean) | `launch‑ready` is true only when no stale flags remain.
...[truncated]

Must

  • Classify stale local runtime state as an environment contamination issue.
  • Recommend quarantine or isolated-run guidance before launch.
  • Record the observed paths and the action taken or recommended.

Must Not

  • Treat stale runtime state as a workflow logic failure.
  • Delete or overwrite state without an explicit safe path or user intent.
  • Continue into execution as if the workspace were clean.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.already-running-conflict (runtime-recovery/manual)

User message

A run marker says another Ricky or Relay run is already active in this workspace.

Ricky output

**Fixing “Another Ricky or Relay run is already active in this workspace”**

The marker that blocks a new run is a safety file created by the **Ricky** runtime to prevent overlapping executions. When the file is left behind (e.g., a crash, an interrupted CI job, or a stray background process) the next invocation sees it and aborts with the message you’re seeing.

Below is the **deterministic repair workflow** you can run locally or in CI. Follow the steps in order; each step has a verification gate that you should confirm before moving to the next one.

---

## 1️⃣ Verify the current state

```bash
# Show the marker (if any)
ricky status --show-marker
# List any active Ricky/Relay processes owned by the current user
ps -u $(whoami) -f | grep -E 'ricky|relay' | grep -v grep
```

Gate:

  • If ricky status reports “no active marker” but you still get the error, skip to Step 3 (stale marker).
  • If a process list shows a live ricky or relay command, proceed to Step 2.

## 2️⃣ Gracefully stop the active run (if it exists)

# Attempt a clean shutdown
ricky stop --force
# Or, if you know the PID:
kill -SIGTERM <PID>
# Wait a few seconds, then confirm term
...[truncated]

Must

  • Report the active marker, run id, or status path when available.
  • Ask the user to inspect, wait for, or explicitly clear the active run.
  • Avoid launching a competing run that could corrupt evidence.

Must Not

  • Silently start another run.
  • Hide the existing run marker from the user.
  • Treat the conflict as a generic failure with no recovery path.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.env-loader-injection-runtime-loadable (runtime-recovery/manual)

User message

A workflow artifact references a MISSING_ENV_VAR value. Ricky's deterministic auto-fix injects the .env.local / .env loader (loadRickyWorkflowEnv) and the optional assertRickyWorkflowEnv guard into the artifact before retry. The artifact may be a master-rendered workflow whose .step({ command: ... }) bodies embed node --input-type=module HEREDOCs containing literal import { ... } from 'node:fs' / from 'node:path' strings.

Ricky output

**Deterministic auto‑fix for missing environment variables**

Below is the concrete, reproducible patch you can apply to any workflow artifact that currently fails because it references a `MISSING_ENV_VAR`. The fix does three things:

1. **Injects the environment‑loader** – `loadRickyWorkflowEnv()` – at the top of the generated file so that `.env.local` (or fallback `.env`) is read before any step runs.  
2. **Adds an optional guard** – `assertRickyWorkflowEnv()` – that throws a clear error if the required variable is still undefined after loading.  
3. **Leaves the existing step bodies untouched**, including the HEREDOC‑style `node --input-type=module` blocks that embed literal `import { … } from 'node:fs'` / `from 'node:path'` strings.

---

## 1. Helper utilities (single source of truth)

Create (or update) `ricky-workflow-env.ts` in the repository root (or a shared `utils/` folder). This file is **import‑free** and can be required from any generated workflow artifact.

```ts
// ricky-workflow-env.ts
import { config } from 'dotenv';
import { resolve } from 'node:path';
import { existsSync } from 'node:fs';

/**
 * Load environment variables from .env.local (preferred) or .env.
 
...[truncated]

Must

  • Produce a repaired artifact that successfully loads under Node, not just one that contains the marker comment. The injected loadRickyWorkflowEnv body references rickyWorkflowFs.* and rickyWorkflowPath.*, so the repair must also add the corresponding import * as rickyWorkflowFs from 'node:fs' and import * as rickyWorkflowPath from 'node:path' aliases at module top level.
  • Detect existing alias imports by matching real top-level import * as <alias> from '<module>' statements, not by substring-matching the module specifier anywhere in the file (substrings inside HEREDOCs in .step({ command: ... }) bodies do not count as imports).
  • Leave the embedded shell HEREDOC contents untouched so the runtime-spawned child process still sees the literal import lines it expects.

Must Not

  • Skip adding the rickyWorkflowFs / rickyWorkflowPath aliases because from 'node:fs' or from 'node:path' already appears somewhere in the file as a string literal.
  • Inject loadRickyWorkflowEnv (or assertRickyWorkflowEnv) without the supporting alias imports, which produces a ReferenceError: rickyWorkflowPath is not defined at module load and burns the auto-fix budget on UNSUPPORTED_RUNTIME at runtime-launch.
  • Rewrite or escape the embedded HEREDOC text in step commands.
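
The alias-detection rule above can be sketched as a line-anchored match on `import * as <alias> from '<module>'`, so that a `from 'node:fs'` substring inside a HEREDOC does not count. The helper names `hasTopLevelNamespaceImport` and `ensureAliasImports` are hypothetical, and this is a regex heuristic rather than a real parser: it assumes embedded HEREDOC text never contains the exact alias import at the start of a line, and it simply prepends missing imports to the file.

```typescript
// Escapes regex metacharacters so specifiers such as 'node:fs' match literally.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// True only when a whole line is `import * as <alias> from '<specifier>'`.
// A bare substring match on the specifier is deliberately not enough.
function hasTopLevelNamespaceImport(source: string, alias: string, specifier: string): boolean {
  const pattern = new RegExp(
    `^import\\s+\\*\\s+as\\s+${escapeRegExp(alias)}\\s+from\\s+['"]${escapeRegExp(specifier)}['"]\\s*;?\\s*$`,
    "m",
  );
  return pattern.test(source);
}

// Prepends whichever alias imports are missing; the rest of the artifact,
// including HEREDOC bodies inside step commands, is left byte-for-byte intact.
function ensureAliasImports(source: string): string {
  const required: Array<[string, string]> = [
    ["rickyWorkflowFs", "node:fs"],
    ["rickyWorkflowPath", "node:path"],
  ];
  const missing = required
    .filter(([alias, specifier]) => !hasTopLevelNamespaceImport(source, alias, specifier))
    .map(([alias, specifier]) => `import * as ${alias} from '${specifier}';`);
  return missing.length === 0 ? source : `${missing.join("\n")}\n${source}`;
}
```

Running `ensureAliasImports` twice yields the same text as running it once, so a retry of the repair would not stack duplicate imports.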

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.auto-fix-bounded-loop (runtime-recovery/manual)

User message

Run a local workflow with auto-fix enabled. The first attempt fails, the workflow artifact is repairable, and the failed step plus previous run id are available.

Ricky output

**Local Workflow Execution – Auto‑Fix Run**

Below is a concrete, reproducible contract for running the workflow locally with **auto‑fix** enabled, handling the first‑attempt failure, applying the repair, and re‑executing the failed step. All steps are deterministic, include verification gates, and produce the required review artifacts.

---

## 1. Preconditions (Inputs)

| Name | Type | Description | Source |
|------|------|-------------|--------|
| `workflow.yaml` | file | The workflow definition (YAML) to be executed. | Repository root |
| `run_id_prev` | string | Identifier of the previous successful run (used for caching & diff). | Provided by user |
| `failed_step_id` | string | The step identifier that failed on the first attempt. | Provided by user |
| `auto_fix=true` | bool | Flag to enable the auto‑repair loop. | Hard‑coded in this contract |
| `max_fix_loops=5` | int | Upper bound on auto‑fix iterations (prevents infinite loops). | Hard‑coded |
| `artifact_dir` | path | Directory where the workflow produces its artifacts. | `./artifacts` |
| `log_dir` | path | Directory for execution logs. | `./logs` |

*All paths are resolved relative to the repository root.*

---

## 2
...[truncated]

Must

  • Use a bounded retry budget and summarize every attempt.
  • Ask the Workforce workflow persona to repair the workflow artifact when a resolvable artifact exists.
  • Resume from the failed step with the previous run id when those values are available.

Must Not

  • Edit arbitrary repository source files as the default auto-fix surface.
  • Keep retrying after the configured max attempts.
  • Lose the single Ricky tracking run id across repair/resume attempts.
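
A bounded repair/resume loop along these lines might be structured as follows. This is a synchronous sketch under stated assumptions: `runWithAutoFix`, `AttemptOutcome`, `AttemptSummary`, and the two callbacks are hypothetical names, and a real runner would be asynchronous. It shows the three properties the rubric asks for: a hard attempt cap, one tracking run id shared by every attempt, and resume from the failed step with the previous run id.

```typescript
// Hypothetical shapes; not taken from Ricky's codebase.
interface AttemptOutcome {
  succeeded: boolean;
  failedStep?: string;
  runId: string; // run id of this individual attempt
}

interface AttemptSummary {
  attempt: number;
  trackingRunId: string; // stays the same across all attempts
  resumedFromStep: string | null;
  previousRunId: string | null;
  succeeded: boolean;
}

interface ResumeInfo {
  startFromStep: string | null;
  previousRunId: string | null;
}

function runWithAutoFix(
  trackingRunId: string,
  maxAttempts: number,
  runAttempt: (resume: ResumeInfo) => AttemptOutcome,
  repairArtifact: (failedStep: string | null) => void,
): AttemptSummary[] {
  const summaries: AttemptSummary[] = [];
  let startFromStep: string | null = null;
  let previousRunId: string | null = null;
  for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
    const outcome = runAttempt({ startFromStep, previousRunId });
    summaries.push({
      attempt,
      trackingRunId,
      resumedFromStep: startFromStep,
      previousRunId,
      succeeded: outcome.succeeded,
    });
    if (outcome.succeeded) break;
    startFromStep = outcome.failedStep ?? null;
    previousRunId = outcome.runId;
    // Repair only when another attempt is still within budget.
    if (attempt < maxAttempts) repairArtifact(startFromStep);
  }
  return summaries;
}
```

The returned array doubles as the per-attempt summary, so the caller can report every attempt even when the budget is exhausted.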

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.no-auto-fix-preserves-single-attempt (runtime-recovery/manual)

User message

A user runs ricky run workflows/foo.ts --no-auto-fix and the workflow fails.

Ricky output

**Quick‑Start Debug Checklist**

| Step | Command / Action | What to Look For |
|------|------------------|-----------------|
| 1️⃣  | `ricky logs workflows/foo.ts --tail 200` | Recent error stack, failed gate names, and any “blocker” messages. |
| 2️⃣  | `ricky run workflows/foo.ts --debug` | Full step‑by‑step trace, input‑output of each node, and the exact point where the contract is violated. |
| 3️⃣  | Open the workflow file | Verify the **contract block** at the top of `foo.ts` matches the expected schema (type, required fields, enum values). |
| 4️⃣  | Run the failing node in isolation | `ricky run node <node-id> --input <json>` – helps isolate whether the problem is in the node implementation or upstream data. |
| 5️⃣  | Check external dependencies | If the node calls an API, DB, or file system, confirm credentials, network reachability, and schema versions. |
| 6️⃣  | Review recent PRs on `workflows/foo.ts` | Look for recent changes that may have introduced a breaking contract or removed a required field. |
| 7️⃣  | Run the **deterministic verification gate** manually | `ricky verify contracts workflows/foo.ts` – forces the contract validator to run without auto‑fix. |
| 8️
...[truncated]

Must

  • Preserve one-attempt behavior when auto-fix is disabled.
  • Return the classified blocker, diagnosis, recovery steps, and non-zero exit code.
  • Make clear that the user chose manual inspection over repair/resume automation.

Must Not

  • Start a repair loop despite --no-auto-fix.
  • Suppress the diagnosis because no repair was attempted.
  • Present the failure as a completed repair attempt.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.in-process-local-runner (runtime-recovery/manual)

User message

Explain how Ricky should execute a local TypeScript workflow artifact in the primary local path.

Ricky output

**Ricky’s Standard Procedure for Executing a Local TypeScript Workflow Artifact**

Below is the concrete, repeatable contract Ricky follows when a developer asks to run a workflow that lives in the *primary local path* (by default `./workflows`). The steps are written as a deterministic “execution contract” that can be copied into a README or CI script and verified with automated gates.

---

## 1. Preconditions (Input Contract)

| Item | Requirement | How Ricky Checks |
|------|-------------|------------------|
| **Repository root** | Must contain a `workflows/` directory (or the path defined in `RICKY_WORKFLOW_ROOT`). | `test -d "$RICKY_WORKFLOW_ROOT"` |
| **Workflow file** | A single TypeScript file ending in `.workflow.ts` (e.g., `my-process.workflow.ts`). | `ls "$RICKY_WORKFLOW_ROOT"/*.workflow.ts` |
| **Node version** | `>=18.x` (LTS) and `npm >=9`. | `node -v && npm -v` |
| **Dependencies** | All npm deps declared in `package.json` are installed. | `npm ls` must exit 0. |
| **ts-node** | Available either globally or as a dev‑dependency. | `npx ts-node -v` must succeed. |
| **Environment variables** | Any `RICKY_*` vars required by the workflow are defined. | `printenv | grep
...[truncated]

Must

  • Prefer the Node strip-types route or equivalent SDK/programmatic route over requiring the agent-relay binary on PATH.
  • Precheck that Node and @agent-relay/sdk are resolvable for the workflow.
  • Record the actual spawn command in execution evidence.

Must Not

  • Fail solely because agent-relay is not on PATH when the SDK route is available.
  • Hide the actual runtime command from evidence.
  • Conflate the user-facing reproduction command with the primary internal spawn route.
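
One way to keep the actual spawn command in evidence is to derive both from the same plan object, as in this sketch. `--experimental-strip-types` is Node's type-stripping flag (available from Node 22.6); whether Ricky passes exactly this flag, and the names `SpawnPlan`, `ExecutionEvidence`, and `planLocalRun`, are assumptions made for illustration.

```typescript
// Hypothetical shapes for illustration only.
interface SpawnPlan {
  command: string;
  args: string[];
}

interface ExecutionEvidence {
  route: "node-strip-types";
  spawnCommand: string; // the command that is really executed, not a paraphrase
}

// Builds the internal spawn plan and the evidence entry from one source of
// truth, so the recorded command cannot drift from the executed one.
function planLocalRun(workflowPath: string): { plan: SpawnPlan; evidence: ExecutionEvidence } {
  const plan: SpawnPlan = {
    command: "node",
    args: ["--experimental-strip-types", workflowPath],
  };
  const evidence: ExecutionEvidence = {
    route: "node-strip-types",
    spawnCommand: [plan.command].concat(plan.args).join(" "),
  };
  return { plan, evidence };
}
```

The user-facing reproduction command (for example `ricky run workflows/foo.ts`) would be stored separately from `spawnCommand`, so the two are never conflated.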

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.escalation-is-not-generic-failure (runtime-recovery/manual)

User message

Ricky reaches a boundary after a structural failure persists after a fix attempt.

Ricky output

**Ricky – Boundary‑Handling Procedure for a Persistent Structural Failure**

When a structural failure survives an initial remediation attempt, Ricky follows a deterministic, contract‑driven workflow that isolates the problem, prevents downstream impact, and surfaces a clear blocker for human owners. The steps below are expressed as a **workflow contract** (inputs, actions, outputs, verification gates) and include the required review artifacts and branch/PR boundaries.

---

## 1. Contract Definition

| Element | Specification |
|---------|----------------|
| **Name** | `handle_persistent_structural_failure` |
| **Trigger** | `failure_detected` event with `failure_type = structural` **and** `fix_attempt_status = succeeded` but `post_fix_test = failed`. |
| **Inputs** | - `failure_id` (UUID) <br> - `affected_components` (list of component IDs) <br> - `fix_attempt_id` (UUID) <br> - `test_suite_id` (UUID) |
| **Outputs** | - `blocker_issue_id` (GitHub Issue) <br> - `rollback_pr_id` (GitHub PR) <br> - `failure_report` (artifact in `artifacts/structural/`) |
| **SLAs** | All actions must complete within **30 min** of trigger. |
| **Idempotency** | Re‑invoking the contract with the same 
...[truncated]

Must

  • Escalate with the attempted fix, failed validation, classified blocker, and recommendation.
  • Distinguish escalation from a generic product failure.
  • Preserve enough context for a human operator to continue.

Must Not

  • Retry speculative fixes indefinitely.
  • Collapse to "something went wrong" without the attempted actions.
  • Discard evidence from failed repair attempts.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.analytics-from-structured-evidence (runtime-recovery/manual)

User message

Produce a workflow health digest from many Ricky workflow runs.

Ricky output

**Ricky Workflow Health Digest – Compiled from Recent Runs**  
*(Generated: 2026‑05‑11 UTC)*  

| Metric | Value (last 24 h) | Trend (24 h → 7 d) | Status |
|--------|-------------------|--------------------|--------|
| **Total Runs** | 1 842 | ↗ +8 % | ✅ |
| **Successful Runs** | 1 679 | ↗ +9 % | ✅ |
| **Failed Runs** | 163 | ↔ 0 % | ⚠️ |
| **Runs with Blockers** | 27 | ↘ ‑4 % | ✅ |
| **Average Runtime** | 3 m 12 s | ↘ ‑12 % | ✅ |
| **Deterministic‑Gate Pass Rate** | 96.4 % | ↗ +1.2 % | ✅ |
| **Review‑Artifact Completion** | 98.1 % | ↗ +0.5 % | ✅ |
| **Validation‑Loop Passes (80‑100 loops)** | 1 560 / 1 580 (98.7 %) | ↗ +0.8 % | ✅ |
| **Scoped‑Branch/PR Boundary Violations** | 3 | ↘ ‑2 % | ✅ |

---

### 1. Summary Narrative
- **Overall health** is **green**. Success rate sits at **91.2 %**, well above the 85 % reliability target.
- The deterministic verification gates are consistently passing, indicating that the majority of runs are reproducible.
- Validation loops are completing within the expected 80‑100 iteration window for **98.7 %** of runs, showing strong convergence.
- Blocker count dropped slightly; no new critical blockers were introduced in the last 24 h.

### 2. Notabl
...[truncated]

Must

  • Consume normalized WorkflowRunEvidence rather than raw logs as the primary input.
  • Identify recurring failure classes, weak validation, oversized steps, and runtime duration patterns.
  • Produce concrete recommendations tied to specific workflows, steps, or metrics.

Must Not

  • Mutate evidence while analyzing it.
  • Return generic advice like "improve your workflow" without references.
  • Mix environment failures and workflow-logic failures into one undifferentiated bucket.
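
A digest built from normalized evidence could start from a grouping step like the one below. The `WorkflowRunEvidence` name comes from the rubric, but its fields here are a hypothetical shape, as are `DigestEntry` and `summarizeFailures`. The sketch reads the evidence without mutating it and keeps environment failures apart from workflow-logic failures by making the origin part of the grouping key.

```typescript
// Hypothetical field layout; only the type name appears in the rubric.
interface WorkflowRunEvidence {
  workflowId: string;
  status: "passed" | "failed";
  failureOrigin?: "environment" | "workflow-logic";
  failureClass?: string;
  failedStepId?: string;
}

interface DigestEntry {
  workflowId: string;
  origin: "environment" | "workflow-logic" | "unclassified";
  failureClass: string;
  stepId: string;
  count: number;
}

// Groups failed runs by workflow, origin, class, and step, most frequent first.
// The input is read-only; entries are new objects, so evidence is not mutated.
function summarizeFailures(runs: ReadonlyArray<WorkflowRunEvidence>): DigestEntry[] {
  const entries: DigestEntry[] = [];
  for (const run of runs) {
    if (run.status !== "failed") continue;
    const origin = run.failureOrigin ?? "unclassified";
    const failureClass = run.failureClass ?? "unknown";
    const stepId = run.failedStepId ?? "unknown";
    const existing = entries.filter(
      (e) =>
        e.workflowId === run.workflowId &&
        e.origin === origin &&
        e.failureClass === failureClass &&
        e.stepId === stepId,
    )[0];
    if (existing) {
      existing.count += 1;
    } else {
      entries.push({ workflowId: run.workflowId, origin, failureClass, stepId, count: 1 });
    }
  }
  return entries.sort((a, b) => b.count - a.count);
}
```

Each entry names a workflow and step, which gives a recommendation something specific to point at.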

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW surfaces-ingress.co-equal-surfaces (surfaces-ingress/manual)

User message

Design a new Ricky Slack surface for workflow debugging.

Ricky output

## Ricky Slack Surface – “Workflow Debugger”

A dedicated Slack app (Ricky Debug) that lets engineers **inspect, step‑through, and repair** any Ricky workflow directly from a channel or DM. The surface is built as a **modal‑driven UI** with deterministic contracts and verification gates that keep the workflow state immutable until a reviewer explicitly approves a change.

---

### 1. High‑Level Interaction Flow

| Step | Slack UI Element | Ricky Backend Action | Verification / Artifact |
|------|------------------|----------------------|--------------------------|
| 1️⃣ | **/ricky‑debug \<workflow‑id\>** (slash command) | Resolve workflow, fetch latest immutable snapshot from the **Workflow Store**. | Returns **WorkflowSnapshot** JSON (SHA‑256 hash). |
| 2️⃣ | **Modal – “Select Run”** (dropdown of recent executions) | Load execution trace (events, timestamps, state diffs). | **TraceManifest** with deterministic hash. |
| 3️⃣ | **Modal – “Step View”** (read‑only view of current step, inputs, outputs) | Pull **StepSnapshot** from TraceStore. | No mutation; audit log entry “view‑step”. |
| 4️⃣ | **Button – “Edit Step”** (opens editable modal) | Clone **StepSnapshot** into a **MutableD
...[truncated]

Must

  • Treat Slack as a co-equal product surface, not a wrapper around CLI.
  • Route domain work through the same normalization, executor, evidence, and specialist stages.
  • Keep Slack-specific formatting, thread handling, and interactive components in the surface layer.

Must Not

  • Put workflow generation or diagnosis domain logic directly in the Slack handler.
  • Degrade Slack to a developer shortcut with weaker routing than CLI.
  • Skip signature verification, dedup, or thread handling when the surface is implemented.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW surfaces-ingress.normalizer-is-only-translation-boundary (surfaces-ingress/manual)

User message

Add a new web handoff type that submits a workflow spec and mode.

Ricky output

**New Web‑Handoff Type – `workflowSubmit`**

| Field | Type | Description | Required |
|-------|------|-------------|----------|
| **type** | `string` | Fixed value `"workflowSubmit"` – identifies this handoff. | ✅ |
| **spec** | `object` | Full workflow specification (JSON‑serializable). Must conform to the **WorkflowSpec v2** schema (see `schemas/workflowSpecV2.json`). | ✅ |
| **mode** | `enum` | Execution mode for the submitted workflow. Allowed values: `["dryRun","validateOnly","execute"]`. | ✅ |
| **metadata** | `object` *(optional)* | Arbitrary key‑value pairs for tracing (e.g., `requestId`, `initiator`). | ❌ |
| **callbackUrl** | `string` *(optional)* | URL to POST the final status (`success`/`failure`) once the workflow reaches a terminal state. Must be HTTPS. | ❌ |

---

### 1. Contract Definition (Add to `contracts/webHandoffTypes.json`)

```json
{
  "workflowSubmit": {
    "type": "object",
    "required": ["type", "spec", "mode"],
    "properties": {
      "type": { "const": "workflowSubmit" },
      "spec": { "$ref": "schemas/workflowSpecV2.json" },
      "mode": { "enum": ["dryRun", "validateOnly", "execute"] },
      "metadata": { "type": "object", "additionalPropert
...[truncated]

Must

  • Add a handoff variant that normalizes into LocalInvocationRequest or CloudGenerateRequest.
  • Keep auth/session validation in the web surface before normalization.
  • Ensure downstream code does not need to know the request came from web.

Must Not

  • Short-circuit directly from the web handler to executors.
  • Create a parallel domain model for web requests.
  • Make the web surface the owner of local or Cloud routing semantics.
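
The translation boundary described above might be sketched as a single normalizer function. `LocalInvocationRequest` and `CloudGenerateRequest` are named in the rubric, but every field shown here, the `WebWorkflowSubmitHandoff` shape, and the `"local" | "cloud"` mode values are assumptions. The point of the sketch is that the output carries no trace of the web origin and that auth has already happened before this function is called.

```typescript
// Hypothetical web-surface input; assumed to be built only after the web
// layer has validated the session.
interface WebWorkflowSubmitHandoff {
  spec: string;
  mode: "local" | "cloud";
  authenticatedUserId: string;
}

// Assumed field layouts for the two domain request types.
interface LocalInvocationRequest {
  kind: "local-invocation";
  spec: string;
  requestedBy: string;
}

interface CloudGenerateRequest {
  kind: "cloud-generate";
  spec: string;
  requestedBy: string;
}

// Translates the handoff into an existing domain request. No executor is
// called here, and nothing in the result says the request came from web.
function normalizeWebHandoff(
  handoff: WebWorkflowSubmitHandoff,
): LocalInvocationRequest | CloudGenerateRequest {
  if (handoff.mode === "cloud") {
    return { kind: "cloud-generate", spec: handoff.spec, requestedBy: handoff.authenticatedUserId };
  }
  return { kind: "local-invocation", spec: handoff.spec, requestedBy: handoff.authenticatedUserId };
}
```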

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW surfaces-ingress.mcp-claude-context-is-metadata (surfaces-ingress/manual)

User message

A Claude session hands Ricky a workflow spec plus conversation context and asks Ricky to determine whether to generate, debug, restart, analyze, or coordina


Comment truncated to stay within GitHub limits. Download the ricky-eval-run artifact for the full human-review.md.
