
sjawhar-legion-126: skill(test.md) — evidence gate, handoff enforcement #130

Merged
sjawhar merged 1 commit into main from sjawhar-legion-126
Mar 27, 2026


sjawhar commented Mar 27, 2026

Closes #126

Summary

Strengthens the test workflow in two areas:

  1. Evidence Gate (Change 7) — Elevates evidence requirements from guidance to a hard gate in Section 5. Every acceptance criterion must now include a concrete artifact (screenshot, command output, log, or code-ref). Local file paths are explicitly prohibited. PR template placeholders updated from [output/screenshot] to [REQUIRED: screenshot/output/log/code-ref]. New Common Mistakes entry added.

  2. Handoff Enforcement (Change 8) — Adds MUST-attempt language for handoff writes in Section 6.5, acknowledging the partially broken plumbing (Planner handoff data doesn't reach tester or reviewer directly #124). Pre-signal checklist added to Section 7 requiring PR results posted and handoff write attempted before adding labels.

Both .opencode/ and .claude/ copies are identical (hard-linked files, same inode).


sjawhar commented Mar 27, 2026

Behavioral Test Results

Summary

PASS — 12/12 acceptance criteria verified

Results

Change 7: Evidence Gate Strengthening

| Criterion | Status | Evidence |
|---|---|---|
| Bold evidence gate callout in Section 5 | PASS | `grep -n 'EVIDENCE GATE'` → line 169, within Section 5 (lines 154–182). Blockquote format with bold label. |
| PR template marks evidence as REQUIRED | PASS | `grep -c 'REQUIRED: screenshot/output/log/code-ref'` → 2 (GitHub template line 197, Linear template line 243) |
| Allows "Verified by code inspection: [file:line]" | PASS | Present in EVIDENCE GATE callout: "Verified by code inspection: [file:line]" at line 169 |
| Allows one artifact for multiple related criteria | PASS | Present in EVIDENCE GATE callout: "One artifact may cover multiple related criteria if explicitly noted." at line 169 |
| New Common Mistakes entry | PASS | `grep -n 'Posting results without evidence'` → line 332, last row of Common Mistakes table |
| Explicitly prohibits local file paths as evidence | PASS | `grep -c 'Local file paths'` → 2 (EVIDENCE GATE callout line 169 + Common Mistakes entry line 332) |

Change 8: Test Handoff Enforcement

| Criterion | Status | Evidence |
|---|---|---|
| Section 6.5 uses "MUST attempt" language | PASS | `grep -n 'MUST.*attempt'` → line 229 in Section 6.5 (starts line 213): `You **MUST** attempt the handoff write` |
| Section 7 has pre-signal checklist (steps 6 & 6.5) | PASS | `grep -A5 'Pre-signal checklist'` → line 265 in Section 7 (starts line 259), references "step 6" and "step 6.5" |
| `\|\| true` preserved | PASS | `grep -n '\|\| true'` → line 218 (code block) and line 229 (explanation text). Not removed or altered. |
| Distinction clear: attempt mandatory, success non-blocking | PASS | Line 229: "MUST attempt" + " |
| Acknowledges handoff plumbing not fully operational | PASS | `grep -n '#124'` → line 229: "handoff plumbing between phases may not be fully operational yet (#124)" |
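
The distinction checked in the rows above (the attempt is mandatory, its success is non-blocking) can be sketched in Python. This is only an illustration under stated assumptions, not the skill's implementation: the source shows just the shell form `|| true`, and the names `attempt_handoff` and `pre_signal_checklist`, as well as the mapping of step 6 to posting PR results, are hypothetical.

```python
from __future__ import annotations

import subprocess
import sys


def attempt_handoff(command: list[str]) -> tuple[bool, str]:
    """Run the handoff write once and never raise on failure.

    Mirrors the shell pattern `cmd || true`: a non-zero exit or a missing
    binary is reported back to the caller but does not stop the workflow.
    """
    try:
        result = subprocess.run(command, capture_output=True, text=True, check=False)
    except OSError as exc:  # for example, the CLI is not installed
        return False, f"handoff write could not start: {exc}"
    if result.returncode != 0:
        return False, f"handoff write failed (exit {result.returncode}): {result.stderr.strip()}"
    return True, "handoff write succeeded"


def pre_signal_checklist(results_posted: bool, handoff_attempted: bool) -> list[str]:
    """Return unmet items; labels should be added only when the list is empty.

    Assumption: step 6 is posting results and step 6.5 is the handoff write.
    """
    missing = []
    if not results_posted:
        missing.append("step 6: post results to the PR")
    if not handoff_attempted:
        missing.append("step 6.5: attempt the handoff write")
    return missing


# A failing command stands in for a broken handoff CLI (hypothetical example).
ok, note = attempt_handoff([sys.executable, "-c", "import sys; sys.exit(3)"])
# The attempt happened, so the checklist is satisfied even though ok is False;
# the failure note would go into the PR comment.
remaining = pre_signal_checklist(results_posted=True, handoff_attempted=True)
```

The checklist takes `handoff_attempted` rather than the success flag on purpose: a failed write is noted, not treated as a blocker.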

Dual Directory Mirroring

| Criterion | Status | Evidence |
|---|---|---|
| Both directories identical | PASS | `diff` returns no output. `ls -li` confirms same inode (90987533): files are hard-linked. |

Test Quality Critique

N/A — markdown-only changes, no code or tests to critique.

Documentation Feedback

Observations

  • Plan deviation (minor): Plan specified **MANDATORY ATTEMPT:** as a bold label prefix (matching the **EVIDENCE GATE:** pattern); implementer used inline bold **MUST** instead. The acceptance criteria specify "MUST attempt language" not the specific label, so this passes. A consistent label style (**MANDATORY ATTEMPT:**) would improve scannability.
  • Plan deviation (minor): Plan included expanded "What does NOT count as evidence" bullet section (3 bullet points); implementer inlined the prohibition into the blockquote and Common Mistakes entry. More concise, still meets all acceptance criteria.
  • Plan deviation (minor): Pre-signal checklist simplified from plan (no ✅ checkmarks, no explanatory parentheticals). Still references steps 6 and 6.5 as required.
  • Hard-link approach: Both .opencode/ and .claude/ share inode 90987533. Only .opencode/ appears in the PR diff, which is correct — editing one file updates both. This is sound but worth noting for reviewers unfamiliar with the setup.
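
For reviewers unfamiliar with hard links, the behaviour noted in the last observation can be reproduced with a small Python sketch. The file names below are hypothetical stand-ins created in a temporary directory, not the repository's real paths, and `same_inode` is an illustrative helper equivalent in spirit to comparing `ls -li` output.

```python
import os
import tempfile


def same_inode(path_a: str, path_b: str) -> bool:
    """True when both paths refer to the same file on the same device."""
    a, b = os.stat(path_a), os.stat(path_b)
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)


with tempfile.TemporaryDirectory() as root:
    # Hypothetical stand-ins for the two mirrored copies of the skill file.
    first = os.path.join(root, "opencode_test.md")
    second = os.path.join(root, "claude_test.md")

    with open(first, "w") as fh:
        fh.write("v1\n")
    os.link(first, second)  # create a second name for the same inode

    # Appending through one name is visible through the other.
    with open(first, "a") as fh:
        fh.write("v2\n")
    with open(second) as fh:
        mirrored = fh.read()

    linked = same_inode(first, second)
```

One caveat worth keeping in mind: a tool that saves by writing a new file and renaming it over the old one would give that path a new inode and silently break the link, so repeating the inode check after edits is a reasonable precaution.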

Add EVIDENCE GATE callout to Section 5 requiring concrete artifacts per
acceptance criterion — screenshots, command output, logs, or code-ref.
Explicitly prohibits local file paths as evidence. Update PR template
placeholders to mark evidence as REQUIRED. Add Common Mistakes entry.

Add MUST-attempt language for handoff writes in Section 6.5, acknowledging
partially broken plumbing (#124). Add pre-signal checklist to Section 7
requiring both PR results and handoff write before label changes.

Closes #126

sjawhar commented Mar 27, 2026

Review Summary

CRITICAL (P1): 0 issues
IMPORTANT (P2): 0 issues
MINOR (P3): 3 suggestions

Verdict: Approved to merge. All 12 acceptance criteria met, CI green (lint/test/typecheck), dual directory mirroring verified (hard-linked inode 90987533).


Acceptance Criteria Verification

Change 7: Evidence Gate Strengthening

| Criterion | Status |
|---|---|
| Bold evidence gate callout in Section 5 | ✅ Blockquote with bold label, correctly placed after evidence types list |
| PR template marks evidence as REQUIRED | ✅ Both GitHub and Linear templates updated |
| Allows "Verified by code inspection: [file:line]" | ✅ Present in EVIDENCE GATE text |
| Allows one artifact for multiple related criteria | ✅ Present in EVIDENCE GATE text |
| New Common Mistakes entry | ✅ Last row of Common Mistakes table |
| Explicitly prohibits local file paths | ✅ In EVIDENCE GATE + Common Mistakes entry |

Change 8: Test Handoff Enforcement

| Criterion | Status |
|---|---|
| Section 6.5 uses "MUST attempt" language | ✅ `You **MUST** attempt the handoff write` |
| Section 7 has pre-signal checklist (steps 6 & 6.5) | ✅ Both steps referenced |
| `\|\| true` preserved | ✅ |
| Distinction clear: attempt mandatory, success non-blocking | ✅ Clear language separating obligation from error handling |
| Acknowledges handoff plumbing not fully operational | ✅ References #124 |

Dual Directory Mirroring

| Criterion | Status |
|---|---|
| Both directories identical | ✅ Hard-linked (same inode), verified with diff |

CI Status

All checks passing: lint ✅ | test ✅ | typecheck ✅

Minor Suggestions (P3)

See inline comments.


sjawhar left a comment


No blocking issues. All acceptance criteria met. Three minor style suggestions inline.

- Command output (for CLI/API tests)
- Log excerpts (for backend behavior)

> **EVIDENCE GATE:** Every acceptance criterion in your PR comment MUST include at least one concrete artifact: screenshot, command output, log excerpt, or — for non-behavioral criteria verifiable only by reading code — "Verified by code inspection: [file:line]". A test result without evidence is not a valid test. One artifact may cover multiple related criteria if explicitly noted. Local file paths are NOT evidence — the reviewer cannot access your filesystem.

💡 Suggestion (P3)

The plan specified a separate "What does NOT count as evidence" section with bullet points and an example path (/tmp/screenshot.png). The inline approach is more concise, but the expanded format with concrete anti-examples is more scannable for testers who need to quickly self-check. Not blocking — the acceptance criteria are met either way.
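
As a rough aid for the kind of quick self-check discussed here, the gate could be approximated by a small Python helper. This is a heuristic sketch, not part of the skill: `check_evidence` and the path pattern are assumptions, and the pattern only flags evidence that consists of nothing but a local path (absolute POSIX, home-relative, or Windows drive path).

```python
from __future__ import annotations

import re

# Assumption: evidence that is ONLY a filesystem path is treated as a local
# file path. Command output that merely mentions a path is not flagged.
LOCAL_PATH_ONLY = re.compile(r"(?:/|~/|[A-Za-z]:\\)\S*")


def check_evidence(criterion: str, evidence: str) -> list[str]:
    """Return a list of problems for one criterion; empty means it passes."""
    text = evidence.strip()
    if not text:
        return [f"{criterion}: no evidence supplied"]
    if LOCAL_PATH_ONLY.fullmatch(text):
        return [f"{criterion}: local file path is not evidence ({text})"]
    return []


# Hypothetical rows a tester might be about to post.
rows = {
    "Bold evidence gate callout": "grep -n 'EVIDENCE GATE' -> line 169",
    "Non-behavioral criterion": "Verified by code inspection: test.md:169",
    "Screenshot criterion": "/tmp/screenshot.png",
    "Forgotten criterion": "",
}
problems = [p for name, ev in rows.items() for p in check_evidence(name, ev)]
```

Only the last two rows are reported. A real check would likely also need to recognise attached images and links, which this sketch does not attempt.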

HANDOFF
```

You **MUST** attempt the handoff write before signaling completion. The `|| true` ensures CLI failures don't block you, but skipping this step entirely is not acceptable. If the write fails, note it in your PR comment. Note: handoff plumbing between phases may not be fully operational yet (#124) — the attempt is what matters, establishing the habit so that when the plumbing is fixed, data flows automatically.

💡 Suggestion (P3)

Style inconsistency: Section 5 uses a labeled blockquote callout (> **EVIDENCE GATE:**) but this section uses plain paragraph with inline bold (You **MUST** attempt). The plan specified **MANDATORY ATTEMPT:** as a labeled prefix to match the EVIDENCE GATE pattern. Using a consistent callout style (e.g., > **MANDATORY ATTEMPT:** You MUST attempt...) would improve document scannability. Not blocking.

@@ -257,6 +261,13 @@ linear_linear(action="comment", id=$LEGION_ISSUE_ID, body="## Behavioral Test Re
**CRITICAL: The labels are how the controller knows you finished.** If you skip this,
the issue silently stalls. This is the MOST IMPORTANT step.


😹 Nitpick (P3)

Double blank line before the pre-signal checklist (lines 262-263). Single blank line is the standard markdown separator.

sjawhar marked this pull request as ready for review March 27, 2026 05:52
sjawhar force-pushed the sjawhar-legion-126 branch from 9762c46 to ff37e85 March 27, 2026 05:52
sjawhar merged commit d22c395 into main Mar 27, 2026
3 checks passed
sjawhar deleted the sjawhar-legion-126 branch March 27, 2026 05:56

Development

Successfully merging this pull request may close these issues.

Skill audit: test.md — evidence gate, handoff enforcement
