sjawhar-legion-126: skill(test.md) — evidence gate, handoff enforcement#130
Behavioral Test Results
Summary: PASS, 12/12 acceptance criteria verified.
Results:
- Change 7: Evidence Gate Strengthening
- Change 8: Test Handoff Enforcement
- Dual Directory Mirroring
Test Quality Critique: N/A (markdown-only changes; no code or tests to critique).
Documentation Feedback
Observations
- Add an EVIDENCE GATE callout to Section 5 requiring at least one concrete artifact per acceptance criterion: screenshot, command output, log excerpt, or code-ref. Explicitly prohibit local file paths as evidence.
- Update PR template placeholders to mark evidence as REQUIRED.
- Add a Common Mistakes entry.
- Add MUST-attempt language for handoff writes in Section 6.5, acknowledging the partially broken handoff plumbing (#124).
- Add a pre-signal checklist to Section 7 requiring both posted PR results and an attempted handoff write before any label change.

Closes #126
Review Summary
CRITICAL (P1): 0 issues
Verdict: Approved to merge. All 11 acceptance criteria met, CI green (lint/test/typecheck), dual directory mirroring verified (hard-linked inode 90987533).
Acceptance Criteria Verification:
- Change 7: Evidence Gate Strengthening
- Change 8: Test Handoff Enforcement
- Dual Directory Mirroring
CI Status: all checks passing (lint ✅, test ✅, typecheck ✅).
Minor Suggestions (P3): see inline comments.
sjawhar left a comment
No blocking issues. All acceptance criteria met. Three minor style suggestions inline.
    - Command output (for CLI/API tests)
    - Log excerpts (for backend behavior)

    > **EVIDENCE GATE:** Every acceptance criterion in your PR comment MUST include at least one concrete artifact: screenshot, command output, log excerpt, or — for non-behavioral criteria verifiable only by reading code — "Verified by code inspection: [file:line]". A test result without evidence is not a valid test. One artifact may cover multiple related criteria if explicitly noted. Local file paths are NOT evidence — the reviewer cannot access your filesystem.
💡 Suggestion (P3)
The plan specified a separate "What does NOT count as evidence" section with bullet points and an example path (/tmp/screenshot.png). The inline approach is more concise, but the expanded format with concrete anti-examples is more scannable for testers who need to quickly self-check. Not blocking — the acceptance criteria are met either way.
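A tester could self-check a draft PR comment against the gate before posting it. The sketch below is hypothetical: it assumes the criteria have already been parsed into a mapping from criterion name to artifact strings, and its "local path" regex is a heuristic of our own, not something the skill defines. It flags criteria with no usable artifact, bare local paths such as `/tmp/screenshot.png`, and code-inspection claims that lack a `file:line` reference.

```python
import re

# Heuristic (assumption): a single token that looks like a filesystem path,
# i.e. absolute, home-relative, dot-relative, or a Windows drive path.
LOCAL_PATH = re.compile(r"^(?:/|~/|\./|[A-Za-z]:\\)\S*$")
# Code-inspection evidence must cite a file and a line, e.g. "skills/test.md:120".
CODE_REF = re.compile(r"^Verified by code inspection: \S+:\d+$")


def check_evidence(criteria: dict[str, list[str]]) -> list[str]:
    """Return a list of problems; an empty list means every criterion has evidence."""
    problems = []
    for criterion, artifacts in criteria.items():
        usable = []
        for artifact in artifacts:
            text = artifact.strip()
            if LOCAL_PATH.match(text):
                # The reviewer cannot open files on the tester's machine.
                problems.append(f"{criterion}: local path is not evidence ({text})")
            elif text.startswith("Verified by code inspection:") and not CODE_REF.match(text):
                problems.append(f"{criterion}: code-ref must be [file:line] ({text})")
            elif text:
                usable.append(text)
        if not usable:
            problems.append(f"{criterion}: no concrete artifact")
    return problems
```

For example, `check_evidence({"AC3": ["/tmp/screenshot.png"]})` reports both the local path and the missing artifact, matching the anti-example the plan wanted spelled out.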
    HANDOFF
    ```

    You **MUST** attempt the handoff write before signaling completion. The `|| true` ensures CLI failures don't block you, but skipping this step entirely is not acceptable. If the write fails, note it in your PR comment. Note: handoff plumbing between phases may not be fully operational yet (#124) — the attempt is what matters, establishing the habit so that when the plumbing is fixed, data flows automatically.
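The `|| true` idiom means "attempt once, never block". The handoff CLI itself is not shown in this thread, so the hypothetical wrapper below takes the command as a placeholder argument list. It never raises, and on failure it returns the note the tester is asked to paste into the PR comment.

```python
import subprocess
from typing import Optional


def attempt_handoff(cmd: list[str], payload: str) -> Optional[str]:
    """Run the handoff write once and never raise (like `cmd || true`).

    Returns None on success, or a failure note for the PR comment.
    """
    try:
        result = subprocess.run(cmd, input=payload, text=True,
                                capture_output=True, timeout=60)
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Missing binary, permission error, or hang: report it, don't block.
        return f"Handoff write failed: {exc}"
    if result.returncode != 0:
        detail = result.stderr.strip() or f"exit code {result.returncode}"
        return f"Handoff write failed: {detail}"
    return None
```

Unlike a bare `|| true`, the return value keeps the failure visible, which is what the "note it in your PR comment" rule needs.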
💡 Suggestion (P3)
Style inconsistency: Section 5 uses a labeled blockquote callout (> **EVIDENCE GATE:**), but this section uses a plain paragraph with inline bold (You **MUST** attempt). The plan specified **MANDATORY ATTEMPT:** as a labeled prefix to match the EVIDENCE GATE pattern. Using a consistent callout style (e.g., > **MANDATORY ATTEMPT:** You MUST attempt...) would improve document scannability. Not blocking.
    @@ -257,6 +261,13 @@ linear_linear(action="comment", id=$LEGION_ISSUE_ID, body="## Behavioral Test Re
    **CRITICAL: The labels are how the controller knows you finished.** If you skip this,
    the issue silently stalls. This is the MOST IMPORTANT step.
😹 Nitpick (P3)
Double blank line before the pre-signal checklist (lines 262-263). Single blank line is the standard markdown separator.
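This kind of nit is easy to catch mechanically. A minimal sketch (our own helper, not part of the repo's lint setup) that reports runs of more than one consecutive blank line, skipping the interiors of ``` fenced code blocks where blank lines may be intentional:

```python
def find_blank_runs(text: str, max_blank: int = 1) -> list[tuple[int, int]]:
    """Return 1-based (first, last) line ranges holding more than max_blank
    consecutive blank lines, ignoring lines inside ``` fenced code blocks."""
    runs = []
    in_fence = False
    start = None  # first line of the current blank run, if any
    lines = text.splitlines()
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        if stripped.startswith("```"):
            in_fence = not in_fence
        if not in_fence and stripped == "":
            if start is None:
                start = number
            continue
        # A non-blank line (or a line inside a fence) ends any open run.
        if start is not None and number - start > max_blank:
            runs.append((start, number - 1))
        start = None
    if start is not None and len(lines) + 1 - start > max_blank:
        runs.append((start, len(lines)))
    return runs
```

On the file under review, a report such as `(262, 263)` would correspond to the two blank lines flagged here.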
Branch updated from 9762c46 to ff37e85.
Closes #126
Summary
Strengthens the test workflow in two areas:
- Evidence Gate (Change 7): Elevates evidence requirements from guidance to a hard gate in Section 5. Every acceptance criterion must now include a concrete artifact (screenshot, command output, log, or code-ref). Local file paths are explicitly prohibited. PR template placeholders updated from `[output/screenshot]` to `[REQUIRED: screenshot/output/log/code-ref]`. New Common Mistakes entry added.
- Handoff Enforcement (Change 8): Adds MUST-attempt language for handoff writes in Section 6.5, acknowledging the partially broken plumbing (#124: Planner handoff data doesn't reach tester or reviewer directly). Pre-signal checklist added to Section 7 requiring PR results posted and handoff write attempted before adding labels.
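The Section 7 ordering rule (results posted and handoff attempted, then labels) can be modeled as a guard. This is a hypothetical sketch: `PreSignalChecklist` and `add_labels` are illustrative names, not anything in the skill or the Linear tooling.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PreSignalChecklist:
    """Hypothetical model of the Section 7 pre-signal checklist."""
    pr_results_posted: bool = False
    handoff_attempted: bool = False  # an attempt counts even if the write failed

    def missing(self) -> list[str]:
        items = []
        if not self.pr_results_posted:
            items.append("post test results (with evidence) as a PR comment")
        if not self.handoff_attempted:
            items.append("attempt the handoff write")
        return items


def signal_completion(checklist: PreSignalChecklist,
                      add_labels: Callable[[], None]) -> None:
    """Change labels only after every checklist item is done."""
    missing = checklist.missing()
    if missing:
        raise RuntimeError("Not ready to signal: " + "; ".join(missing))
    add_labels()
```

The guard refuses to touch labels early, because the labels are what the controller watches: signaling before the results exist would let the issue advance without evidence.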
Both `.opencode/` and `.claude/` copies are identical (hard-linked files, same inode).
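The mirroring claim can be rechecked at any time by comparing device and inode numbers. Some editors save by writing a new file and renaming it over the old one, which silently breaks a hard link, so a check like this minimal sketch is cheap insurance. The relative path of `test.md` inside each directory is not given here, so `relative_path` is a placeholder.

```python
import os


def check_mirrored(relative_path: str,
                   roots: tuple[str, ...] = (".opencode", ".claude")) -> bool:
    """True if the file under every root is the same inode (a hard link)."""
    paths = [os.path.join(root, relative_path) for root in roots]
    first = os.stat(paths[0])
    # Same (st_dev, st_ino) pair means the same file on disk, not a copy.
    return all(
        (st.st_dev, st.st_ino) == (first.st_dev, first.st_ino)
        for st in (os.stat(p) for p in paths[1:])
    )
```

`os.path.samefile` performs the same device-and-inode comparison for a pair of paths; the loop form above just extends it to any number of mirrors.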