Bug Description
run_eval.py creates a command file in .claude/commands/ with the skill description in YAML frontmatter, then runs claude -p with test queries to check if Claude invokes the Skill or Read tool for that command. In practice, no query ever triggers the skill — all should-trigger queries get 0/3 trigger rate, including an explicit "run /lucas:memory-layer please".
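For context, the generated command file is shaped roughly like this; the path, skill name, and description below are illustrative, not the actual generated content:

```markdown
<!-- .claude/commands/lucas/memory-layer.md (illustrative) -->
---
description: Store and recall project context across sessions
---
Use the lucas memory layer skill to persist and retrieve project memory.
```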
Reproduction
Tested with 8 different skills (144 total queries across all eval sets). Every should-trigger query scored 0/3 triggers. Every should-not-trigger query also scored 0/3 (correctly, but vacuously).
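The 0/3 numbers come from repeating each query three times and counting the runs in which the skill fired. A minimal sketch of that scoring loop, where `run_query` and `skill_invoked` are hypothetical stand-ins for what `run_eval.py` actually does:

```python
from typing import Callable

N_RUNS = 3  # each eval query is attempted three times


def trigger_rate(query: str,
                 run_query: Callable[[str], str],
                 skill_invoked: Callable[[str], bool]) -> tuple[int, int]:
    """Run one query N_RUNS times; return (triggering runs, total runs)."""
    triggers = sum(
        1 for _ in range(N_RUNS)
        if skill_invoked(run_query(query))
    )
    return triggers, N_RUNS


def score(eval_set: list[dict],
          run_query: Callable[[str], str],
          skill_invoked: Callable[[str], bool]) -> list[dict]:
    """Score every case; should_trigger cases want a rate above 0/3."""
    results = []
    for case in eval_set:
        hits, total = trigger_rate(case["query"], run_query, skill_invoked)
        results.append({**case, "rate": f"{hits}/{total}"})
    return results
```

With a runner whose transcripts never contain a skill invocation, every case comes back 0/3, matching the observed behavior.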
Diagnostic test with maximally obvious trigger:
```json
[{"query": "I need to use the lucas memory layer skill right now, run /lucas:memory-layer please", "should_trigger": true}]
```

Result: 0/3 triggers.
Environment
- Claude Code v2.1.71
- macOS Darwin 25.3.0
- Model: claude-opus-4-6
- Ran from external terminal (no active Claude Code session)
Analysis
The issue appears to be that claude -p does not auto-trigger commands placed in .claude/commands/ the same way an interactive Claude Code session triggers skills from ~/.claude/skills/. Possible causes:
- Commands vs Skills: Commands in `.claude/commands/` may not appear in Claude's `available_skills` list the same way skills do, or may only be invocable via explicit `/command-name` syntax
- `claude -p` mode: The `-p` (pipe/prompt) mode may not load commands/skills at all, or may not present them to the model
- Simple query threshold: Claude may not consult skills for queries it can handle directly; but even the explicit "run this skill" query failed, ruling this out as the sole cause
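One way to separate these hypotheses is to pair a description-matched query with an explicit slash invocation of the same command in one eval set: if even the explicit form scores 0/3, command loading under `-p` becomes the prime suspect rather than description matching. A sketch of such a probe set (the query text is illustrative):

```python
import json


def build_probe_set(command: str, description_query: str) -> str:
    """Build an eval set probing both trigger paths for one command."""
    cases = [
        # Path 1: implicit, matched against the frontmatter description.
        {"query": description_query, "should_trigger": True},
        # Path 2: explicit slash invocation; failure here implicates
        # command loading in -p mode rather than description matching.
        {"query": f"run /{command} please", "should_trigger": True},
    ]
    return json.dumps(cases, indent=2)
```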
Expected Behavior
At minimum, an explicit request like "run /skill-name please" should trigger the command (rate > 0/3). Description-matched queries should trigger at a reasonable rate.
Suggested Fix
- Verify that `claude -p` actually loads and presents `.claude/commands/` files to the model
- Consider placing test skills in `~/.claude/skills/` instead of `.claude/commands/` if skills and commands are triggered differently
- Add a `--debug` flag to `run_eval.py` that dumps what Claude actually sees (system prompt, available skills list) to verify the command file is being presented
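For the suggested `--debug` flag, a minimal sketch of one thing the dump could report, assuming `run_eval.py` captures the transcript as a list of message dicts. The transcript shape here is a simplified assumption, not the actual `claude -p` output format:

```python
def summarize_tool_use(transcript: list[dict]) -> dict[str, int]:
    """Count tool invocations per tool name in a captured transcript,
    to verify whether the Skill (or Read) tool was ever called."""
    counts: dict[str, int] = {}
    for message in transcript:
        for block in message.get("content", []):
            if block.get("type") == "tool_use":
                name = block.get("name", "?")
                counts[name] = counts.get(name, 0) + 1
    return counts
```

An empty summary across all three runs of a should-trigger query would confirm that the model never reached for the command at all, rather than reaching for the wrong tool.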