run_eval.py: claude -p never triggers skills/commands (0% trigger rate across all queries) #556

@dthau120391

Description

Bug Description

run_eval.py creates a command file in .claude/commands/ with the skill description in YAML frontmatter, then runs claude -p with test queries and checks whether Claude invokes the Skill or Read tool for that command. In practice, no query ever triggers the skill: every should-trigger query scores 0/3, including an explicit "run /lucas:memory-layer please".

Reproduction

Tested with 8 different skills (144 total queries across all eval sets). Every should-trigger query scored 0/3 triggers. Every should-not-trigger query also scored 0/3 (correctly, but vacuously).

Diagnostic test with maximally obvious trigger:

[{"query": "I need to use the lucas memory layer skill right now, run /lucas:memory-layer please", "should_trigger": true}]

Result: 0/3 triggers.

Environment

  • Claude Code v2.1.71
  • macOS Darwin 25.3.0
  • Model: claude-opus-4-6
  • Ran from external terminal (no active Claude Code session)

Analysis

The issue appears to be that claude -p does not auto-trigger commands placed in .claude/commands/ the same way an interactive Claude Code session triggers skills from ~/.claude/skills/. Possible causes:

  1. Commands vs Skills: Commands in .claude/commands/ may not appear in Claude's available_skills list the same way skills do, or may only be invocable via explicit /command-name syntax
  2. claude -p mode: The -p (pipe/prompt) mode may not load commands/skills at all, or may not present them to the model
  3. Simple query threshold: Claude may not consult skills for queries it can handle directly. However, even the explicit "run this skill" query failed, which rules this out as the sole cause
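Hypotheses 1 and 2 can be tested directly by inspecting the raw transcript rather than relying on the grader. If claude -p is run with a JSON output format (check that your installed version supports this), the assistant messages can be scanned for tool_use blocks. The block shape below follows Anthropic's Messages API convention; whether claude -p emits exactly this shape is an assumption to verify:

```python
def tool_names(message):
    # Extract tool names from an assistant message's content blocks.
    # A tool call appears as {"type": "tool_use", "name": ..., "input": ...}
    # in the Messages API; whether claude -p surfaces this is the question.
    return [b["name"] for b in message.get("content", [])
            if isinstance(b, dict) and b.get("type") == "tool_use"]
```

If tool_names() returns an empty list for every message even on the explicit "run /lucas:memory-layer please" query, the command was never presented to the model, which points at hypothesis 1 or 2 rather than a scoring bug in run_eval.py.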

Expected Behavior

At minimum, an explicit request like "run /skill-name please" should trigger the command (rate > 0/3). Description-matched queries should trigger at a reasonable rate.

Suggested Fix

  • Verify that claude -p actually loads and presents .claude/commands/ files to the model
  • Consider placing test skills in ~/.claude/skills/ instead of .claude/commands/ if skills and commands are triggered differently
  • Add a --debug flag to run_eval.py that dumps what Claude actually sees (system prompt, available skills list) to verify the command file is being presented
