
Enhance EVAL Mode with Anti-Sycophancy and Objective Metrics #76

Merged
JeremyDev87 merged 1 commit into master from feat/70 on Dec 22, 2025

@JeremyDev87 (Owner)

Enhance EVAL Mode with Anti-Sycophancy and Objective Metrics

Summary

This PR reworks EVAL mode so that code reviews are objective and evidence-based, with no praise or subjective assessments. The changes turn the code reviewer agent into a skeptical third-party auditor that evaluates code output only, not implementation intent.

Problem Statement

Current Issues

Before this enhancement, EVAL mode had several problems:

  1. Subjective praise - Reviews often included phrases like "Great job", "Well done", and "Excellent work", which add no value
  2. Intent-based evaluation - Reviews defended implementation decisions from PLAN/ACT phases instead of evaluating output objectively
  3. Lack of measurable criteria - Findings were often subjective without concrete metrics
  4. Praise-first approach - Strengths were listed before problems, reducing focus on improvements
  5. Missing adversarial analysis - No systematic challenge of assumptions or edge cases
  6. Insufficient impact analysis - Changes weren't evaluated for side effects and dependencies

Solution

Transform EVAL mode to:

  • Prohibit praise phrases (English and Korean)
  • Require objective, measurable metrics for all findings
  • Restructure output to critique-first format
  • Add Devil's Advocate Analysis section
  • Add Impact Radius Analysis for dependency tracking
  • Require minimum 3 improvements per evaluation
  • Evaluate OUTPUT only, never implementation INTENT

Features

1. Anti-Sycophancy Rules

Philosophy: Evaluate like a skeptical third-party auditor who has never seen this code before.

Key Rules:

  • Evaluate OUTPUT only, never implementer's INTENT
  • Assume bugs exist until proven otherwise
  • Challenge every design decision
  • Start with problems, not praise
  • Identify at least 3 improvement areas OR all identified issues

Prohibited Phrases (English + Korean):

  • English: "Great job", "Well done", "Excellent work", "Good implementation", "Perfect", "Impressive", etc.
  • Korean: "잘했어", "훌륭해", "완벽해", "깔끔해", "좋아", "멋져", etc.

Required Language:

  • Findings: "Evidence shows...", "Metric indicates...", "Violation found at...", "Gap identified..."
  • Neutral observations: "The implementation uses...", "The code contains...", "Measurement shows..."
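
The prohibited-phrase rule can be checked mechanically. The sketch below is one possible check, not the PR's implementation: the function name `find_prohibited_phrases` is hypothetical, and the phrase lists contain only the examples quoted above (the source lists end in "etc.").

```python
import re

# Phrases copied from the PR description. The full lists in
# code-reviewer.json are longer; these are only the quoted examples.
PROHIBITED_EN = ["Great job", "Well done", "Excellent work",
                 "Good implementation", "Perfect", "Impressive"]
PROHIBITED_KO = ["잘했어", "훌륭해", "완벽해", "깔끔해", "좋아", "멋져"]


def find_prohibited_phrases(text: str) -> list[str]:
    """Return every prohibited phrase that occurs in text (hypothetical helper)."""
    hits = []
    for phrase in PROHIBITED_EN:
        # Word boundaries keep "Perfect" from matching inside "imperfect".
        if re.search(rf"\b{re.escape(phrase)}\b", text, flags=re.IGNORECASE):
            hits.append(phrase)
    for phrase in PROHIBITED_KO:
        # Korean endings attach directly to the stem (for example 좋아요),
        # so a plain substring match is used. It can over-match.
        if phrase in text:
            hits.append(phrase)
    return hits
```

A review passes this check when the function returns an empty list. The substring match for Korean is deliberately loose, so a flagged phrase may still need a human look.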

2. Objective Metrics Framework

All evaluations must be based on measurable, objective criteria:

Code Metrics:

  • Test coverage: >=90% target
  • Type safety: target of 0 uses of the any type
  • Cyclomatic complexity: <=10 per function
  • Function length: <=20 lines
  • Nesting depth: <=3 levels
  • Bundle size delta: <=20KB per feature

Checklist Metrics:

  • Security: OWASP Top 10 checklist
  • Accessibility: WCAG 2.1 AA criteria
  • Performance: Core Web Vitals targets

Documentation Metrics (for non-code changes):

  • Clarity: ambiguous terms count (target: 0)
  • Completeness: missing sections count (target: 0)
  • Consistency: inconsistency count (target: 0)
  • Actionability: vague instruction count (target: 0)

Output Requirement: Every finding MUST include:

  • Location (file:line or section)
  • Measured value
  • Target value
  • Gap/delta
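
As an illustration of the four required fields, a finding can be modeled as a small record whose gap is derived from the measured and target values. This is a sketch under assumptions: the metric keys, the `Finding` class, and the `evaluate` function are hypothetical names, and only the thresholds come from the Code Metrics list above.

```python
from dataclasses import dataclass
from typing import Optional

# Bounds taken from the Code Metrics list above.
# The keys are hypothetical names, not keys from code-reviewer.json.
THRESHOLDS = {
    "test_coverage_pct": (">=", 90),
    "any_usages": ("<=", 0),
    "cyclomatic_complexity": ("<=", 10),
    "function_length_lines": ("<=", 20),
    "nesting_depth": ("<=", 3),
    "bundle_size_delta_kb": ("<=", 20),
}


@dataclass(frozen=True)
class Finding:
    location: str    # file:line or section name
    metric: str
    measured: float
    target: float
    gap: float       # absolute distance between measured and target


def evaluate(location: str, metric: str, measured: float) -> Optional[Finding]:
    """Return a Finding when the metric misses its target, else None."""
    op, target = THRESHOLDS[metric]
    passed = measured >= target if op == ">=" else measured <= target
    if passed:
        return None
    return Finding(location, metric, measured, target, abs(measured - target))
```

For example, `evaluate("src/cart.ts:42", "cyclomatic_complexity", 14)` (a made-up location) yields a finding with target 10 and gap 4, while a coverage value of 93.5 yields no finding.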

3. Critique-First Output Format

New Structure Order:

  1. Mode indicator
  2. Agent name
  3. Context (Reference Only) - factual summary, no defense
  4. Critical Findings - ALL metric violations FIRST
  5. Devil's Advocate Analysis - challenge assumptions, edge cases, failure modes
  6. Impact Radius Analysis - dependencies, contract changes, side effects
  7. Objective Assessment - PASS/FAIL metrics table
  8. What Works - facts only, NO praise
  9. Improvement Plan - prioritized with evidence
  10. Anti-Sycophancy Verification - self-check
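
The ordering rule lends itself to an automated check. The following is a minimal sketch that assumes each section is written as a level-2 markdown heading (`## ...`), as the execution flow in this PR describes; `section_order_errors` is a hypothetical helper, and the mode and agent lines are left out because they are headers rather than review sections.

```python
# Review sections in the order listed above.
EXPECTED_ORDER = [
    "Context (Reference Only)",
    "Critical Findings",
    "Devil's Advocate Analysis",
    "Impact Radius Analysis",
    "Objective Assessment",
    "What Works",
    "Improvement Plan",
    "Anti-Sycophancy Verification",
]


def section_order_errors(markdown: str) -> list[str]:
    """List missing or out-of-order sections in an EVAL output."""
    headings = [line[3:].strip() for line in markdown.splitlines()
                if line.startswith("## ")]
    errors = []
    last_index = -1
    for name in EXPECTED_ORDER:
        # Prefix match so "What Works (Evidence Required)" still counts.
        index = next((i for i, h in enumerate(headings)
                      if h.startswith(name)), None)
        if index is None:
            errors.append(f"missing: {name}")
            continue
        if index < last_index:
            errors.append(f"out of order: {name}")
        last_index = max(last_index, index)
    return errors
```

An empty result means every section is present and Critical Findings precedes What Works.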

4. Devil's Advocate Analysis

Systematic challenge from opposing viewpoints:

Mandatory Questions:

  • What assumptions might be wrong?
  • What edge cases are unhandled?
  • How might this fail under load/scale?
  • What security vectors are exposed?
  • Where could this introduce regression?
  • What happens when dependencies change?

Subsections:

  • What could go wrong?
  • Assumptions that might be wrong
  • Unhandled edge cases

5. Impact Radius Analysis

Analyze side effects and ripple effects beyond modified files:

Direct Dependencies:

  • Files that directly import/reference changed files
  • Table format: Changed File | Imported By | Potential Impact
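
One way to fill the first two columns of that table is to invert an import graph. The sketch below assumes the graph has already been extracted by some other tool (for example a bundler report or a text search); the function names and file names are hypothetical, and the Potential Impact column is left as a placeholder because it needs reviewer judgment.

```python
def direct_dependents(imports: dict[str, set[str]],
                      changed: set[str]) -> dict[str, list[str]]:
    """Map each changed file to the files that import it directly.

    imports maps a file path to the set of paths it imports.
    """
    dependents: dict[str, list[str]] = {path: [] for path in sorted(changed)}
    for importer, imported in imports.items():
        for path in imported & changed:
            if importer != path:
                dependents[path].append(importer)
    return {path: sorted(users) for path, users in dependents.items()}


def render_table(dependents: dict[str, list[str]]) -> str:
    """Render the Changed File | Imported By | Potential Impact table."""
    rows = ["Changed File | Imported By | Potential Impact"]
    for path, users in dependents.items():
        imported_by = ", ".join(users) or "none"
        rows.append(f"{path} | {imported_by} | to be assessed")
    return "\n".join(rows)
```

Only direct importers are listed here; transitive dependents would need a graph traversal on top of this.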

Contract Changes:

  • Function signature changes
  • Type definition changes
  • Export changes
  • Before/After comparison with breaking change assessment

Side Effect Checklist:

  • Type compatibility
  • Behavior compatibility
  • Test coverage
  • Error handling
  • State management
  • Async flow

Breaking Change Criteria:

  • Definitely breaking: Removed exports, changed required parameters, narrowed return types
  • Potentially breaking: Added required parameters with defaults, widened return types
  • Safe changes: Added new exports, added optional parameters, internal refactoring

Changes

Modified Files

  1. .ai-rules/agents/code-reviewer.json (+393 lines, -41 lines)

    • Added anti_sycophancy section with prohibited phrases and required language
    • Added objective_metrics section with code, checklist, and documentation metrics
    • Added impact_radius_analysis section with dependency analysis framework
    • Restructured evaluation_output_format to critique-first order
    • Updated persona to "Skeptical third-party auditor"
    • Updated execution_order with new 17-step process
    • Added 7 new mandatory checklist items
    • Updated verification guide with new checks
  2. .ai-rules/rules/core.md (+101 lines, -1 line)

    • Added Anti-Sycophancy Rules section to EVAL mode
    • Restructured output format to critique-first approach
    • Added Critical Findings table format
    • Added Devil's Advocate Analysis section
    • Added Impact Radius Analysis section
    • Added Objective Assessment table
    • Changed "Strengths" to "What Works (Evidence Required)"
    • Added Anti-Sycophancy Verification checklist
    • Added special cases handling (documentation-only, no changes)
  3. docs/tickets/eval-mode-neutrality-implementation.md (new file, 337 lines)

    • Complete implementation plan document
    • Phase-by-phase breakdown
    • Validation criteria
    • Test scenarios
    • Rollback plan

Execution Flow

New 17-Step Process:

  1. Write # Mode: EVAL
  2. Write ## Agent : Code Reviewer
  3. Write ## Context (Reference Only) - factual summary, no defense
  4. Gather objective metrics (run coverage, count uses of the any type, measure complexity)
  5. Write ## Critical Findings table - ALL metric violations FIRST
  6. Write ## Devil's Advocate Analysis - challenge assumptions, edge cases, failure modes
  7. Write ## Impact Radius Analysis - analyze dependencies and side effects
    • 7a. Search for files importing changed files
    • 7b. List direct dependencies in table format
    • 7c. Identify contract changes (signatures, types, exports)
    • 7d. Complete Side Effect Checklist
  8. Write ## Objective Assessment table - PASS/FAIL for each metric
  9. Write ## What Works - facts only, NO praise or positive adjectives
  10. For each improvement: Call web_search → Write with evidence
  11. Create todo list using todo_write tool (prioritized, all pending)
  12. Write ## Improvement Plan
  13. Write ## Anti-Sycophancy Verification - self-check
  14. Verify: No prohibited phrases used
  15. Verify: Minimum 3 improvements identified
  16. Verify: All findings have location + metric + target
  17. Verify: Impact Radius Analysis completed

Benefits

  1. Objective Evaluations - All findings backed by measurable metrics
  2. No Praise Pollution - Reviews focus on actionable improvements
  3. Adversarial Thinking - Systematic challenge prevents overlooked issues
  4. Impact Awareness - Dependency analysis prevents regression bugs
  5. Consistent Quality - A minimum of 3 improvements ensures a thorough review
  6. Intent Separation - Evaluates code output, not implementation reasoning

Verification

Self-Check Requirements:

  • No prohibited phrases used (English + Korean)
  • At least 3 improvement areas OR all identified issues reported
  • All findings include objective evidence (location, metric, target)
  • Devil's Advocate Analysis completed
  • Impact Radius Analysis completed (dependencies, contract changes, side effects)
  • Critical Findings section appears before What Works
  • No defense of implementation decisions

Special Cases

Documentation-only changes:

  • Use documentation_metrics instead of code metrics
  • Evaluate: clarity, completeness, consistency, actionability
  • Critical Findings table references section names instead of file:line

No changes to evaluate:

  • State "No implementation to evaluate" in Context section
  • Skip Critical Findings and Objective Assessment tables
  • Focus Devil's Advocate on the request/plan itself

Test-only changes:

  • State "Test-only changes - no production impact"
  • Skip Direct Dependencies, focus on test coverage impact

New file with no dependencies:

  • State "New file - no existing dependencies"
  • Evaluate API design for future maintainability
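
The first three special cases can be told apart from the list of changed files alone. The sketch below is an assumption-heavy illustration: the profile names, the documentation suffixes, and the `.test.`/`.spec.` file-name convention are all hypothetical and are not taken from the rule files. The "new file with no dependencies" case is omitted because it needs dependency information, not just file names.

```python
from pathlib import PurePosixPath

# Assumed documentation file types; adjust to the repository.
DOC_SUFFIXES = {".md", ".mdx", ".rst", ".txt"}


def is_test_file(path: str) -> bool:
    """Assumed convention: test files contain .test. or .spec. in their name."""
    name = PurePosixPath(path).name
    return ".test." in name or ".spec." in name


def select_profile(changed_files: list[str]) -> str:
    """Pick which special case, if any, applies to a change set."""
    if not changed_files:
        return "no-changes"        # state "No implementation to evaluate"
    if all(PurePosixPath(p).suffix in DOC_SUFFIXES for p in changed_files):
        return "documentation"     # use documentation metrics
    if all(is_test_file(p) for p in changed_files):
        return "test-only"         # skip Direct Dependencies
    return "code"                  # full code metrics apply
```

A mixed change set, such as a source file plus a README, falls through to the full code profile.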

Related Issue

Closes #70

Files Changed

  • .ai-rules/agents/code-reviewer.json - Major update (+393/-41 lines)
  • .ai-rules/rules/core.md - EVAL mode section restructured (+101/-1 line)
  • docs/tickets/eval-mode-neutrality-implementation.md - New implementation plan (337 lines)

Statistics

  • 3 files changed
  • 790 insertions, 41 deletions
  • 1 new file added (implementation plan)
  • 2 files significantly updated

Testing

Success Metrics

  1. Zero prohibited phrases in EVAL output
  2. 100% of findings include objective metrics
  3. Minimum 3 improvements identified per evaluation
  4. Devil's Advocate section present in every EVAL
  5. "What Works" contains only factual observations (no praise)
  6. Impact Radius Analysis completed for all code changes

Test Scenarios

  1. Simple implementation - Should still find 3+ improvements
  2. Good implementation - Should still challenge assumptions
  3. Poor implementation - Should not over-emphasize negatives (remain balanced)
  4. AI's own code - Should evaluate with same rigor (no self-defense)
  5. Documentation changes - Should use documentation metrics
  6. No changes - Should handle gracefully

Migration Notes

Breaking Changes:

  • EVAL output format has changed significantly
  • Old "Strengths" section replaced with "What Works"
  • New mandatory sections: Critical Findings, Devil's Advocate, Impact Radius Analysis

Backward Compatibility:

  • Existing EVAL requests will automatically use the new format
  • No user action required
  • Old format is completely replaced

Rollback Plan

If issues arise:

  1. Revert code-reviewer.json to remove anti_sycophancy and objective_metrics sections
  2. Revert core.md EVAL mode section to original format
  3. Remove implementation plan document

Next Steps

  • Monitor EVAL outputs for compliance with new rules
  • Gather feedback on review quality improvements
  • Adjust prohibited phrases list if needed
  • Consider extending to other review contexts if successful

Notes

This is a fundamental shift in how EVAL mode operates. The changes ensure that code reviews are:

  • Objective - Based on measurable criteria, not opinions
  • Thorough - A minimum of 3 improvements ensures a comprehensive review
  • Adversarial - Challenges assumptions and finds edge cases
  • Impact-aware - Analyzes dependencies and side effects
  • Neutral - No praise and no subjective criticism, only factual observations

The implementation is rule-based and needs no infrastructure changes: AI assistants apply the enhanced evaluation process by following the documented guidelines.

- Add anti-sycophancy rules prohibiting praise phrases
- Add objective metrics framework for measurable evaluations
- Restructure output to critique-first format
- Add Devil's Advocate and Impact Radius Analysis sections
- Require minimum 3 improvements per evaluation
- Update code-reviewer.json and core.md

close #70
@JeremyDev87 JeremyDev87 self-assigned this Dec 22, 2025
@JeremyDev87 JeremyDev87 marked this pull request as ready for review December 22, 2025 01:54
@JeremyDev87 JeremyDev87 merged commit ff19738 into master Dec 22, 2025
9 checks passed
@JeremyDev87 JeremyDev87 deleted the feat/70 branch December 22, 2025 01:54

Development

Successfully merging this pull request may close these issues.

Enhance EVAL Mode Neutrality and Objectivity
