Enhance EVAL Mode with Anti-Sycophancy and Objective Metrics #76
Merged
- Add anti-sycophancy rules prohibiting praise phrases
- Add objective metrics framework for measurable evaluations
- Restructure output to critique-first format
- Add Devil's Advocate and Impact Radius Analysis sections
- Require minimum 3 improvements per evaluation
- Update code-reviewer.json and core.md

close #70
Summary
This PR significantly enhances EVAL mode to ensure objective, evidence-based code reviews without praise or subjective assessments. The changes transform the code reviewer agent into a skeptical third-party auditor that evaluates code output only, not implementation intent.
Problem Statement
Current Issues
Before this enhancement, EVAL mode had several problems:
Solution
Transform EVAL mode to:
Features
1. Anti-Sycophancy Rules
Philosophy: Evaluate like a skeptical third-party auditor who has never seen this code before.
Key Rules:
Prohibited Phrases (English + Korean):
Required Language:
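As a rough illustration of how the prohibited-phrase rule could be checked mechanically, the sketch below scans review text for listed phrases. The phrases shown and the function name `find_prohibited_phrases` are hypothetical placeholders, not the list or API defined in this PR; the PR itself is rule-based and does not ship such a checker.

```python
# Hypothetical sample phrases (English + Korean); the real list lives in
# the anti_sycophancy section of code-reviewer.json and is not reproduced here.
PROHIBITED_PHRASES = ["great job", "well done", "훌륭합니다"]


def find_prohibited_phrases(review_text, phrases=PROHIBITED_PHRASES):
    """Return the listed phrases that occur in review_text, in list order.

    Matching is a case-insensitive substring test, which is an assumption
    of this sketch rather than a rule stated in the PR.
    """
    folded = review_text.casefold()
    return [phrase for phrase in phrases if phrase.casefold() in folded]
```

A review draft could be rejected whenever this returns a non-empty list, for example `find_prohibited_phrases("Great job on the refactor.")`.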
2. Objective Metrics Framework
All evaluations must be based on measurable, objective criteria:
Code Metrics:
`any` usages target
Checklist Metrics:
Documentation Metrics (for non-code changes):
Output Requirement: Every finding MUST include:
3. Critique-First Output Format
New Structure Order:
4. Devil's Advocate Analysis
Systematic challenge from opposing viewpoints:
Mandatory Questions:
Subsections:
5. Impact Radius Analysis
Analyze side effects and ripple effects beyond modified files:
Direct Dependencies:
Contract Changes:
Side Effect Checklist:
Breaking Change Criteria:
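One way to picture the direct-dependency part of this analysis is a lookup over an import graph. The sketch below is an assumption-laden illustration: the graph format (a dict mapping each file to the files it imports) and the name `direct_dependents` are hypothetical and do not come from this PR, which describes the analysis as a guideline for the reviewer rather than as code.

```python
def direct_dependents(import_graph, modified_files):
    """Return files outside the change set that import a modified file.

    import_graph: hypothetical mapping of file path -> list of imported paths.
    modified_files: paths touched by the change under review.
    """
    modified = set(modified_files)
    return sorted(
        path
        for path, imports in import_graph.items()
        if path not in modified and modified.intersection(imports)
    )
```

Files returned here are the first ring of the impact radius; transitive dependents would need a further traversal that this sketch does not attempt.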
Changes
Modified Files
.ai-rules/agents/code-reviewer.json (+393 lines, -41 lines)
- `anti_sycophancy` section with prohibited phrases and required language
- `objective_metrics` section with code, checklist, and documentation metrics
- `impact_radius_analysis` section with dependency analysis framework
- `evaluation_output_format` to critique-first order
- `persona` to "Skeptical third-party auditor"
- `execution_order` with new 17-step process

.ai-rules/rules/core.md (+101 lines, -1 line)

docs/tickets/eval-mode-neutrality-implementation.md (new file, 337 lines)

Execution Flow
New 17-Step Process:
Benefits
Verification
Self-Check Requirements:
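Two of the requirements named in this PR lend themselves to a mechanical self-check: the reviewer configuration should carry the new sections, and each evaluation should list at least three improvements. The sketch below assumes the section names appear as top-level keys of the parsed code-reviewer.json, which this PR does not confirm; the function names are hypothetical.

```python
# Section names as listed under Modified Files; treating them as
# top-level JSON keys is an assumption of this sketch.
REQUIRED_SECTIONS = (
    "anti_sycophancy",
    "objective_metrics",
    "impact_radius_analysis",
    "evaluation_output_format",
    "persona",
    "execution_order",
)
MIN_IMPROVEMENTS = 3  # "Require minimum 3 improvements per evaluation"


def missing_sections(config):
    """Return required section names absent from the parsed config dict."""
    return [name for name in REQUIRED_SECTIONS if name not in config]


def has_enough_improvements(improvements):
    """True when the evaluation lists at least MIN_IMPROVEMENTS items."""
    return len(improvements) >= MIN_IMPROVEMENTS
```

In practice `config` would come from `json.load` on the agent file, and an evaluation would be flagged when `missing_sections` is non-empty or `has_enough_improvements` is false.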
Special Cases
Documentation-only changes:
`documentation_metrics` instead of code metrics
No changes to evaluate:
Test-only changes:
New file with no dependencies:
Related Issue
Closes #70
Files Changed
- .ai-rules/agents/code-reviewer.json - Major update (+393/-41 lines)
- .ai-rules/rules/core.md - EVAL mode section restructured (+101/-1 line)
- docs/tickets/eval-mode-neutrality-implementation.md - New implementation plan (337 lines)

Statistics
Testing
Success Metrics
Test Scenarios
Migration Notes
Breaking Changes:
Backward Compatibility:
Rollback Plan
If issues arise:
- code-reviewer.json to remove `anti_sycophancy` and `objective_metrics` sections
- core.md EVAL mode section to original format

Next Steps
Notes
This is a fundamental shift in how EVAL mode operates. The changes ensure that code reviews are:
The implementation is rule-based and needs no infrastructure changes: AI assistants follow the documented guidelines to carry out the enhanced evaluation process.