Description
Once multi-domain support and run comparison are in place, the Evals page should become a per-domain health dashboard that tracks coverage over time and recommends what to curate next. This is the north-star vision for the eval surface.
Acceptance Criteria
Out of Scope
- Automated curation (a human reviews and approves all changes)
- Integration with external task management (Jira, Linear)