Description
The eval pipeline currently runs a single flow (retrieval + synthesis) and scores every result as CLEAN, INCOMPLETE, or MISSING. But there are two distinct quality dimensions: whether the KB contains the knowledge at all (KB quality), and whether the retrieval pipeline can find and synthesize it (retrieval quality). Separating them would let users diagnose whether a gap is a curation problem or a retrieval problem.
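A minimal Python sketch of how verdicts from the two eval types could be combined to tell curation gaps from retrieval gaps. It assumes each question is scored under both eval types. `EvalType`, `Verdict`, `diagnose`, and the classification rule are hypothetical. Only the `kb-quality`/`retrieval-quality` values and the CLEAN/INCOMPLETE/MISSING scores come from this issue.

```python
from enum import Enum


class EvalType(str, Enum):
    """Hypothetical enum for the proposed eval_type field."""
    KB_QUALITY = "kb-quality"
    RETRIEVAL_QUALITY = "retrieval-quality"


class Verdict(Enum):
    """Existing scores; a higher value means a more complete answer."""
    CLEAN = 2
    INCOMPLETE = 1
    MISSING = 0


def diagnose(kb: Verdict, retrieval: Verdict) -> set[str]:
    """Classify a gap from one question's kb-quality and retrieval-quality verdicts.

    Assumed (hypothetical) rule:
    - "curation": the KB itself does not fully contain the answer.
    - "retrieval": retrieval + synthesis scored worse than the KB supports.
    """
    problems: set[str] = set()
    if kb is not Verdict.CLEAN:
        problems.add("curation")
    if retrieval.value < kb.value:
        problems.add("retrieval")
    return problems


# Example: the KB has the full answer, but retrieval + synthesis missed it.
print(diagnose(Verdict.CLEAN, Verdict.MISSING))  # {'retrieval'}
```

Treating the verdicts as an ordered scale lets one question be flagged for both problems at once, for example when the KB is INCOMPLETE and the retrieval run comes back MISSING.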
Acceptance Criteria
- An `eval_type` field with values `kb-quality` or `retrieval-quality`
Out of Scope
- Custom eval types beyond these two
- A/B testing different retrieval configurations