feat: eval type distinction — KB quality vs retrieval quality #30

Description

@rajnavakoti

The eval pipeline currently runs a single flow (retrieval + synthesis) and scores every question as CLEAN, INCOMPLETE, or MISSING. That conflates two distinct quality dimensions: whether the KB contains the knowledge (KB quality), and whether the retrieval pipeline can find and synthesize it (retrieval quality). Separating them would let users diagnose whether a gap is a curation problem or a retrieval problem.

Acceptance Criteria

  • Eval results include an eval_type field: kb-quality | retrieval-quality
  • KB quality evals test whether entities exist and cover the topic (could use direct entity search, not full RAG)
  • Retrieval quality evals test the full pipeline (retrieval + synthesis + citation)
  • Viewer Evals page shows eval type as a filter/tab
  • Results distinguish a question that fails KB quality (knowledge doesn't exist) from one that fails retrieval quality (knowledge exists but wasn't retrieved)
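
To make the criteria above concrete, here is a rough sketch of what a result record and the curation-vs-retrieval diagnosis might look like. Only the `eval_type` values and the CLEAN/INCOMPLETE/MISSING scores come from this issue; the `EvalResult` shape, the `diagnose` helper, and the choice to treat anything other than CLEAN as a failure are assumptions for illustration, not the existing pipeline's API.

```python
from dataclasses import dataclass
from enum import Enum


class EvalType(str, Enum):
    KB_QUALITY = "kb-quality"
    RETRIEVAL_QUALITY = "retrieval-quality"


class Score(str, Enum):
    CLEAN = "CLEAN"
    INCOMPLETE = "INCOMPLETE"
    MISSING = "MISSING"


@dataclass(frozen=True)
class EvalResult:
    """Hypothetical result record; field names other than eval_type are assumed."""
    question: str
    eval_type: EvalType
    score: Score


def diagnose(kb: EvalResult, retrieval: EvalResult) -> str:
    """Classify the gap for one question from its two eval results.

    Assumption: any score other than CLEAN counts as a failure.
    """
    if kb.eval_type is not EvalType.KB_QUALITY:
        raise ValueError("first argument must be a kb-quality result")
    if retrieval.eval_type is not EvalType.RETRIEVAL_QUALITY:
        raise ValueError("second argument must be a retrieval-quality result")
    if kb.score is not Score.CLEAN:
        return "curation"   # the knowledge is absent or incomplete in the KB
    if retrieval.score is not Score.CLEAN:
        return "retrieval"  # the knowledge exists but the pipeline did not surface it
    return "ok"


def filter_by_type(results: list[EvalResult], eval_type: EvalType) -> list[EvalResult]:
    """Select results for one eval type, e.g. to back a viewer filter or tab."""
    return [r for r in results if r.eval_type is eval_type]
```

Checking KB quality first means a question whose knowledge is missing is always reported as a curation gap, even though the full pipeline would also fail on it.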

Out of Scope

  • Custom eval types beyond these two
  • A/B testing different retrieval configurations
