feat: eval pipeline run metadata and comparison

## Description
Add run identity, timestamps, and comparison capability to the eval pipeline and viewer. Currently eval results are flat JSON arrays with no run metadata, so users can't tell when an eval was run, see what changed between runs, or track coverage improvement over time.

## Acceptance Criteria
- [ ] Eval pipeline writes run metadata (run_id, timestamp, domain, source_mix, entity_count_at_time) to eval output
- [ ] Viewer loads multiple eval result files and presents them as named runs
- [ ] Run comparison mode shows CLEAN/INCOMPLETE/MISSING deltas between two runs
- [ ] Coverage trend visualization (sparkline or small chart) when 3+ runs exist
- [ ] Run selector dropdown in Evals page header
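
To make the first and third criteria concrete, here is a minimal sketch, not the intended implementation. It assumes each existing result is a JSON object with a `status` field holding `CLEAN`, `INCOMPLETE`, or `MISSING`, and that metadata is added by wrapping the flat array in an envelope with a `results` key. The helper names (`wrap_run`, `status_counts`, `compare_runs`, `coverage`), the shape of `source_mix`, and the definition of coverage as the share of `CLEAN` results are all hypothetical.

```python
import json
import uuid
from collections import Counter
from datetime import datetime, timezone

# Status labels taken from the acceptance criteria.
STATUSES = ("CLEAN", "INCOMPLETE", "MISSING")


def wrap_run(results, domain, source_mix, entity_count):
    """Wrap a flat list of eval results in a run envelope (hypothetical shape)."""
    return {
        "run_id": uuid.uuid4().hex,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "domain": domain,
        "source_mix": source_mix,
        "entity_count_at_time": entity_count,
        "results": results,
    }


def status_counts(run):
    """Count results per status; assumes each result has a 'status' field."""
    counts = Counter(r["status"] for r in run["results"])
    return {s: counts.get(s, 0) for s in STATUSES}


def compare_runs(baseline, candidate):
    """Per-status delta (candidate minus baseline) between two runs."""
    before, after = status_counts(baseline), status_counts(candidate)
    return {s: after[s] - before[s] for s in STATUSES}


def coverage(run):
    """Share of CLEAN results; one possible input for a trend chart."""
    total = len(run["results"])
    return status_counts(run)["CLEAN"] / total if total else 0.0


if __name__ == "__main__":
    old = wrap_run(
        [{"id": "q1", "status": "MISSING"}, {"id": "q2", "status": "CLEAN"}],
        domain="demo", source_mix={"docs": 1.0}, entity_count=10,
    )
    new = wrap_run(
        [{"id": "q1", "status": "CLEAN"}, {"id": "q2", "status": "CLEAN"}],
        domain="demo", source_mix={"docs": 1.0}, entity_count=12,
    )
    print(json.dumps(compare_runs(old, new)))
    # prints {"CLEAN": 1, "INCOMPLETE": 0, "MISSING": -1}
```

Wrapping the array rather than annotating each result keeps existing result objects unchanged, but the viewer would then need to accept both the old flat format and the new envelope during migration.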

## Out of Scope
- Multi-domain support (separate issue)
- Retrieval quality vs KB quality distinction (separate issue)
- Automated scheduled eval runs

feat: eval pipeline run metadata and comparison #29

Description

Acceptance Criteria

Out of Scope

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

feat: eval pipeline run metadata and comparison #29

Description

Description

Acceptance Criteria

Out of Scope

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions