Open Agentsia Labs benchmark harness for model runners, multi-turn evals, reproducible rubric scoring, proof bundles, and RunRecord output.
benchmark adtech reproducibility llm-evaluation eval-harness frontier-ai proof-bundles open-benchmark assay-adtech assay-harness runrecord multi-turn-evals iab-tech-lab
-
Updated
Jul 1, 2026 - TypeScript