Leaderboard #2

neginraoof · 2024-09-12T22:23:55Z

Evaluate your LLM as a generator for context-based question answering, and create a leaderboard ranking generator models.

generate_qac.py: Uses sample questions and their corresponding contexts to query your LLMs and generate answers. Each answer should be supported by its context.
eval_qac_agreement.py: Uses an LLM-as-a-judge to evaluate whether each generated answer agrees with the gold answer, then creates a leaderboard ranking the generator models by agreement.
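
As a rough illustration of the two steps above, here is a minimal sketch, not the code in this PR. The names `build_prompt` and `build_leaderboard`, the prompt wording, and the `(model_name, agrees)` input shape are all assumptions made for the example. The LLM and judge calls are left out, so the judge's verdicts are assumed to already be available as booleans.

```python
from collections import defaultdict


def build_prompt(question: str, context: str) -> str:
    """Hypothetical prompt template; the wording used in the PR is not shown here."""
    return (
        "Answer the question using only the context below.\n\n"
        f"Context: {context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


def build_leaderboard(judgments):
    """Rank models by the fraction of answers the judge marked as agreeing.

    `judgments` is an iterable of (model_name, agrees) pairs, where `agrees`
    is an assumed boolean verdict from the LLM judge for one answer.
    """
    totals = defaultdict(int)
    agreed = defaultdict(int)
    for model, agrees in judgments:
        totals[model] += 1
        agreed[model] += int(agrees)
    rows = [(model, agreed[model] / totals[model]) for model in totals]
    # Highest agreement first; ties are broken alphabetically by model name.
    return sorted(rows, key=lambda row: (-row[1], row[0]))


if __name__ == "__main__":
    sample = [
        ("model-a", True),
        ("model-a", False),
        ("model-b", True),
        ("model-b", True),
    ]
    for rank, (model, rate) in enumerate(build_leaderboard(sample), start=1):
        print(f"{rank}. {model}: {rate:.2f}")
```

Sorting on the agreement rate alone keeps the ranking easy to read; with small sample sets it may be worth reporting the number of judged answers per model as well.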

madiator · 2024-09-24T06:59:44Z

Hi Negin, is this code up to date? Ready for review?

neginraoof · 2024-10-02T23:41:32Z

@madiator Yes, this is ready now. Thanks!

madiator

Thanks!

leaderboard for QAC agreement w/ gold answer metric

40f074b

neginraoof force-pushed the leaderboard branch 3 times, most recently from 214ceb5 to bda2d1c, on September 18, 2024 18:27

readme description

e421223

neginraoof force-pushed the leaderboard branch from bda2d1c to e421223 on October 2, 2024 23:40

madiator approved these changes Oct 2, 2024
