[Evals] Fix MinervaMath dataset YAML for correct evaluation #117
In the `svc-huggingface/minerva-math` dataset used for the MinervaMath eval, the `solution` field contains both the problem-solving process and the final answer in a single string. For example:
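A hypothetical illustration of what such a `solution` entry looks like (this sample is invented; the exact wording varies per problem):

```
Using the quadratic formula, x = (3 ± sqrt(9 - 8)) / 2, so x = 2 or x = 1.
The sum of the roots is therefore 3.
Final Answer: The final answer is 3.
```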
This format leads to an evaluation accuracy of 0: the expected answer is embedded inside the full solution text, so the grader compares the model's extracted answer against the entire solution and never finds a match. Since `svc-huggingface/minerva-math` does not provide a separate field for the final answer, I switched to the `math-ai/minervamath` dataset, which exposes the final answer in its own field, and updated the MinervaMath dataset YAML accordingly.
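A minimal sketch of the kind of YAML change involved (the key names below are illustrative assumptions, not the repo's actual schema):

```yaml
# Illustrative sketch only — key names are hypothetical.
minerva_math:
  dataset_id: math-ai/minervamath   # was: svc-huggingface/minerva-math
  split: test
  question_key: question            # problem statement shown to the model
  answer_key: answer                # final answer only, used for scoring
```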
After this change, the accuracy of `Qwen/Qwen2.5-Math-7B` on MinervaMath rose from 0 to 0.2684, confirming that the bug is resolved.