Fix MMLUPro Handler #109

Merged

SumanthRH merged 1 commit into main from sumanthrh/fix-mmlu

Apr 18, 2025

skythought/evals/tasks/mmlu/mmlu_handler.py

@@ -49,7 +49,7 @@ def load_and_filter_dataset(
             return dataset.iloc[start:end] if end > 0 else dataset.iloc[start:]
-    class MMLUProTaskHandler(MMLUTaskHandler):
+    class MMLUProTaskHandler(TaskHandler):
         def __init__(self, task_config: TaskConfig):
             super().__init__(task_config)
             self.choices = [
@@ -71,9 +71,27 @@ def __init__(self, task_config: TaskConfig):
                 "P",
             ]
-        def generate_prompt(self, prompt):
+        def generate_prompt(self, problem):
+            multiple_choice_string = self.get_multiple_choice_answers(problem)
+            prompt = problem["question"] + "\n" + multiple_choice_string
             return self.task_config.templating_parameters["template"].format(prompt=prompt)
+        def update_results(self, problem, response):
+            # Initialize the response structure
+            response_entry = {
+                "content": response,
+                "correctness": None,
+                "reason": None,
+            }
+            curr_res = self.check_correctness(problem, generation=response)
+            if curr_res:
+                response_entry["correctness"] = True
+                response_entry["reason"] = ""
+            else:
+                response_entry["correctness"] = False
+                response_entry["reason"] = "Solution is incorrect."
+            return response_entry
         def check_correctness(self, problem, generation):
             pred = mmlu_pro_extract_answer(generation)
             answer = self.choices[problem["answer_index"]]
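
For reviewers, the behavior added in this diff can be approximated with a standalone sketch. It is not the repository's code: `format_options`, `build_prompt`, `make_result`, the `"options"` field, and the `"LABEL. text"` layout are assumptions made for illustration, since `get_multiple_choice_answers` and `mmlu_pro_extract_answer` are not shown in the diff. Only the prompt assembly (question, newline, choices, then the template) and the shape of the result entry follow the changed lines.

```python
import string

# Stand-in for the handler's choice labels; the diff shows the list ending at "P".
CHOICES = list(string.ascii_uppercase[:16])


def format_options(options):
    # Assumed layout: one "LABEL. text" line per option. The real
    # get_multiple_choice_answers may format the choices differently.
    return "\n".join(f"{label}. {text}" for label, text in zip(CHOICES, options))


def build_prompt(problem, template):
    # Mirrors generate_prompt in the diff: question, newline, then the choices,
    # all substituted into the template's {prompt} placeholder.
    prompt = problem["question"] + "\n" + format_options(problem["options"])
    return template.format(prompt=prompt)


def make_result(response, is_correct):
    # Mirrors update_results in the diff: same keys, same reason strings.
    return {
        "content": response,
        "correctness": is_correct,
        "reason": "" if is_correct else "Solution is incorrect.",
    }


# Hypothetical problem record; "options" is an assumed field name.
problem = {"question": "What is 2 + 2?", "options": ["3", "4", "5"], "answer_index": 1}
prompt_text = build_prompt(problem, "Answer the question.\n{prompt}")
expected_label = CHOICES[problem["answer_index"]]  # "B", as in check_correctness
```

The point of the sketch is that `generate_prompt` now receives the whole problem record rather than a pre-built prompt string, so the handler itself is responsible for appending the answer choices.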