fix:gaia dataset file attach issue and evaluation script format support by kyh035 · Pull Request #25 · cmriat/l0

kyh035 · 2025-07-01T08:32:34Z

What did you do

Fix the GAIA dataset file-attachment issue and the format issue in the metric-computation script.

New test cases

None

Test results

Other comments

None

Copilot

Pull Request Overview

This PR addresses file attachment formatting for the GAIA dataset and enhances the evaluation scripts’ configuration and environment handling.

Added visual_qa_tool_factory to the tool maps.
Enabled .env loading, string-label wrapping, and increased concurrency in the LLM evaluation script.
Updated evaluation runner script for the GAIA dataset.
Fixed prompt formatting in GAIA batch builder.
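The string-label wrapping mentioned above can be pictured with a minimal sketch. The helper name `normalize_labels` is hypothetical and is not taken from `simpleqa_metrics.py`; it only illustrates the idea of treating a bare string as a one-element list, so that downstream code does not iterate over its characters.

```python
from __future__ import annotations


def normalize_labels(labels: str | list[str]) -> list[str]:
    """Return labels as a list, wrapping a bare string in a one-element list.

    Hypothetical helper: the actual script may do this inline.
    """
    if isinstance(labels, str):
        # Without this branch, iterating over "Paris" would yield single characters.
        return [labels]
    return list(labels)


print(normalize_labels("Paris"))
print(normalize_labels(["Paris", "paris"]))
```

Datasets such as GAIA may store a single reference answer as a plain string, while other QA datasets store a list of acceptable answers; normalizing once lets the metric code handle both shapes the same way.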

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

File	Description
src/l0/traj_sampler/nb_agent_sampler/tool_specs.py	Added `visual_qa_tool_factory` to `TOOL_FACTORY_MAP` and enabled in `TOOL_SPECS_MAP`
evaluation/nb_agent_eval/simpleqa_metrics.py	Loaded environment from `.env`, wrapped `labels` when a string, and bumped `--workers` default to 64
evaluation/nb_agent_eval/run_eval.sh	Switched datasets to GAIA and updated config path
evaluation/nb_agent_eval/eval_datasets/gaia.py	Applied `.format(file_path=…)` to the file-attach prompt
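The `gaia.py` fix in the last row is the kind of bug where a template containing `{file_path}` is appended to the question without being formatted, so the literal placeholder reaches the model. A rough sketch of the corrected pattern follows; the template text and the function name `build_question` are assumptions for illustration, not the repository's actual prompt or code.

```python
from __future__ import annotations

# Hypothetical template; the real prompt text in gaia.py may differ.
FILE_ATTACH_PROMPT = "An attached file is available at: {file_path}"


def build_question(question: str, file_path: str | None) -> str:
    """Append the file-attach hint only when a file is attached."""
    if not file_path:
        return question
    # The fix: call .format(file_path=...) so the placeholder is filled in.
    return question + "\n\n" + FILE_ATTACH_PROMPT.format(file_path=file_path)


print(build_question("How many rows are in the sheet?", "/data/gaia/sheet.xlsx"))
```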

Comments suppressed due to low confidence (1)

evaluation/nb_agent_eval/simpleqa_metrics.py:257

os.getenv is used here but os is not imported. Add import os at the top to avoid a NameError.

        client = openai.OpenAI(base_url=os.getenv("OPENAI_API_BASE"), api_key=os.getenv("OPENAI_API_KEY"))
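If `os` is indeed not imported in that file (the comment was suppressed as low confidence, so this may not be the case), the fix is a single `import os`. The sketch below shows the pattern in isolation and deliberately leaves out the `openai` client so that it runs without third-party packages; `read_openai_settings` is a hypothetical helper name.

```python
from __future__ import annotations

import os  # required before any os.getenv call, otherwise a NameError is raised


def read_openai_settings() -> dict[str, str | None]:
    """Collect the endpoint settings from the environment.

    os.getenv returns None for unset variables, so callers can detect
    a missing key before constructing a client.
    """
    return {
        "base_url": os.getenv("OPENAI_API_BASE"),
        "api_key": os.getenv("OPENAI_API_KEY"),
    }


settings = read_openai_settings()
missing = [name for name, value in settings.items() if value is None]
print("missing settings:", missing)
```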

Copilot · 2025-07-01T08:35:08Z

 from typing import Any

-TOOL_FACTORY_MAP: dict[str, str] = {"qa": {"web_search_tool_factory", "jina_reader_tool_factory"}, "math": {}}
+TOOL_FACTORY_MAP: dict[str, str] = {"qa": {"web_search_tool_factory", "jina_reader_tool_factory", "visual_qa_tool_factory"}, "math": {}}


The type annotation dict[str, str] is incorrect since the values are sets of strings. Consider using dict[str, set[str]] or Mapping[str, Set[str]] for accuracy.
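A sketch of the annotation the comment suggests, using the map from the diff above. One extra detail is worth noting: the literal `{}` creates an empty `dict`, not an empty `set`, so `set()` is used for the `"math"` entry here. That substitution is a suggestion of this sketch, not something the PR is known to have made.

```python
TOOL_FACTORY_MAP: dict[str, set[str]] = {
    "qa": {
        "web_search_tool_factory",
        "jina_reader_tool_factory",
        "visual_qa_tool_factory",
    },
    # `{}` would be an empty dict; set() keeps every value the same type.
    "math": set(),
}

for task, factories in TOOL_FACTORY_MAP.items():
    print(task, sorted(factories))
```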

Copilot · 2025-07-01T08:35:09Z

-    --datasets bamboogle musique simpleqa hotpotqa \
-    --config_path /root/AgentRL/evaluation/nb_agent_eval/config/sampler_config_direct.yaml
+    --datasets gaia \
+    --config_path /root/l0/evaluation/nb_agent_eval/config/sampler_config_claude.yaml


[nitpick] Using hardcoded absolute paths reduces portability; consider switching to relative paths or environment variables for the project root.

Suggested change

-    --config_path /root/l0/evaluation/nb_agent_eval/config/sampler_config_claude.yaml
+    --config_path "${PROJECT_ROOT}/evaluation/nb_agent_eval/config/sampler_config_claude.yaml"
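The same idea can be applied on the Python side when a script needs the config path. The sketch below assumes a `PROJECT_ROOT` environment variable, as in the suggestion, and falls back to the current working directory when it is unset; `resolve_config_path` is a hypothetical helper, not part of the repository.

```python
import os
from pathlib import Path


def resolve_config_path(relative: str, env_var: str = "PROJECT_ROOT") -> Path:
    """Join a repo-relative path onto the project root.

    The root comes from the environment variable when set,
    otherwise from the current working directory.
    """
    root = Path(os.environ.get(env_var) or Path.cwd())
    return root / relative


config = resolve_config_path(
    "evaluation/nb_agent_eval/config/sampler_config_claude.yaml"
)
print(config)
```

Falling back to the working directory keeps the script usable when it is run from the repository root without any extra setup.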


junjzhang

LGTM

fix:gaia dataset file attach issue and evaluation script format support

dc74877

junjzhang requested review from Copilot and junjzhang July 1, 2025 08:32

Copilot AI reviewed Jul 1, 2025


chore: remove run_eval.sh and modify readme correspondingly. fix type hint issue

0e2d3f5

junjzhang approved these changes Jul 1, 2025


junjzhang merged commit 2454c54 into cmriat:main Jul 1, 2025
1 check passed

junjzhang merged 2 commits into cmriat:main from kyh035:fix_eval_format_support

3 participants
