[Benchmark] Add support for MMOral-OPG-Open benchmark #1484

Merged
mzr1996 merged 9 commits into open-compass:main from isjinghao:feature/mmoral-opg-open on Mar 27, 2026

Conversation

@isjinghao
Contributor

Summary

This PR adds MMOral_OPG_OPEN, an open-ended VQA benchmark for panoramic radiograph analysis, to VLMEvalKit.
The benchmark is from the NeurIPS 2025 paper “Towards Better Dental AI: A Multimodal Benchmark and Instruction Dataset for Panoramic X-ray Analysis” (arXiv:2509.09254).

Dataset

  • Task: open-ended question answering on OPG images, requiring detailed clinical reasoning.
  • URL: https://huggingface.co/datasets/OralGPT/MMOral-OPG-Bench/resolve/main/MMOral-OPG-Bench-Open-Ended.tsv
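
For readers who want to inspect the file before running an evaluation, here is a minimal sketch of parsing a tab-separated benchmark file with the Python standard library. The column names (`index`, `image`, `question`, `answer`, `category`) and the base64 image encoding are assumptions for illustration; check the header of the actual TSV before relying on them.

```python
import base64
import csv
import io

# Base64-encoded images can exceed csv's default per-field size limit.
csv.field_size_limit(2**31 - 1)


def read_benchmark_tsv(text):
    """Parse tab-separated benchmark text into a list of row dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return list(reader)


def decode_image(row, column="image"):
    """Return raw image bytes from a base64-encoded column (assumed name)."""
    return base64.b64decode(row[column])


# Tiny in-memory stand-in for the real file; every column name is hypothetical.
encoded = base64.b64encode(b"fake-image-bytes").decode("ascii")
sample = (
    "index\timage\tquestion\tanswer\tcategory\n"
    f"0\t{encoded}\tHow many teeth are visible?\tExample answer\tTeeth\n"
)

rows = read_benchmark_tsv(sample)
print(rows[0]["question"])
print(len(decode_image(rows[0])))
```

To try it on the real data, download the TSV from the URL above and pass the file contents to `read_benchmark_tsv`.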

Evaluation

  • Models generate free-form textual answers for each question.
  • A separate LLM judge scores the predictions using MMOral_opg_auxeval and MMOral_opg_acc from vlmeval/dataset/utils/mmoral_opg.py.
  • We report per-sample results and aggregated scores (overall and per category) in:
    • <model>_MMOral_OPG_OPEN_<judge>.xlsx (per-sample scores and logs)
    • <model>_MMOral_OPG_OPEN_<judge>_score.csv
    • <model>_MMOral_OPG_OPEN_<judge>_score_fine.csv
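
The aggregation step can be pictured with a small, self-contained sketch. This is not the code in vlmeval/dataset/utils/mmoral_opg.py; it only illustrates averaging per-sample judge scores overall and per category, and building the output file names listed above. The record fields `category` and `score` and the helper names are assumptions.

```python
from collections import defaultdict


def result_files(model, judge, dataset="MMOral_OPG_OPEN"):
    """Build the three output file names following the pattern in this PR."""
    stem = f"{model}_{dataset}_{judge}"
    return [f"{stem}.xlsx", f"{stem}_score.csv", f"{stem}_score_fine.csv"]


def aggregate_scores(records):
    """Average judge scores overall and per category.

    Each record is a dict with a 'category' string and a numeric 'score';
    both field names are assumptions made for this sketch.
    """
    if not records:
        raise ValueError("no records to aggregate")
    by_category = defaultdict(list)
    for rec in records:
        by_category[rec["category"]].append(float(rec["score"]))
    all_scores = [s for scores in by_category.values() for s in scores]
    overall = sum(all_scores) / len(all_scores)
    per_category = {c: sum(v) / len(v) for c, v in by_category.items()}
    return overall, per_category


# Hypothetical per-sample judge outputs.
records = [
    {"category": "Teeth", "score": 1.0},
    {"category": "Teeth", "score": 0.5},
    {"category": "Jaw", "score": 0.0},
]
overall, per_category = aggregate_scores(records)
print(overall, per_category)
print(result_files("my_model", "my_judge"))
```

The overall score here is a per-sample mean, so larger categories weigh more; a per-category macro average would be a different design choice.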

Copilot AI review requested due to automatic review settings March 16, 2026 09:14
Contributor

Copilot AI left a comment

Pull request overview

Adds the MMOral_OPG_OPEN open-ended VQA benchmark (panoramic dental radiographs) to VLMEvalKit, including LLM-judge scoring utilities and dataset registration so it can be built/evaluated like existing benchmarks.

Changes:

  • Introduces MMOral_OPG_OPEN dataset class with image dumping, prompting, and judge-based evaluation.
  • Adds MMOral-OPG judge prompt + aux evaluation (MMOral_opg_auxeval) and score aggregation (MMOral_opg_acc).
  • Registers the dataset in vlmeval.dataset so it’s discoverable via build_dataset / supported dataset lists.
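
As a rough illustration of what "discoverable via build_dataset" means, the sketch below shows a name-to-class registry lookup. It is a simplified stand-in, not VLMEvalKit's actual implementation; every class and function name in it (`BaseDatasetSketch`, `MMOralOpgOpenSketch`, `build_dataset_sketch`) is hypothetical.

```python
class BaseDatasetSketch:
    """Hypothetical base class; each subclass lists the names it serves."""

    NAMES = ()

    def __init__(self, name):
        self.name = name

    @classmethod
    def supported_datasets(cls):
        return list(cls.NAMES)


class MMOralOpgOpenSketch(BaseDatasetSketch):
    """Stand-in for the new dataset class."""

    NAMES = ("MMOral_OPG_OPEN",)


# The registry is a list of classes; adding a class here makes it discoverable.
DATASET_CLASSES = [MMOralOpgOpenSketch]


def supported_datasets():
    """Collect every dataset name served by a registered class."""
    names = []
    for cls in DATASET_CLASSES:
        names.extend(cls.supported_datasets())
    return names


def build_dataset_sketch(name):
    """Instantiate the first registered class that serves the given name."""
    for cls in DATASET_CLASSES:
        if name in cls.supported_datasets():
            return cls(name)
    raise KeyError(f"unknown dataset: {name}")


dataset = build_dataset_sketch("MMOral_OPG_OPEN")
print(type(dataset).__name__, dataset.name)
```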

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| vlmeval/dataset/utils/mmoral_opg.py | Adds LLM-judge prompt construction + aux-eval and score aggregation for MMOral-OPG. |
| vlmeval/dataset/mmoral_opg_open.py | Implements the MMOral_OPG_OPEN dataset class, prompt building, and judge-based evaluation pipeline. |
| vlmeval/dataset/__init__.py | Exposes and registers MMOral_OPG_OPEN in the dataset registry/lists. |

isjinghao and others added 3 commits March 16, 2026 17:18
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Collaborator

@mzr1996 left a comment

Please fix lint.

@isjinghao isjinghao requested a review from mzr1996 March 26, 2026 01:58
@mzr1996 mzr1996 merged commit 589fe36 into open-compass:main Mar 27, 2026
7 checks passed

3 participants