Incorrect ground-truth answers in BLINK Relative_Reflectance TSV #1486
Description
Hi, thanks for maintaining this benchmark.
I found that the ground-truth answers for the BLINK Relative_Reflectance task in the VLM Eval Kit appear to be incorrect.
Problem
The task is distributed in TSV format in the eval kit, but the answer field for Relative_Reflectance seems to be wrong.
On the Open VLM Leaderboard, model performance on BLINK Relative_Reflectance consistently sits very close to random chance (~0.33 for a three-option task), which is what you would expect if the labels were corrupted.
Open VLM Leaderboard: BLINK Relative_Reflectance
https://huggingface.co/spaces/opencompass/open_vlm_leaderboard
In addition, I manually inspected multiple samples and found clear mismatches between the images and the provided ground-truth answers.
Verification
I checked the original dataset viewer here:
Original dataset:
https://huggingface.co/datasets/BLINK-Benchmark/BLINK/viewer/Relative_Reflectance
Using the correct data from there, I created a fixed TSV version:
Fixed TSV:
https://huggingface.co/buckets/Ryoo72/BLINK/resolve/BLINK.fixed.tsv?download=true
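For reference, here is a minimal sketch of how the answer columns of the two TSV files could be diffed. The column names `index` and `answer` are assumptions about the TSV schema, and the inline data is synthetic, not real BLINK rows:

```python
import csv
import io

def diff_answers(current_tsv: str, fixed_tsv: str, key: str = "index", col: str = "answer"):
    """Return {row_id: (current_answer, fixed_answer)} for rows whose
    ground-truth answer differs between two TSV dumps."""
    def load(text):
        reader = csv.DictReader(io.StringIO(text), delimiter="\t")
        return {row[key]: row[col] for row in reader}
    cur, fix = load(current_tsv), load(fixed_tsv)
    # Only compare rows present in both files.
    return {k: (cur[k], fix[k]) for k in cur if k in fix and cur[k] != fix[k]}

# Tiny synthetic example (hypothetical rows, not actual BLINK data):
current = "index\tanswer\n1\tA\n2\tB\n3\tC\n"
fixed = "index\tanswer\n1\tA\n2\tC\n3\tC\n"
print(diff_answers(current, fixed))  # → {'2': ('B', 'C')}
```

In practice the two inputs would be the distributed TSV and the fixed one linked above, read from disk.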
Request
Could you please verify the Relative_Reflectance annotations / answer column in the current TSV distributed with the VLM Eval Kit?
If helpful, I’d be happy to provide more details or help compare the current file against the corrected version.
Thanks!