Hello,
In the InternVL3 paper, it is mentioned that for the MMMU and MMMU-Pro benchmarks, the reported result is the higher of the accuracies obtained with direct-answer and CoT reasoning. I would like to ask whether the same evaluation strategy (i.e., reporting the better of the two results) was also applied to other benchmarks such as ChartQA and DocVQA.
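To make the question concrete, here is a minimal sketch of how I understand that reporting strategy. The function name, benchmark keys, and numbers are hypothetical placeholders of mine, not the paper's evaluation code or results.

```python
def reported_accuracy(direct_acc: float, cot_acc: float) -> float:
    """Return the higher of the direct-answer and CoT accuracies."""
    return max(direct_acc, cot_acc)


# Placeholder accuracies for illustration only (not results from the paper).
scores = {
    "benchmark_a": {"direct": 0.60, "cot": 0.65},
    "benchmark_b": {"direct": 0.80, "cot": 0.75},
}

# For each benchmark, keep whichever prompting mode scored higher.
reported = {
    name: reported_accuracy(s["direct"], s["cot"])
    for name, s in scores.items()
}

for name, acc in reported.items():
    print(f"{name}: {acc:.2f}")
```

If ChartQA and DocVQA were instead evaluated with a single prompting mode, the reported number would simply be that mode's accuracy rather than a maximum over the two.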
Thank you!