Evaluation method for ChartQA: CoT or direct-answer?

Hello,

The InternVL3 paper states that, for the `MMMU` and `MMMU-Pro` benchmarks, the reported result is the higher of the accuracies obtained with `direct-answer` and `CoT` reasoning. Was the same evaluation strategy (i.e., reporting the better of the two results) also applied to other benchmarks such as ChartQA and DocVQA?
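To make the question concrete, here is a minimal sketch of the reporting rule as I understand it. The function name, benchmark keys, and all numbers are hypothetical placeholders for illustration, not values or code from the paper or the repository.

```python
def reported_accuracy(direct_acc: float, cot_acc: float) -> float:
    """Return the higher of the two accuracies (the 'best of both modes' rule)."""
    return max(direct_acc, cot_acc)


# Placeholder scores for illustration only; these are NOT results from the paper.
example_scores = {
    "benchmark_a": {"direct": 0.60, "cot": 0.65},
    "benchmark_b": {"direct": 0.80, "cot": 0.75},
}

# Apply the rule independently to each benchmark.
reported = {
    name: reported_accuracy(scores["direct"], scores["cot"])
    for name, scores in example_scores.items()
}
print(reported)
```

Under this rule, each benchmark's reported number may come from a different prompting mode, which is why I would like to know which benchmarks it applies to.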

Thank you!


Evaluation method for ChartQA: CoT or direct-answer? #1083
