
Evaluation method for ChartQA: CoT or direct-answer? #1083

Open
@Ryoo72

Description


Hello,

The InternVL3 paper mentions that, for the MMMU and MMMU-Pro benchmarks, results were reported as the maximum accuracy achieved between the direct-answer and CoT reasoning methods. I would like to ask whether the same evaluation strategy (i.e., reporting the better result of the two approaches) was also applied to other benchmarks such as ChartQA and DocVQA.
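
To make sure I'm understanding the strategy correctly, here is a minimal sketch of what I mean. This is my reading of the paper, not the authors' actual code; `run_eval` and the mode names are hypothetical placeholders for whatever harness produces per-benchmark accuracy.

```python
def report_best(run_eval, benchmark: str) -> float:
    """Report the higher accuracy of the two prompting modes.

    run_eval(benchmark, mode) is a hypothetical callable that runs the
    benchmark with the given prompting mode and returns accuracy in [0, 1].
    """
    direct_acc = run_eval(benchmark, mode="direct")  # direct-answer prompting
    cot_acc = run_eval(benchmark, mode="cot")        # chain-of-thought prompting
    return max(direct_acc, cot_acc)
```

My question is whether this `max(...)` selection was applied only to MMMU/MMMU-Pro, or to ChartQA, DocVQA, etc. as well.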

Thank you!

