Hi OpenGVLab Team,
Thank you for releasing InternVideo2.5.
I am trying to reproduce the official MVBench score of 75.7 reported for OpenGVLab/InternVideo2_5_Chat_8B.
I am using the HuggingFace reference implementation as a starting point:
https://huggingface.co/OpenGVLab/InternVideo2_5_Chat_8B#%F0%9F%9A%80-how-to-use-the-model
To ensure my evaluation matches yours, I would appreciate clarification on the exact protocol, including:
- the number of frames and the frame-sampling strategy,
- preprocessing details (input resolution, dynamic tiling, thumbnail usage),
- the prompt template used for multiple-choice questions,
- decoding settings and the rule for extracting the predicted option.
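To make the questions above concrete, here is a minimal sketch of two of the pieces I am unsure about: frame sampling and option extraction. It reflects only my own assumptions about a typical setup, not the official protocol. The helper names `uniform_frame_indices` and `extract_option`, the segment-midpoint sampling rule, and the regex are all hypothetical.

```python
import re
from typing import List, Optional


def uniform_frame_indices(total_frames: int, num_segments: int) -> List[int]:
    """Pick the middle frame of each of `num_segments` equal-length segments.

    Assumption: this is one common uniform-sampling rule; the official
    evaluation may use a different one.
    """
    if total_frames <= 0 or num_segments <= 0:
        raise ValueError("total_frames and num_segments must be positive")
    seg = total_frames / num_segments
    # Clamp to the last valid index in case of rounding at the boundary.
    return [min(total_frames - 1, int(seg * i + seg / 2)) for i in range(num_segments)]


def extract_option(answer: str, num_options: int = 4) -> Optional[str]:
    """Return the first standalone uppercase option letter in the model output.

    Assumption: options are labelled A, B, C, ... and the first standalone
    letter is taken as the prediction. Returns None if no letter is found.
    """
    letters = "ABCDEFGH"[:num_options]
    match = re.search(rf"\b([{letters}])\b", answer.strip())
    return match.group(1) if match else None
```

One known weakness of this extraction rule: an answer that starts with the English article "A" (for example "A dog is running") would be read as option A, which is one reason I would like to know the exact rule you used.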
Most importantly, could you share the official evaluation script or configuration used to obtain the reported 75.7 score?
If available, reference evaluation settings for other datasets (LongVideoBench, VideoMME, EgoSchema, Perception Test, LVBench) would also be very helpful.
Thank you in advance!
~Sergey
sotayang