Performance on SQA3D #3

Thank you for your interesting work. I noticed that the reported SQA3D results are based on GPT4Scene's data, which uses Mask3D priors to preprocess the video frames and annotate object identifiers on the images. However, I believe relying on such strong 3D priors may not be reasonable in a unified video-based MLLM framework.
Have you evaluated performance on SQA3D without the object identifiers?
