Performance on SQA3D #3

@Germany321

Thank you for your interesting work. I noticed that the reported SQA3D results are based on GPT4Scene's data, which uses Mask3D priors to preprocess the video frames and annotate object identifiers on the images. However, I believe it may not be reasonable to rely on such strong 3D priors in a unified video-based MLLM framework.
Have you evaluated performance on SQA3D without the object identifiers?
