On the effectiveness of the NaFlex variant for high-resolution images #181

Dear Developers,

Your work on SigLIP2 is truly impressive. As a beginner, I would like to ask: can the NaFlex version of SigLIP handle images larger than 512×512, for example 896×896 (assuming I have already set `max_num_patches` to 1024)? Will the performance degrade significantly in such cases? If my future task involves training an LLM+ViT model, would it be possible to mitigate this degradation during training? I am very much looking forward to your reply.
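To make the question concrete, here is a rough sketch of the patch arithmetic as I understand it. It assumes a patch size of 16 (my assumption, chosen because 512×512 then gives exactly 1024 patches) and a uniform downscale to fit the budget. The helper names are hypothetical and this is not the actual NaFlex preprocessing code, which may round or resize differently.

```python
import math

# Assumption: 16x16 patches. This is my guess, not a value confirmed in this thread.
PATCH_SIZE = 16


def patch_grid(height, width, patch_size=PATCH_SIZE):
    """Return (rows, cols) of patches needed to cover the image, rounding up."""
    return math.ceil(height / patch_size), math.ceil(width / patch_size)


def fits_budget(height, width, max_num_patches, patch_size=PATCH_SIZE):
    """True if the image's patch count does not exceed max_num_patches."""
    rows, cols = patch_grid(height, width, patch_size)
    return rows * cols <= max_num_patches


def downscale_factor(height, width, max_num_patches, patch_size=PATCH_SIZE):
    """Approximate uniform scale that brings the patch count within the budget.

    Hypothetical helper: the real preprocessing may choose the size differently.
    """
    rows, cols = patch_grid(height, width, patch_size)
    return min(1.0, math.sqrt(max_num_patches / (rows * cols)))


if __name__ == "__main__":
    for side in (512, 896):
        rows, cols = patch_grid(side, side)
        scale = downscale_factor(side, side, 1024)
        print(side, rows * cols, fits_budget(side, side, 1024), round(side * scale))
```

Under these assumptions, an 896×896 image needs 56×56 = 3136 patches, roughly three times the 1024 budget, so I would expect it to be resized to about 512×512 before patching. That resizing is the source of the degradation I am asking about.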

Best regards,



