Dear Developers,
Your work on SigLIP2 is truly impressive. As a beginner, I would like to ask: can the NaFlex variant of SigLIP2 handle images larger than 512×512, for example 896×896 (assuming I have already set max_num_patches to 1024)? Would performance degrade significantly in that case? If I later train an LLM+ViT model, could this degradation be mitigated during training? I am very much looking forward to your reply.
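For context, here is a rough sketch of how I currently understand the patch budget. It assumes a 16×16 patch size (as in the patch16 NaFlex checkpoints) and an aspect-preserving resize that snaps each side to a multiple of the patch size. `num_patches` and `fit_to_patch_budget` are my own hypothetical helpers, not the actual preprocessing code. If this is right, an 896×896 input needs 3136 patches, so with max_num_patches=1024 it would be downscaled to about 512×512.

```python
import math

PATCH_SIZE = 16  # assumption: patch16 NaFlex checkpoint


def num_patches(height: int, width: int, patch_size: int = PATCH_SIZE) -> int:
    """Number of patches needed to tile the image at its current size."""
    return math.ceil(height / patch_size) * math.ceil(width / patch_size)


def fit_to_patch_budget(height: int, width: int, max_num_patches: int,
                        patch_size: int = PATCH_SIZE) -> tuple[int, int]:
    """Hypothetical approximation of a NaFlex-style resize.

    Scales the image (keeping aspect ratio roughly) so that the patch grid
    fits within max_num_patches, with each side a multiple of patch_size.
    """
    # Continuous scale that would exactly hit the patch budget.
    scale = math.sqrt(max_num_patches * patch_size ** 2 / (height * width))
    rows = max(1, round(height * scale / patch_size))
    cols = max(1, round(width * scale / patch_size))
    # Rounding can overshoot the budget; shrink the longer side until it fits.
    while rows * cols > max_num_patches:
        if rows >= cols:
            rows -= 1
        else:
            cols -= 1
    return rows * patch_size, cols * patch_size


print(num_patches(896, 896))                                 # 3136
print(fit_to_patch_budget(896, 896, max_num_patches=1024))   # (512, 512)
```

Under these assumptions, keeping 896×896 at full resolution would need max_num_patches of at least 3136.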
Best regards,