
How can we get better performance from SigLIP when training from scratch? #1125

@yutojubako

Description

Hello,
When training SigLIP from scratch with this codebase, we have not been able to reproduce the reported results.
Even in an environment with a large number of H100 GPUs, we run into issues such as:

  • Out-of-memory errors that prevent training; the same global batch size (64k) works fine without the sigmoid loss (see the loss sketch after this list).
  • The standard contrastive loss (without sigmoid) trains more stably than the sigmoid loss.
    • With the sigmoid loss, loss spikes are more likely.
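
For reference, here is a minimal sketch of the pairwise sigmoid loss we mean (the names `sigmoid_loss`, `logit_scale`, and `logit_bias` are illustrative, not taken from this codebase). It materializes a dense B×B logit matrix over the global batch; at a 64k global batch that is roughly 64k² ≈ 4×10⁹ entries, about 16 GB in fp32 for the logits alone before activations and gradients, which we suspect is where the OOM comes from.

```python
# Minimal sketch of a pairwise sigmoid (SigLIP-style) loss, assuming
# image_features and text_features are L2-normalized [B, D] tensors that
# already contain the gathered global batch. All names are illustrative.
import torch
import torch.nn.functional as F


def sigmoid_loss(image_features, text_features, logit_scale, logit_bias):
    b = image_features.shape[0]
    # Every image is scored against every text: a dense [B, B] logit matrix.
    logits = logit_scale * image_features @ text_features.t() + logit_bias
    # +1 on the diagonal (matched pairs), -1 everywhere else.
    labels = 2.0 * torch.eye(b, device=logits.device, dtype=logits.dtype) - 1.0
    # Binary log-sigmoid loss over all B*B pairs, averaged per example.
    return -F.logsigmoid(labels * logits).sum() / b
```

If something like this full matrix is materialized per rank, the memory cost grows quadratically with the global batch size, which might explain why the contrastive setup still fits at the same batch size.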

Is reducing the batch size the only way to train? We are interested in training from scratch on a 5B-scale dataset, rather than fine-tuning.
Could you provide advice on training SigLIP from scratch?
Thank you.
