Hi Infinity authors, thanks for releasing the code and for the great work!
I am trying to reproduce your pretraining results. I am using 16 nodes (128 GPUs) with a batch size of 8 per GPU, i.e. a global batch size of 1024, training only on 256p square images, with a constant LR of 5e-5 after warmup, your default 2B model, and your publicly released 32-dim tokenizer. Other than that, I am using the same hyperparameters as in your train script and the defaults in the `Args` class.
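For reference, here is a minimal PyTorch sketch of the schedule and batch setup I described above; `warmup_iters` and the stand-in model are illustrative placeholders, not the repo's actual arguments:

```python
import torch

model = torch.nn.Linear(10, 10)  # stand-in for the 2B model
opt = torch.optim.AdamW(model.parameters(), lr=5e-5)

warmup_iters = 1000  # assumed; the real warmup length comes from the train script

def lr_lambda(it):
    # Ramp linearly from 0 to 1 over warmup, then hold the LR constant at 5e-5.
    return min(1.0, (it + 1) / warmup_iters)

sched = torch.optim.lr_scheduler.LambdaLR(opt, lr_lambda)

# Effective global batch size: 16 nodes x 8 GPUs x 8 samples per GPU.
assert 16 * 8 * 8 == 1024
```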
My training loss and accuracy are poor after ~22k iterations:

(training loss and bit-accuracy curves omitted)
And my generations are very mangled -- they look like the top-left results from your Figure 11.

(generated samples omitted; the prompts were "A dog.", "A man.", and "A woman.")
When I sanity-checked by training on MNIST, I could get good results by the time bit accuracy reached ~80%, but with real images my bit accuracy plateaus around 73% (see the sketch below for what I mean by bit accuracy). My questions: 1) do my settings look reasonable, and 2) if so, how long do I need to train to get visually good results?
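For context, this is a minimal sketch of how I am measuring bit accuracy, assuming sign-thresholded logits compared against the tokenizer's binary codes; tensor names and shapes are illustrative:

```python
import torch

# (batch, tokens, bits): model logits and ground-truth binary codes from the
# 32-dim bitwise tokenizer. Random tensors here just to make the sketch run.
logits = torch.randn(4, 1024, 32)
target_bits = torch.randint(0, 2, (4, 1024, 32))

pred_bits = (logits > 0).long()  # sign of the logit -> predicted bit
bit_acc = (pred_bits == target_bits).float().mean()
print(f"bit accuracy: {bit_acc:.3f}")  # ~0.5 for random logits
```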