Poor pretraining results on 2b model #119

@daveboat

Hi Infinity authors, thanks for providing the code and the great work!

I am trying to reproduce your pretraining results. I am using 16 nodes (128 GPUs) with a batch size of 8 per GPU (so a 1024 global batch size), training only on 256p square images, with a constant LR of 5e-5 after warmup, using your default 2B model and your publicly released 32-dim tokenizer. Other than that, I am using the same hyperparameters as in your train script and the defaults in the Args class; the effective settings are summarized in the sketch below.
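For reference, here is a minimal summary of the configuration as I understand it. The variable names are my own shorthand for illustration, not the actual fields of the Args class or flags of the train script:

```python
# Sketch of the effective pretraining configuration described above.
# Names are illustrative shorthand, not the real Args fields.
nodes = 16
gpus_per_node = 8                 # 16 nodes * 8 GPUs = 128 GPUs total
batch_size_per_gpu = 8
global_batch_size = nodes * gpus_per_node * batch_size_per_gpu  # = 1024

resolution = 256                  # 256p square images only
learning_rate = 5e-5              # constant after warmup
model = "infinity_2b"             # default 2B model from the repo
tokenizer_dim = 32                # publicly released 32-dim tokenizer

print(global_batch_size)          # sanity check: 1024
```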

My training loss and accuracy are poor after ~22k iterations

[Training loss and bit-accuracy curves]

And my generations are very mangled -- they look like the top left results from your Figure 11.

"A dog."

Image

"A man."

Image

"A woman."

Image

When I sanity-checked by training on MNIST, I could get good results by the time the bit accuracy reached ~80%, but with real images my accuracy plateaus around 73%. My questions are: 1) Do my settings look reasonable, and 2) if so, how long do I need to train to achieve visually good results?
