Hi Infinity authors, thanks for releasing the code and for the great work!
I am trying to reproduce your pretraining results. I am using 16 nodes (128 GPUs) with a batch size of 8 per GPU, i.e. a global batch size of 1024, training only on 256p square images, with a constant LR of 5e-5 after warmup, your default 2B model, and your publicly released 32-dim tokenizer. Other than that, I am using the same hyperparameters as in your train script and the defaults in the `Args` class.
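For reference, here is a minimal PyTorch sketch of the schedule and batch setup I described above; `warmup_iters` and the stand-in model are illustrative placeholders, not the repo's actual arguments:

```python
import torch

model = torch.nn.Linear(10, 10)  # stand-in for the 2B model
opt = torch.optim.AdamW(model.parameters(), lr=5e-5)

warmup_iters = 1000  # assumed; the real warmup length comes from the train script

def lr_lambda(it):
    # Ramp linearly from 0 to 1 over warmup, then hold the LR constant at 5e-5.
    return min(1.0, (it + 1) / warmup_iters)

sched = torch.optim.lr_scheduler.LambdaLR(opt, lr_lambda)

# Effective global batch size: 16 nodes x 8 GPUs x 8 samples per GPU.
assert 16 * 8 * 8 == 1024
```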
My training loss and accuracy are poor after ~22k iterations:

(training loss and bit-accuracy curves omitted)
And my generations are very mangled -- they look like the top-left results from your Figure 11.

(generated samples omitted; the prompts were "A dog.", "A man.", and "A woman.")
When I sanity-checked by training on MNIST, I could get good results by the time bit accuracy reached ~80%, but with real images my bit accuracy plateaus around 73% (see the sketch below for what I mean by bit accuracy). My questions: 1) do my settings look reasonable, and 2) if so, how long do I need to train to get visually good results?
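For context, this is a minimal sketch of how I am measuring bit accuracy, assuming sign-thresholded logits compared against the tokenizer's binary codes; tensor names and shapes are illustrative:

```python
import torch

# (batch, tokens, bits): model logits and ground-truth binary codes from the
# 32-dim bitwise tokenizer. Random tensors here just to make the sketch run.
logits = torch.randn(4, 1024, 32)
target_bits = torch.randint(0, 2, (4, 1024, 32))

pred_bits = (logits > 0).long()  # sign of the logit -> predicted bit
bit_acc = (pred_bits == target_bits).float().mean()
print(f"bit accuracy: {bit_acc:.3f}")  # ~0.5 for random logits
```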