
torch_run.py lacking autocast and scaling for Automatic Mixed Precision #45

Open
bhavnicksm opened this issue May 9, 2024 · 1 comment

Comments

@bhavnicksm

Hey,

As mentioned in the title, the model is cast directly to BF16, without the torch.amp machinery (autocast and gradient scaling) needed for Automatic Mixed Precision.

This means the projected memory shown here covers only the 2 bytes per parameter for the BF16 model weights, but training purely in BF16 tends to give poor results according to various sources. To make it work properly you would need AMP, which costs roughly 6 bytes per parameter (2 bytes for the BF16 working copy plus 4 bytes for the FP32 master weights), which blows the 24 GiB mentioned in the paper out of the water.

For LLaMA 3 8B, you would need 8 * 10^9 * 6 bytes ≈ 44 GiB just to load the parameters under BF16 AMP.
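For a quick sanity check, here is the back-of-the-envelope arithmetic behind that figure (the 8B parameter count is rounded, and the 2 + 4 byte split assumes BF16 working weights plus an FP32 master copy, before any optimizer state or activations):

```python
# Rough parameter-memory estimate for BF16 AMP (approximate figures).
n_params = 8e9                 # ~8B parameters for LLaMA 3 8B (rounded)
bytes_bf16_weights = 2         # BF16 copy used for forward/backward
bytes_fp32_master = 4          # FP32 master weights kept alongside it
total_bytes = n_params * (bytes_bf16_weights + bytes_fp32_master)
print(f"{total_bytes / 2**30:.1f} GiB")   # ~44.7 GiB, excluding optimizer state and activations
```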

Just wanted to point this out and ask why it was done this way. The paper also mentions a 58 GiB minimum -- but I think you would need much more than that.

If this is a deliberate decision, please point me to the studies showing that this kind of training is stable.

src: https://docs.fast.ai/callback.fp16.html
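For context, this is roughly the standard torch.amp pattern I would expect in place of the plain BF16 cast (a minimal sketch with a placeholder model and optimizer, not the actual torch_run.py code; note that GradScaler is strictly needed only for FP16, while BF16 autocast is usually run without it):

```python
import torch

# Minimal torch.amp training-loop sketch. The parameters stay in FP32 and only the
# forward/backward ops run in reduced precision inside the autocast context.
model = torch.nn.Linear(1024, 1024).cuda()        # stand-in for the real model
optimizer = torch.optim.AdamW(model.parameters())
scaler = torch.cuda.amp.GradScaler()              # gradient scaling, required for float16

for _ in range(10):
    x = torch.randn(8, 1024, device="cuda")
    optimizer.zero_grad(set_to_none=True)
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = model(x).pow(2).mean()             # dummy loss for illustration
    scaler.scale(loss).backward()                 # scale loss to avoid FP16 underflow
    scaler.step(optimizer)                        # unscale grads, then optimizer step
    scaler.update()                               # adjust the scale factor
```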


kyleliang919 commented May 9, 2024

Hi @bhavnicksm, my latest finding is that it might not be a problem with GaLore itself... Adam8bit is unstable on its own, and GaLore just makes it even more unstable, so the collapse shows up earlier in pretraining. If you train with Adam8bit (full rank) for long enough, it will collapse at some point. Overall, my current feeling is that, despite what was claimed, this method combined with Adam8bit is only stable for finetuning.
