System Info
8xH100
Who can help?
No response
Information
Tasks
Reproduction
After updating to the latest master branch of transformers, the training loss is multiple times higher than before (5x-10x). I tried both SFT and DPO (paired with the latest trl master), and both show the same problem.
SFT after GA fix

SFT before GA fix
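
One way to check whether the gap is a reporting-scale effect rather than a real regression is to compare two loss conventions on the same data. The sketch below is only an illustration under assumptions: it assumes "GA" means gradient accumulation, the per-token loss values are made up (not taken from the runs above), and the two helper functions are hypothetical, not part of transformers or trl. It is not a confirmed diagnosis of this issue. It shows that summing per-micro-batch mean losses without dividing by the number of accumulation steps gives a value several times larger than the token-normalized loss.

```python
# Hypothetical per-token losses for 4 micro-batches of one accumulated step.
# These numbers are invented for illustration only.
micro_batches = [
    [2.0, 2.0, 2.0, 2.0],
    [1.0, 1.0],
    [3.0, 3.0, 3.0],
    [2.0],
]


def token_normalized_loss(batches):
    """Sum of all token losses divided by the total number of tokens."""
    total = sum(sum(b) for b in batches)
    count = sum(len(b) for b in batches)
    return total / count


def summed_micro_batch_means(batches):
    """Sum of per-micro-batch mean losses, with no division by the
    number of accumulation steps (the assumed source of inflation)."""
    return sum(sum(b) / len(b) for b in batches)


normalized = token_normalized_loss(micro_batches)
unscaled = summed_micro_batch_means(micro_batches)
ratio = unscaled / normalized

print(f"token-normalized loss: {normalized:.4f}")
print(f"summed micro-batch means: {unscaled:.4f}")
print(f"ratio: {ratio:.2f} (accumulation steps: {len(micro_batches)})")
```

If the ratio between the new and old logged loss in a real run is close to the configured number of accumulation steps, that would point toward a scaling difference in how the loss is reported; if not, the cause is likely elsewhere.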

Expected behavior
The training loss should match the values seen before the update, or be lower.