NaN gradients and zero loss when LoRA fine-tuning Qwen3-32B-AWQ with LLaMA-Factory, and a fix #9125
gysabc started this conversation in Show and tell
- Data: an internal dataset
- Symptom: partway through training, one log entry suddenly reports `grad_norm` as `nan`; every subsequent entry shows `loss` of 0 and `grad_norm` of `nan`
- Training parameters:
- Analysis: an activation overflows the float16 representable range (about ±65504) when cast with `.half()`; the resulting `inf` propagates through the backward pass and turns the gradients into `nan`
- Fix: clamp the tensor to the float16 range before casting it down
Replace the plain cast with a clamped cast:

```python
# before: a direct cast overflows float16 when |x| > 65504, producing inf
x = x.half()

# after: clamp to the float16 representable range first, so the cast stays finite
x = torch.clamp(x, min=torch.finfo(torch.float16).min, max=torch.finfo(torch.float16).max).half()
```
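A minimal sketch of why the clamp works, using NumPy float16 as a stand-in for the torch cast (the helper names `to_fp16_unsafe` and `to_fp16_clamped` are illustrative, not from the original post): values beyond the float16 range overflow to `inf` on a direct cast, while clamping first keeps them finite.

```python
import numpy as np

def to_fp16_unsafe(x):
    # Direct cast: values outside the float16 range (about +/-65504) overflow to inf.
    return x.astype(np.float16)

def to_fp16_clamped(x):
    # Clamp to the float16 representable range first, mirroring the torch.clamp fix.
    info = np.finfo(np.float16)
    return np.clip(x, info.min, info.max).astype(np.float16)

x = np.array([1.0, 70000.0, -70000.0], dtype=np.float32)

print(to_fp16_unsafe(x))   # the out-of-range entries become inf / -inf
print(to_fp16_clamped(x))  # the same entries saturate at +/-65504 and stay finite
```

Once any activation becomes `inf`, downstream operations such as `inf - inf` yield `nan`, which is why the loss collapses to 0 and `grad_norm` reports `nan` for the rest of training.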