The grad norm is nan #4
Comments
Hello there, we would need much more information about the model/trainer/data/task to give you an answer. SummaryMixing does not, in itself, induce more instability during training than MHSA. With more information about the code, we could try to help.
I tried to print the output of summary_mixing, and the tensor contains NaN. What could be the reason for this?
Hi, we need much more information to help you here, I'm afraid. This could be due to many reasons, most of which are likely not connected to SummaryMixing. Please describe your setup.
Hi, when I print the encoder input while using summary_mixing I find NaN in it, but when I use RelPositionMultiHeadedAttention the input has no NaN.
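For anyone debugging the same symptom: a minimal check, assuming standard PyTorch tensors (the variable names in the comments are illustrative, not from the actual codebase), is to assert finiteness right where the tensor is produced:

```python
import torch

def assert_finite(x, name):
    # Raise as soon as a tensor contains NaN or Inf, naming the tensor.
    if not torch.isfinite(x).all():
        raise RuntimeError(f"{name} contains NaN/Inf values")

# Illustrative usage inside the encoder forward pass:
# assert_finite(encoder_input, "encoder input")
# assert_finite(mixing_output, "summary_mixing output")
```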
Hello, I've had a quick look at your code, but I am way too unfamiliar with this codebase to make any meaningful comment. My only comment would be that we never encountered any NaN issue with SummaryMixing, so it might not be plugged in properly (be careful with the masking, for instance).
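To make the masking hint concrete: below is a simplified stand-in for the masked mean over time that SummaryMixing-style layers rely on, not the repository's actual implementation. If the padding mask is inverted, or a batch item ends up fully masked, the per-item frame count is zero and the division produces NaN:

```python
import torch

def masked_mean(x, mask):
    """Masked mean over time. x: (batch, time, feat); mask: (batch, time), 1 = valid frame."""
    mask = mask.unsqueeze(-1).to(x.dtype)  # (batch, time, 1)
    total = (x * mask).sum(dim=1)          # sum over valid frames only
    count = mask.sum(dim=1)                # number of valid frames per batch item
    return total / count                   # 0 / 0 = NaN when an item is fully masked

x = torch.randn(2, 4, 8)
mask = torch.tensor([[1, 1, 1, 0],
                     [0, 0, 0, 0]])        # second item is fully masked
print(masked_mean(x, mask)[1])             # all NaN

# A common guard is to clamp the denominator: total / count.clamp(min=1.0)
```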
Hi author, I'm getting the following warnings when training Branchformer with summary_mixing:
[autodl-container-4d6411b93c-8a044365] 2024-04-10 17:11:12,899 (ctc:67) WARNING: 13/34 samples got nan grad. These were ignored for CTC loss.
[autodl-container-4d6411b93c-8a044365] 2024-04-10 17:11:13,133 (build_trainer:660) WARNING: The grad norm is nan. Skipping updating the model.
[autodl-container-4d6411b93c-8a044365] 2024-04-10 17:11:13,263 (ctc:67) WARNING: 7/32 samples got nan grad. These were ignored for CTC loss.
[autodl-container-4d6411b93c-8a044365] 2024-04-10 17:11:13,477 (build_trainer:660) WARNING: The grad norm is nan. Skipping updating the model.
[autodl-container-4d6411b93c-8a044365] 2024-04-10 17:11:13,625 (ctc:67) WARNING: 21/45 samples got nan grad. These were ignored for CTC loss.
[autodl-container-4d6411b93c-8a044365] 2024-04-10 17:11:13,858 (build_trainer:660) WARNING: The grad norm is nan. Skipping updating the model.
[autodl-container-4d6411b93c-8a044365] 2024-04-10 17:11:14,022 (ctc:67) WARNING: 21/62 samples got nan grad. These were ignored for CTC loss.
[autodl-container-4d6411b93c-8a044365] 2024-04-10 17:11:14,248 (build_trainer:660) WARNING: The grad norm is nan. Skipping updating the model.
[autodl-container-4d6411b93c-8a044365] 2024-04-10 17:11:14,499 (ctc:67) WARNING: 37/105 samples got nan grad. These were ignored for CTC loss.
[autodl-container-4d6411b93c-8a044365] 2024-04-10 17:11:14,735 (build_trainer:660) WARNING: The grad norm is nan. Skipping updating the model.
[autodl-container-4d6411b93c-8a044365] 2024-04-10 17:11:14,875 (ctc:67) WARNING: 12/39 samples got nan grad. These were ignored for CTC loss.
[autodl-container-4d6411b93c-8a044365] 2024-04-10 17:11:15,104 (build_trainer:660) WARNING: The grad norm is nan. Skipping updating the model.
[autodl-container-4d6411b93c-8a044365] 2024-04-10 17:11:15,261 (ctc:67) WARNING: 23/56 samples got nan grad. These were ignored for CTC loss.
[autodl-container-4d6411b93c-8a044365] 2024-04-10 17:11:15,479 (build_trainer:660) WARNING: The grad norm is nan. Skipping updating the model.
[autodl-container-4d6411b93c-8a044365] 2024-04-10 17:11:15,623 (ctc:67) WARNING: 20/47 samples got nan grad. These were ignored for CTC loss.
[autodl-container-4d6411b93c-8a044365] 2024-04-10 17:11:15,854 (build_trainer:660) WARNING: The grad norm is nan. Skipping updating the model.
[autodl-container-4d6411b93c-8a044365] 2024-04-10 17:11:16,004 (ctc:67) WARNING: 15/53 samples got nan grad. These were ignored for CTC loss.
[autodl-container-4d6411b93c-8a044365] 2024-04-10 17:11:16,224 (build_trainer:660) WARNING: The grad norm is nan. Skipping updating the model.
Why is this happening?
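A generic way to locate the operation behind NaN gradients, assuming a standard PyTorch training loop (this is not specific to this trainer), is PyTorch's anomaly detection:

```python
import torch

# Enable once before training; it slows execution and is meant for debugging only.
torch.autograd.set_detect_anomaly(True)

# With anomaly detection enabled, loss.backward() raises an error pointing at the
# forward operation whose backward pass first produced NaN, instead of letting
# NaN silently propagate into the gradient norm.
```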