
The grad norm is nan #4

Open

sister-tong opened this issue Apr 10, 2024 · 5 comments

@sister-tong

Hi author, I'm getting the following warnings when training a Branchformer with summary_mixing:

[autodl-container-4d6411b93c-8a044365] 2024-04-10 17:11:12,899 (ctc:67) WARNING: 13/34 samples got nan grad. These were ignored for CTC loss.
[autodl-container-4d6411b93c-8a044365] 2024-04-10 17:11:13,133 (build_trainer:660) WARNING: The grad norm is nan. Skipping updating the model.
[autodl-container-4d6411b93c-8a044365] 2024-04-10 17:11:13,263 (ctc:67) WARNING: 7/32 samples got nan grad. These were ignored for CTC loss.
[autodl-container-4d6411b93c-8a044365] 2024-04-10 17:11:13,477 (build_trainer:660) WARNING: The grad norm is nan. Skipping updating the model.
[autodl-container-4d6411b93c-8a044365] 2024-04-10 17:11:13,625 (ctc:67) WARNING: 21/45 samples got nan grad. These were ignored for CTC loss.
[autodl-container-4d6411b93c-8a044365] 2024-04-10 17:11:13,858 (build_trainer:660) WARNING: The grad norm is nan. Skipping updating the model.
[autodl-container-4d6411b93c-8a044365] 2024-04-10 17:11:14,022 (ctc:67) WARNING: 21/62 samples got nan grad. These were ignored for CTC loss.
[autodl-container-4d6411b93c-8a044365] 2024-04-10 17:11:14,248 (build_trainer:660) WARNING: The grad norm is nan. Skipping updating the model.
[autodl-container-4d6411b93c-8a044365] 2024-04-10 17:11:14,499 (ctc:67) WARNING: 37/105 samples got nan grad. These were ignored for CTC loss.
[autodl-container-4d6411b93c-8a044365] 2024-04-10 17:11:14,735 (build_trainer:660) WARNING: The grad norm is nan. Skipping updating the model.
[autodl-container-4d6411b93c-8a044365] 2024-04-10 17:11:14,875 (ctc:67) WARNING: 12/39 samples got nan grad. These were ignored for CTC loss.
[autodl-container-4d6411b93c-8a044365] 2024-04-10 17:11:15,104 (build_trainer:660) WARNING: The grad norm is nan. Skipping updating the model.
[autodl-container-4d6411b93c-8a044365] 2024-04-10 17:11:15,261 (ctc:67) WARNING: 23/56 samples got nan grad. These were ignored for CTC loss.
[autodl-container-4d6411b93c-8a044365] 2024-04-10 17:11:15,479 (build_trainer:660) WARNING: The grad norm is nan. Skipping updating the model.
[autodl-container-4d6411b93c-8a044365] 2024-04-10 17:11:15,623 (ctc:67) WARNING: 20/47 samples got nan grad. These were ignored for CTC loss.
[autodl-container-4d6411b93c-8a044365] 2024-04-10 17:11:15,854 (build_trainer:660) WARNING: The grad norm is nan. Skipping updating the model.
[autodl-container-4d6411b93c-8a044365] 2024-04-10 17:11:16,004 (ctc:67) WARNING: 15/53 samples got nan grad. These were ignored for CTC loss.
[autodl-container-4d6411b93c-8a044365] 2024-04-10 17:11:16,224 (build_trainer:660) WARNING: The grad norm is nan. Skipping updating the model.

Why is this happening?

@TParcollet (Contributor)

Hello there, we would need much more information about the model/trainer/data/task to give you an answer. SummaryMixing does not, in itself, induce more instability during training than MHSA. With more information on the code, we could try to help.

@sister-tong (Author)

I tried printing the output of summary_mixing, and the tensor contains NaN values. What could be the reason for this?
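
For reference, NaNs like this can be localized with forward hooks that flag the first module producing non-finite outputs. This is a minimal, generic PyTorch sketch, independent of FunASR; the encoder below is only a stand-in:

  import torch
  import torch.nn as nn

  def add_nan_hooks(model):
      # Report every module whose output contains NaN or inf, so the
      # first offending layer in the forward pass can be identified.
      def make_hook(name):
          def hook(module, inputs, output):
              outs = output if isinstance(output, (tuple, list)) else (output,)
              for t in outs:
                  if torch.is_tensor(t) and not torch.isfinite(t).all():
                      print(f"non-finite output in module: {name}")
          return hook
      for name, module in model.named_modules():
          module.register_forward_hook(make_hook(name))

  encoder = nn.Sequential(nn.Linear(80, 256), nn.ReLU())  # stand-in encoder
  add_nan_hooks(encoder)
  encoder(torch.randn(4, 100, 80))  # hooks fire during this forward pass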

@TParcollet (Contributor)

Hi, I'm afraid we need much more information to help you here. This could be due to many reasons, most of which are likely not connected to SummaryMixing. Please describe your setup.

@sister-tong (Author)

Hi, when I print the encoder input while using summary_mixing, I find NaN in it, but when I use RelPositionMultiHeadedAttention the input has no NaN.
This is my environment; the exact model configuration and the encoder structure are in the zip.

  linux = Ubuntu 20.04.4
  python = 3.8.18
  torch = 2.0.1
  funasr = 0.8.2
  modelscope = 1.9.3

code.zip
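
To narrow down whether the NaN first appears in the forward or the backward pass, PyTorch's built-in anomaly detection can be wrapped around a single training step. A sketch with placeholder model and data, not the actual FunASR trainer:

  import torch
  import torch.nn as nn

  model = nn.Linear(80, 4)        # placeholder for the real encoder/model
  feats = torch.randn(2, 80)      # placeholder batch

  # With anomaly detection on, a NaN/inf gradient raises an error whose
  # traceback points at the forward operation that produced it.
  with torch.autograd.set_detect_anomaly(True):
      loss = model(feats).sum()
      loss.backward()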

@TParcollet (Contributor)

Hello,

I've had a quick look at your code, but I am far too unfamiliar with this codebase to make any meaningful comment. My only remark is that we have never encountered a NaN issue with SummaryMixing, so it may not be plugged in properly (be careful with the masking, for instance; see the sketch below).
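
To illustrate the masking point: SummaryMixing condenses each utterance into a summary vector (in the paper, a time average of per-frame projections), so padded frames must be excluded from that average. A minimal sketch of a masked summary, with tensor names that are assumptions rather than the repository's actual API:

  import torch

  def masked_summary(x, mask):
      # x: (batch, time, feat); mask: (batch, time), 1 = real frame, 0 = padding.
      mask = mask.unsqueeze(-1).to(x.dtype)      # (B, T, 1)
      total = (x * mask).sum(dim=1)              # padded frames contribute zero
      count = mask.sum(dim=1).clamp(min=1.0)     # guard against empty utterances
      return total / count                       # (B, F) summary vector

A plain x.mean(dim=1) over padded inputs biases the summary, and if the padding was filled with -inf (as is done for attention scores before a softmax) the average becomes NaN, which would match the symptom reported above.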
