@lucidrains
While training MuLaN on a dataset of around 5.2k samples, the loss goes to NaN after some 15-16k steps.
My batch size is 4, and the text part of the data samples is tokenized using:
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
text_in_numbers = tokenizer.encode(text)

Does it have something to do with a division by zero, or a square root of 0 in the loss function?
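For what it's worth, a common way such NaNs appear is an l2 normalization (or square root) applied to a zero-valued tensor. Below is a minimal, self-contained sketch (not MuLaN's actual internals; the tensors are illustrative) showing the failure mode and an epsilon-guarded alternative:

# Hypothetical example: a zero-norm embedding makes plain l2 normalization
# divide 0 by 0, producing NaNs; an epsilon in the denominator avoids this.
import torch
import torch.nn.functional as F

emb = torch.zeros(4, 128)  # illustrative batch of all-zero embeddings

# plain normalization: divides by a zero norm -> NaN
naive = emb / emb.norm(dim=-1, keepdim=True)
print(torch.isnan(naive).any())   # tensor(True)

# guarded normalization: eps keeps the denominator strictly positive
safe = F.normalize(emb, dim=-1, eps=1e-8)
print(torch.isnan(safe).any())    # tensor(False)

# during training, anomaly detection can pinpoint the op that first
# produces a NaN in the backward pass
torch.autograd.set_detect_anomaly(True)

Enabling anomaly detection (or simply asserting torch.isfinite(loss) each step) would help confirm whether the NaN originates in the contrastive loss or earlier in the forward pass.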