Open
Description
Lines 36 to 48 in 7601190
NaN occurs at x = self.value_embedding(x) + self.position_embedding(x).
Are the value_embedding and position_embedding layers perhaps not robust to input values with large magnitudes?
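One way to narrow this down, as a minimal sketch: check the input x for non-finite values and for its largest magnitude before it reaches the embedding layers. The helper name summarize and the sample batch below are hypothetical, not taken from the repository, and the sketch uses plain Python floats (float64) rather than the model's tensors.

```python
import math

def summarize(values):
    """Return (non-finite count, largest finite magnitude) for a nested list of numbers."""
    bad = 0
    max_abs = 0.0
    stack = [values]
    while stack:
        item = stack.pop()
        if isinstance(item, (list, tuple)):
            stack.extend(item)          # descend into nested rows
        elif math.isfinite(item):
            max_abs = max(max_abs, abs(item))
        else:
            bad += 1                    # counts both inf and nan
    return bad, max_abs

# Hypothetical batch: one very large input value next to ordinary ones.
batch = [[0.5, -1.2], [3.0, 1e308]]
print(summarize(batch))

# How a large magnitude can turn into NaN: overflow to inf, then inf - inf.
overflowed = batch[1][1] * 10.0   # exceeds the float64 range, becomes inf
print(overflowed - overflowed)    # inf - inf is nan
```

On PyTorch tensors, torch.isfinite(x).all() and x.abs().max() give the equivalent checks. If the input is already finite but very large, the overflow most likely happens inside one of the two layers, which can be confirmed by running the same check on each layer's output separately.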