This PR enables handling loss keyword arguments in the Mistral
forward() method. Specifically, if `num_items_in_batch` is passed,
the value is used to properly normalize the loss value.
This relates to the Gradient Accumulation fix (huggingface#34191)
Fixes huggingface#34575
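To illustrate why the normalization matters, here is a minimal plain-Python sketch of the arithmetic under gradient accumulation. It is not the Mistral `forward()` code or the actual transformers loss function: the helper names `mean_reduced_loss` and `sum_reduced_loss` and the per-token loss values are hypothetical, chosen only to show how dividing by `num_items_in_batch` differs from averaging each micro-batch on its own.

```python
# Illustrative only: per-token losses are made-up numbers, and the helper
# names below are hypothetical, not part of any library API.

def mean_reduced_loss(token_losses):
    """Loss averaged over the tokens of a single micro-batch."""
    return sum(token_losses) / len(token_losses)


def sum_reduced_loss(token_losses, num_items_in_batch):
    """Summed loss divided by the token count of the whole accumulated batch."""
    return sum(token_losses) / num_items_in_batch


# Two micro-batches with different numbers of (non-padding) tokens.
micro_batches = [[2.0, 4.0], [1.0, 1.0, 1.0, 3.0]]
num_items_in_batch = sum(len(mb) for mb in micro_batches)

# Without num_items_in_batch: each micro-batch is averaged separately, then
# the results are averaged across accumulation steps. Short micro-batches
# end up over-weighted.
naive_loss = sum(mean_reduced_loss(mb) for mb in micro_batches) / len(micro_batches)

# With num_items_in_batch: every token contributes equally to the total.
normalized_loss = sum(sum_reduced_loss(mb, num_items_in_batch) for mb in micro_batches)

# Reference value: the mean over all tokens as if they were one large batch.
full_batch_loss = sum(x for mb in micro_batches for x in mb) / num_items_in_batch

print(naive_loss, normalized_loss, full_batch_loss)
```

In this toy case only the `num_items_in_batch` variant matches the loss of the equivalent single large batch, which is the behavior the change aims for.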
System Info
Who can help?
No response
Reproduction
When calling the forward method on the NeMo Mistral model, the following exception occurs:
Expected behavior
The forward() method should use `num_items_in_batch` for the loss calculation.