
Unhandled 'num_items_in_batch' in Mistral model #34575

Open
2 of 4 tasks
gheinrich opened this issue Nov 2, 2024 · 0 comments · May be fixed by #34576

System Info

  • Transformers version: 4.46.0
  • Model: nvidia/Mistral-NeMo-Minitron-8B-Base

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

When calling the forward() method on the Mistral-NeMo model, the following exception occurs:

[rank2]:   File "/lustre/fsw/portfolios/llmservice/users/gheinrich/anaconda3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1582, in _call_impl
[rank2]:     result = forward_call(*args, **kwargs)
[rank2]: TypeError: MistralForCausalLM.forward() got an unexpected keyword argument 'num_items_in_batch'
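
A minimal sketch of how the error can be triggered outside a full training run (an assumed setup, not the reporter's exact script): since the gradient-accumulation fix, the Trainer forwards `num_items_in_batch` as a loss kwarg, and Mistral's forward() rejects it. The direct call below imitates that:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Mistral-NeMo-Minitron-8B-Base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("Hello world", return_tensors="pt")
# Trainer passes num_items_in_batch through to the model's forward();
# MistralForCausalLM.forward() does not accept it in 4.46.0.
outputs = model(
    **inputs,
    labels=inputs["input_ids"],
    num_items_in_batch=5,  # illustrative value
)
# TypeError: MistralForCausalLM.forward() got an unexpected keyword argument 'num_items_in_batch'
```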

Expected behavior

The forward() method should accept num_items_in_batch and use it to normalize the loss.
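
For reference, a sketch of the expected normalization (mirroring the gradient-accumulation fix in #34191; the function name and signature here are illustrative, not the library's exact API):

```python
import torch.nn.functional as F

def causal_lm_loss(logits, labels, vocab_size, num_items_in_batch=None):
    # Shift so that tokens < n predict token n.
    shift_logits = logits[..., :-1, :].contiguous().view(-1, vocab_size)
    shift_labels = labels[..., 1:].contiguous().view(-1)
    # Sum per-token losses rather than averaging per micro-batch...
    reduction = "sum" if num_items_in_batch is not None else "mean"
    loss = F.cross_entropy(shift_logits, shift_labels,
                           ignore_index=-100, reduction=reduction)
    if num_items_in_batch is not None:
        # ...then divide by the true number of label tokens across all
        # gradient-accumulation steps, so accumulated training matches
        # training with the full batch in one step.
        loss = loss / num_items_in_batch
    return loss
```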

@gheinrich gheinrich added the bug label Nov 2, 2024
gheinrich added a commit to gheinrich/transformers that referenced this issue Nov 2, 2024
This PR enables handling of loss keyword arguments in the Mistral
forward() method. Specifically, if `num_items_in_batch` is passed,
its value is used to properly normalize the loss.

This relates to the Gradient Accumulation fix (huggingface#34191)

Fixes huggingface#34575
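
Sketched below is the shape such a change could take (an assumption about the approach, not necessarily the exact diff in PR #34576; `causal_lm_loss` refers to the illustrative helper sketched above, and the surrounding model code is elided):

```python
# Excerpt-style sketch of MistralForCausalLM.forward(), not runnable standalone:
def forward(self, input_ids=None, attention_mask=None, labels=None, **loss_kwargs):
    # The relevant change: accept **loss_kwargs so Trainer-provided kwargs
    # such as num_items_in_batch no longer raise a TypeError.
    outputs = self.model(input_ids=input_ids, attention_mask=attention_mask)
    logits = self.lm_head(outputs[0]).float()

    loss = None
    if labels is not None:
        # Forward num_items_in_batch into the loss so it is normalized by
        # the true token count across gradient-accumulation steps.
        loss = causal_lm_loss(logits, labels, self.config.vocab_size, **loss_kwargs)
    ...
```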
@gheinrich gheinrich linked a pull request Nov 2, 2024 that will close this issue