
Unhandled 'num_items_in_batch' in Mistral model #34575

Open
2 of 4 tasks
gheinrich opened this issue Nov 2, 2024 · 0 comments · May be fixed by #34576

System Info

  • Transformers version: 4.46.0
  • Model: nvidia/Mistral-NeMo-Minitron-8B-Base

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

When calling the forward() method on the Mistral-NeMo model, the following exception occurs:

[rank2]:   File "/lustre/fsw/portfolios/llmservice/users/gheinrich/anaconda3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1582, in _call_impl
[rank2]:     result = forward_call(*args, **kwargs)
[rank2]: TypeError: MistralForCausalLM.forward() got an unexpected keyword argument 'num_items_in_batch'
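
A minimal sketch of how the error can be triggered outside a full training run (an assumed setup, not the reporter's exact script): since the gradient-accumulation fix, the Trainer forwards `num_items_in_batch` as a loss kwarg, and Mistral's forward() rejects it. The direct call below imitates that:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Mistral-NeMo-Minitron-8B-Base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("Hello world", return_tensors="pt")
# Trainer passes num_items_in_batch through to the model's forward();
# MistralForCausalLM.forward() does not accept it in 4.46.0.
outputs = model(
    **inputs,
    labels=inputs["input_ids"],
    num_items_in_batch=5,  # illustrative value
)
# TypeError: MistralForCausalLM.forward() got an unexpected keyword argument 'num_items_in_batch'
```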

Expected behavior

The forward() method should accept num_items_in_batch and use it to normalize the loss.
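
For reference, a sketch of the expected normalization (mirroring the gradient-accumulation fix in #34191; the function name and signature here are illustrative, not the library's exact API):

```python
import torch.nn.functional as F

def causal_lm_loss(logits, labels, vocab_size, num_items_in_batch=None):
    # Shift so that tokens < n predict token n.
    shift_logits = logits[..., :-1, :].contiguous().view(-1, vocab_size)
    shift_labels = labels[..., 1:].contiguous().view(-1)
    # Sum per-token losses rather than averaging per micro-batch...
    reduction = "sum" if num_items_in_batch is not None else "mean"
    loss = F.cross_entropy(shift_logits, shift_labels,
                           ignore_index=-100, reduction=reduction)
    if num_items_in_batch is not None:
        # ...then divide by the true number of label tokens across all
        # gradient-accumulation steps, so accumulated training matches
        # training with the full batch in one step.
        loss = loss / num_items_in_batch
    return loss
```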

@gheinrich gheinrich added the bug label Nov 2, 2024
gheinrich added a commit to gheinrich/transformers that referenced this issue Nov 2, 2024
This PR enables handling of loss keyword arguments in the Mistral
forward() method. Specifically, if `num_items_in_batch` is passed,
its value is used to properly normalize the loss.

This relates to the Gradient Accumulation fix (huggingface#34191)

Fixes huggingface#34575
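
Sketched below is the shape such a change could take (an assumption about the approach, not necessarily the exact diff in PR #34576; `causal_lm_loss` refers to the illustrative helper sketched above, and the surrounding model code is elided):

```python
# Excerpt-style sketch of MistralForCausalLM.forward(), not runnable standalone:
def forward(self, input_ids=None, attention_mask=None, labels=None, **loss_kwargs):
    # The relevant change: accept **loss_kwargs so Trainer-provided kwargs
    # such as num_items_in_batch no longer raise a TypeError.
    outputs = self.model(input_ids=input_ids, attention_mask=attention_mask)
    logits = self.lm_head(outputs[0]).float()

    loss = None
    if labels is not None:
        # Forward num_items_in_batch into the loss so it is normalized by
        # the true token count across gradient-accumulation steps.
        loss = causal_lm_loss(logits, labels, self.config.vocab_size, **loss_kwargs)
    ...
```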
@gheinrich gheinrich linked a pull request Nov 2, 2024 that will close this issue