
Conversation

Kirire
Contributor

@Kirire Kirire commented Apr 17, 2025

Summary of changes:

  1. Renamed input_states to hidden_states in the torch_forward method signature and body, to align with HuggingFace Transformers' naming conventions:

    def torch_forward(self, hidden_states: torch.Tensor, ...)

    This change improves consistency with the rest of the codebase and matches the naming used in other HuggingFace models.

  2. Removed redundant attention masking from the forward method:

    if attention_mask is not None and ...:
        hidden_states = (hidden_states * attention_mask[:, :, None]).to(dtype)

    This operation is already handled later in torch_forward, so it's no longer necessary here. Removing it avoids potential duplication and keeps the logic centralized.

  3. Added a configuration validation check in Mamba2Config.__init__:

    if hidden_size * expand != num_heads * head_dim:
        raise AttributeError(...)

    This ensures that the configuration is internally consistent, and helps users catch misconfigurations early by raising a clear error during initialization.
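To illustrate the operation removed in (2): multiplying by the broadcast mask zeroes the hidden states at padded positions. A minimal sketch with made-up shapes (not the actual model code):

```python
import torch

# Made-up shapes for illustration; not the real model dimensions
batch, seq_len, hidden = 2, 4, 8
hidden_states = torch.randn(batch, seq_len, hidden)
attention_mask = torch.tensor([[1, 1, 1, 0],
                               [1, 1, 0, 0]])  # 1 = real token, 0 = padding
dtype = hidden_states.dtype

# attention_mask[:, :, None] broadcasts the (batch, seq_len) mask over the
# feature dimension, zeroing out padded positions
masked = (hidden_states * attention_mask[:, :, None]).to(dtype)

assert float(masked[0, 3].abs().sum()) == 0.0          # padded position zeroed
assert torch.equal(masked[1, 0], hidden_states[1, 0])  # real token unchanged
```

Doing this once in torch_forward, rather than in both places, keeps the masking logic in a single spot.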
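A standalone sketch of the check added in (3), using a hypothetical stand-in class (parameter names mirror the PR; the default values are made up for the example, not the real Mamba2 defaults):

```python
class ToyMamba2Config:
    """Hypothetical stand-in for Mamba2Config, illustrating the validation."""

    def __init__(self, hidden_size=2048, expand=2, num_heads=64, head_dim=64):
        # Fail fast on an internally inconsistent configuration
        if hidden_size * expand != num_heads * head_dim:
            raise AttributeError(
                f"hidden_size * expand ({hidden_size * expand}) must equal "
                f"num_heads * head_dim ({num_heads * head_dim})"
            )
        self.hidden_size = hidden_size
        self.expand = expand
        self.num_heads = num_heads
        self.head_dim = head_dim

ToyMamba2Config()  # consistent: no error

try:
    ToyMamba2Config(num_heads=100)  # 2048 * 2 != 100 * 64
except AttributeError as e:
    print(e)
```

A test for this (as requested in review) only needs to assert that the error is raised for an inconsistent config and not raised for a consistent one.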

Related to #37554

  1. Improved Loss Handling During Gradient Accumulation

Modifies the loss computation by replacing the hardcoded use of CrossEntropyLoss:

loss_fct = CrossEntropyLoss()
loss = loss_fct(shift_logits.view(-1, shift_logits.size(-1)), shift_labels.view(-1))

with a more flexible call to a configurable loss_function:

loss = self.loss_function(logits=logits, labels=labels, vocab_size=self.config.vocab_size, **kwargs)

This change improves support for gradient accumulation scenarios and allows for easier customization of model-specific loss functions.
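The motivation, roughly: a hardcoded CrossEntropyLoss averages over each micro-batch independently, so with gradient accumulation the per-step means are not equivalent to a mean over the full effective batch when micro-batches contain different numbers of non-padding tokens. A configurable loss function can instead sum and divide by the true token count. A sketch of this idea (the signature and the num_items_in_batch name here are illustrative, not the exact library API):

```python
import torch
import torch.nn.functional as F

def causal_lm_loss(logits, labels, vocab_size, num_items_in_batch=None,
                   ignore_index=-100):
    # Shift so that tokens < n predict token n
    shift_logits = logits[..., :-1, :].contiguous().view(-1, vocab_size)
    shift_labels = labels[..., 1:].contiguous().view(-1)
    if num_items_in_batch is None:
        # Plain mean over this micro-batch (the old hardcoded behaviour)
        return F.cross_entropy(shift_logits, shift_labels,
                               ignore_index=ignore_index)
    # Sum, then divide by the token count of the full accumulated batch,
    # so gradients are scaled consistently across accumulation steps
    loss = F.cross_entropy(shift_logits, shift_labels,
                           ignore_index=ignore_index, reduction="sum")
    return loss / num_items_in_batch
```

When every token is valid and num_items_in_batch equals the micro-batch's own token count, the two paths agree; they diverge exactly in the accumulation case the PR targets.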

Related to #34191

@github-actions github-actions bot marked this pull request as draft April 17, 2025 16:26
@github-actions
Contributor

Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. The CI will be paused while the PR is in draft mode. When it is ready for review, please click the Ready for review button (at the bottom of the PR page). This will assign reviewers and trigger CI.

@Kirire Kirire marked this pull request as ready for review April 17, 2025 17:55
@Rocketknight1
Member

cc @vasqu @molbap from #37554

Contributor

@vasqu vasqu left a comment


Just a small nit on the type of error to raise. Could you also add a test to check if the error is properly (not) raised? Otherwise LGTM 🤗

Also, sorry that I came to this so late. In the future, it would help if you pinged us directly in the PR, so we don't need redirections and don't risk missing your PR @Kirire ;)

@vasqu
Contributor

vasqu commented Apr 22, 2025

The failing CI can be fixed with make style (after installing the deps, e.g. pip install -e ".[quality]").

@Kirire
Contributor Author

Kirire commented Apr 23, 2025

Hi! I ran make quality; everything should be good now 👍

@vasqu
Contributor

vasqu commented Apr 23, 2025

Just one last thing: could you add some tests to make sure it works as intended? #33316 is a good reference for how this can be done.

@vasqu
Contributor

vasqu commented May 3, 2025

Gentle ping @Kirire

Collaborator

@ArthurZucker ArthurZucker left a comment


LGTM thanks 🤗 waiting for the test and we can merge!

@vasqu
Contributor

vasqu commented May 12, 2025

@Kirire are you still interested in finishing this PR?

@Kirire
Contributor Author

Kirire commented May 12, 2025

Hi! Sorry for the late reply — I recently became a dad, so things have been a bit busy on my end 👶🙂
But yes, I'm definitely still interested, and I’ll get back to it starting tomorrow!

@vasqu
Contributor

vasqu commented May 12, 2025

First of all, congrats to you and your family!!! 🥳 And no worries, take your time.

@Kirire
Contributor Author

Kirire commented May 13, 2025

Thanks a lot! 😊
I've now added the test for the configuration — let me know if that works for you or if you'd like any changes!

Contributor

@vasqu vasqu left a comment


Mostly nits / style and a copy/paste mistake

@Kirire
Contributor Author

Kirire commented May 14, 2025

Oops, my bad ^^'

@vasqu vasqu enabled auto-merge (squash) May 14, 2025 12:02
@vasqu vasqu merged commit 935bbbc into huggingface:main May 14, 2025
14 checks passed
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@vasqu
Contributor

vasqu commented May 14, 2025

@Kirire thanks for contributing! And enjoy the days with your family 😉

zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request May 14, 2025
* Add config validation and style tweaks

* Fix style issues

* Fix style issues

* style

* Small fixes for copy/paste errors

---------

Co-authored-by: Cyrile <[email protected]>