
Conversation

Kirire
Contributor

@Kirire Kirire commented Apr 17, 2025

Summary of changes:

  1. Renamed input_states to hidden_states in the torch_forward method signature and body, to align with HuggingFace Transformers' naming conventions:

    def torch_forward(self, hidden_states: torch.Tensor, ...)

    This change improves consistency with the rest of the codebase and matches the naming used in other HuggingFace models.

  2. Removed redundant attention masking from the forward method:

    if attention_mask is not None and ...:
        hidden_states = (hidden_states * attention_mask[:, :, None]).to(dtype)

    This operation is already handled later in torch_forward, so it's no longer necessary here. Removing it avoids potential duplication and keeps the logic centralized.

  3. Added a configuration validation check in Mamba2Config.__init__:

    if hidden_size * expand != num_heads * head_dim:
        raise AttributeError(...)

    This ensures that the configuration is internally consistent, and helps users catch misconfigurations early by raising a clear error during initialization.
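To illustrate the operation removed in (2): multiplying by the broadcast mask zeroes the hidden states at padded positions. A minimal sketch with made-up shapes (not the actual model code):

```python
import torch

# Made-up shapes for illustration; not the real model dimensions
batch, seq_len, hidden = 2, 4, 8
hidden_states = torch.randn(batch, seq_len, hidden)
attention_mask = torch.tensor([[1, 1, 1, 0],
                               [1, 1, 0, 0]])  # 1 = real token, 0 = padding
dtype = hidden_states.dtype

# attention_mask[:, :, None] broadcasts the (batch, seq_len) mask over the
# feature dimension, zeroing out padded positions
masked = (hidden_states * attention_mask[:, :, None]).to(dtype)

assert float(masked[0, 3].abs().sum()) == 0.0          # padded position zeroed
assert torch.equal(masked[1, 0], hidden_states[1, 0])  # real token unchanged
```

Doing this once in torch_forward, rather than in both places, keeps the masking logic in a single spot.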
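A standalone sketch of the check added in (3), using a hypothetical stand-in class (parameter names mirror the PR; the default values are made up for the example, not the real Mamba2 defaults):

```python
class ToyMamba2Config:
    """Hypothetical stand-in for Mamba2Config, illustrating the validation."""

    def __init__(self, hidden_size=2048, expand=2, num_heads=64, head_dim=64):
        # Fail fast on an internally inconsistent configuration
        if hidden_size * expand != num_heads * head_dim:
            raise AttributeError(
                f"hidden_size * expand ({hidden_size * expand}) must equal "
                f"num_heads * head_dim ({num_heads * head_dim})"
            )
        self.hidden_size = hidden_size
        self.expand = expand
        self.num_heads = num_heads
        self.head_dim = head_dim

ToyMamba2Config()  # consistent: no error

try:
    ToyMamba2Config(num_heads=100)  # 2048 * 2 != 100 * 64
except AttributeError as e:
    print(e)
```

A test for this (as requested in review) only needs to assert that the error is raised for an inconsistent config and not raised for a consistent one.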

Related to #37554

  1. Improved Loss Handling During Gradient Accumulation

Modifies the loss computation by replacing the hardcoded use of CrossEntropyLoss:

loss_fct = CrossEntropyLoss()
loss = loss_fct(shift_logits.view(-1, shift_logits.size(-1)), shift_labels.view(-1))

with a more flexible call to a configurable loss_function:

loss = self.loss_function(logits=logits, labels=labels, vocab_size=self.config.vocab_size, **kwargs)

This change improves support for gradient accumulation scenarios and allows for easier customization of model-specific loss functions.
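The motivation, roughly: a hardcoded CrossEntropyLoss averages over each micro-batch independently, so with gradient accumulation the per-step means are not equivalent to a mean over the full effective batch when micro-batches contain different numbers of non-padding tokens. A configurable loss function can instead sum and divide by the true token count. A sketch of this idea (the signature and the num_items_in_batch name here are illustrative, not the exact library API):

```python
import torch
import torch.nn.functional as F

def causal_lm_loss(logits, labels, vocab_size, num_items_in_batch=None,
                   ignore_index=-100):
    # Shift so that tokens < n predict token n
    shift_logits = logits[..., :-1, :].contiguous().view(-1, vocab_size)
    shift_labels = labels[..., 1:].contiguous().view(-1)
    if num_items_in_batch is None:
        # Plain mean over this micro-batch (the old hardcoded behaviour)
        return F.cross_entropy(shift_logits, shift_labels,
                               ignore_index=ignore_index)
    # Sum, then divide by the token count of the full accumulated batch,
    # so gradients are scaled consistently across accumulation steps
    loss = F.cross_entropy(shift_logits, shift_labels,
                           ignore_index=ignore_index, reduction="sum")
    return loss / num_items_in_batch
```

When every token is valid and num_items_in_batch equals the micro-batch's own token count, the two paths agree; they diverge exactly in the accumulation case the PR targets.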

Related to #34191

@github-actions github-actions bot marked this pull request as draft April 17, 2025 16:26
@github-actions
Contributor

Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. The CI will be paused while the PR is in draft mode. When it is ready for review, please click the Ready for review button (at the bottom of the PR page). This will assign reviewers and trigger CI.

@Kirire Kirire marked this pull request as ready for review April 17, 2025 17:55
@Rocketknight1
Member

cc @vasqu @molbap from #37554

Contributor

@vasqu vasqu left a comment


Just a small nit on the type of error to raise. Could you also add a test to check if the error is properly (not) raised? Otherwise LGTM 🤗

Also, sorry that I came to this so late. In the future, it would help if you pinged us directly in the PR, so we don't need redirections and don't risk missing your PR @Kirire ;)

@vasqu
Contributor

vasqu commented Apr 22, 2025

The failing CI can be fixed with make style (after installing the deps, e.g. pip install -e ".[quality]").

@Kirire
Contributor Author

Kirire commented Apr 23, 2025

Hi! I ran make quality; everything should be good now 👍

@vasqu
Contributor

vasqu commented Apr 23, 2025

Just one last thing: could you add some tests to make sure it works as intended? #33316 is a good reference for how this can be done.

@vasqu
Contributor

vasqu commented May 3, 2025

Gentle ping @Kirire

Collaborator

@ArthurZucker ArthurZucker left a comment


LGTM thanks 🤗 waiting for the test and we can merge!

@vasqu
Contributor

vasqu commented May 12, 2025

@Kirire are you still interested in finishing this PR?

@Kirire
Contributor Author

Kirire commented May 12, 2025

Hi! Sorry for the late reply — I recently became a dad, so things have been a bit busy on my end 👶🙂
But yes, I'm definitely still interested, and I’ll get back to it starting tomorrow!

@vasqu
Contributor

vasqu commented May 12, 2025

First of all, congrats to you and your family!!! 🥳 And no worries, take your time.

@Kirire
Contributor Author

Kirire commented May 13, 2025

Thanks a lot! 😊
I've now added the test for the configuration — let me know if that works for you or if you'd like any changes!

Contributor

@vasqu vasqu left a comment


Mostly nits / style and a copy/paste mistake

@Kirire
Contributor Author

Kirire commented May 14, 2025

Oops, my bad ^^'

@vasqu vasqu enabled auto-merge (squash) May 14, 2025 12:02
@vasqu vasqu merged commit 935bbbc into huggingface:main May 14, 2025
14 checks passed
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@vasqu
Contributor

vasqu commented May 14, 2025

@Kirire thanks for contributing! And enjoy the days with your family 😉

zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request May 14, 2025
* Add config validation and style tweaks

* Fix style issues

* Fix style issues

* style

* Small fixes for copy/paste errors

---------

Co-authored-by: Cyrile <[email protected]>