Add config validation and style tweaks #37589
Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. The CI will be paused while the PR is in draft mode. When it is ready for review, please click the
Just a small nit on the type of error to raise. Could you also add a test to check that the error is raised (and not raised) when expected? Otherwise LGTM 🤗
Also, sorry for getting to this so late. In the future, it would help if you pinged us directly in the PR, so we don't need redirections and don't miss your PR @Kirire ;)
The failing CI can be fixed with
Hi! I ran `make quality`; everything should be good now 👍
One last thing: could you add some tests to make sure it works as intended? #33316 is a good reference for how this can be done.
Gentle ping @Kirire
LGTM thanks 🤗 waiting for the test and we can merge!
@Kirire are you still interested in finishing this PR?
Hi! Sorry for the late reply. I recently became a dad, so things have been a bit busy on my end 👶🙂
First of all, congrats to you and your family!!! 🥳 And no worries, take your time.
Thanks a lot! 😊
Mostly nits/style and some copy/paste mistakes
Oops, my bad ^^'
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
@Kirire thanks for contributing! And enjoy the days with your family 😉
* Add config validation and style tweaks
* Fix style issues
* Fix style issues
* style
* Small fixes for copy/paste errors

---------

Co-authored-by: Cyrile <[email protected]>
Summary of changes:

- Renamed `input_states` to `hidden_states` in the `torch_forward` method signature and body, to align with HuggingFace Transformers' naming conventions. This change improves consistency with the rest of the codebase and aligns better with the standard used in other HuggingFace models.
- Removed redundant attention masking from the `forward` method. This operation is already handled later in `torch_forward`, so it is no longer necessary here. Removing it avoids potential duplication and keeps the masking logic in one place.
- Added a configuration validation check in `Mamba2Config.__init__`. This ensures that the configuration is internally consistent and helps users catch misconfigurations early by raising a clear error during initialization.
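The thread does not show which invariant the new check enforces or which exception type the reviewers settled on. The sketch below assumes the check is that the expanded inner width `hidden_size * expand` must equal `num_heads * head_dim`, and that a `ValueError` is raised. `ToyMamba2Config` is a hypothetical, trimmed-down stand-in (not the library's `Mamba2Config`), and the two test methods follow the reviewers' request to check that the error is raised, and not raised, when expected.

```python
import unittest
from dataclasses import dataclass


@dataclass
class ToyMamba2Config:
    """Hypothetical, trimmed-down stand-in for Mamba2Config (not the library class)."""

    hidden_size: int = 64
    expand: int = 2
    num_heads: int = 8
    head_dim: int = 16

    def __post_init__(self) -> None:
        # Assumed invariant: the expanded inner width (hidden_size * expand)
        # must split exactly into num_heads heads of size head_dim.
        inner = self.hidden_size * self.expand
        heads_total = self.num_heads * self.head_dim
        if inner != heads_total:
            # Fail fast at construction time with a message naming both sides.
            raise ValueError(
                f"Inconsistent configuration: hidden_size * expand ({inner}) "
                f"must equal num_heads * head_dim ({heads_total})."
            )


class ToyMamba2ConfigTest(unittest.TestCase):
    def test_consistent_config_does_not_raise(self) -> None:
        # 32 * 2 == 4 * 16, so construction succeeds.
        config = ToyMamba2Config(hidden_size=32, expand=2, num_heads=4, head_dim=16)
        self.assertEqual(config.num_heads * config.head_dim, 64)

    def test_inconsistent_config_raises(self) -> None:
        # 32 * 2 != 4 * 8, so construction must fail.
        with self.assertRaises(ValueError):
            ToyMamba2Config(hidden_size=32, expand=2, num_heads=4, head_dim=8)
```

Validating in the constructor surfaces the mismatch immediately, rather than as a shape error deep inside the first forward pass. The tests can be run with `python -m unittest`.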
Related to #37554
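To illustrate why the pre-masking in `forward` was redundant, here is a pure-Python sketch that assumes padding is masked by multiplying hidden states with a 0/1 attention mask. `mask_padding` is a hypothetical helper, and `torch_forward`/`forward` are simplified stand-ins whose only job is to show the delegation; the doubling step is a placeholder, not the real Mamba2 computation.

```python
from typing import List, Optional

Matrix = List[List[float]]  # [seq_len][hidden] for one sequence; batch dim omitted


def mask_padding(hidden_states: Matrix, attention_mask: Optional[List[int]]) -> Matrix:
    """Zero out hidden states at padded positions (mask value 0)."""
    if attention_mask is None:
        return hidden_states
    return [[x * m for x in row] for row, m in zip(hidden_states, attention_mask)]


def torch_forward(hidden_states: Matrix, attention_mask: Optional[List[int]] = None) -> Matrix:
    # Simplified stand-in for the mixer body: masking happens here.
    hidden_states = mask_padding(hidden_states, attention_mask)
    return [[2.0 * x for x in row] for row in hidden_states]  # placeholder computation


def forward(hidden_states: Matrix, attention_mask: Optional[List[int]] = None) -> Matrix:
    # No pre-masking here: torch_forward already applies the mask.
    return torch_forward(hidden_states, attention_mask)
```

With a binary mask, masking is idempotent (applying it twice gives the same result as applying it once), so an extra pass in `forward` only adds work without changing the output.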
Modifies the loss computation by replacing the hardcoded use of `CrossEntropyLoss` with a more flexible call to a configurable `loss_function`. This change improves support for gradient accumulation scenarios and allows for easier customization of model-specific loss functions.
Related to #34191
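The gradient-accumulation issue can be sketched in pure Python. A loss that averages over tokens inside each micro-batch, with those per-micro-batch means then averaged, over-weights micro-batches that contain fewer valid tokens. Summing token losses and dividing by the total token count across all accumulated micro-batches recovers the full-batch mean. `sketch_loss_function`, `token_nll`, and `IGNORE_INDEX` are hypothetical names; the `num_items_in_batch` argument name mirrors the one used in Transformers, but this is not the library's implementation, and causal-LM label shifting is omitted (labels are assumed to be already aligned with the logits).

```python
import math
from typing import List, Optional

IGNORE_INDEX = -100  # labels with this value are excluded from the loss


def token_nll(logits: List[float], target: int) -> float:
    """Cross-entropy of one token: -log softmax(logits)[target]."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def sketch_loss_function(
    logits: List[List[float]],
    labels: List[int],
    num_items_in_batch: Optional[int] = None,
) -> float:
    """Hypothetical configurable loss. Assumes at least one non-ignored label.

    Divides the summed token losses by num_items_in_batch (valid tokens across
    all accumulated micro-batches) when given; otherwise returns a local mean.
    """
    losses = [token_nll(row, y) for row, y in zip(logits, labels) if y != IGNORE_INDEX]
    total = sum(losses)
    if num_items_in_batch is None:
        return total / len(losses)
    return total / num_items_in_batch


# Two micro-batches with different numbers of valid tokens.
batch_a = ([[2.0, 0.0], [0.0, 1.0], [0.5, 0.5]], [0, 0, IGNORE_INDEX])  # 2 valid tokens
batch_b = ([[0.0, 3.0]], [0])  # 1 valid token
num_items = 3  # valid tokens across both micro-batches

full_batch = sketch_loss_function(batch_a[0] + batch_b[0], batch_a[1] + batch_b[1])
# Mean of per-micro-batch means: over-weights the single token in batch_b.
naive = (sketch_loss_function(*batch_a) + sketch_loss_function(*batch_b)) / 2
# Sum-then-normalize by the global token count: matches the full-batch mean.
accumulated = sum(sketch_loss_function(*b, num_items_in_batch=num_items) for b in (batch_a, batch_b))
```

Routing the computation through a configurable loss function is what makes this kind of normalization possible without changing each model's forward pass.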