-
Notifications
You must be signed in to change notification settings - Fork 29.3k
Unbreak optimum-executorch #38646
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Unbreak optimum-executorch #38646
Conversation
cc @kimishpatel to unblock the work in optimum-et |
for onnx job, rebase on main will work |
This might be better to wait @Cyrilvallez 's back. Some explanation would be nice for them. |
@ydshieh The blamed PR messed up the recipe being used to export the model. For example, models with static cache will be exported using the recipe for hybrid cache due to the changes. This PR is making the minimal changes to just reverted the code that locates the recipe based on cache type explicitly. Can we prioritize to get this PR reviewed? We will need this fix to unblock some work in the downstream in optimum-executorch. |
Hey @guangy10! Sorry for the delay, I was on vacations! With a quick glance, checking |
Hey @Cyrilvallez, adding transformers/src/transformers/integrations/executorch.py Lines 59 to 65 in aa798b7
I think it's because you added the layer_types for qwen3 here: transformers/src/transformers/models/qwen3/configuration_qwen3.py Lines 210 to 216 in aa798b7
So in the downstream when I call export to executorch in optimum-executorch, you can see it's going off. Here is the call stack:
Pretty much all models that uses static cache will fail during export due to this issue. |
8f11b60
to
7046b2a
Compare
@Cyrilvallez I updated the PR with enhanced tests. That is, without reverting the changes in |
7046b2a
to
aefca28
Compare
Failure in |
Humm, but models with |
Is Qwen3 hybrid? Some model could work with both hybrid and static, I think the check or the added layer_types will force it always go export with hybrid cache.
Which check are you suggesting to remove? Maybe it's more clear if you can comment in the code inline? |
Well I'm simply talking about the check in your stacktrace here https://github.com/huggingface/transformers/blob/main/src/transformers/integrations/executorch.py#L403-L407 And yes, Qwen3 is hybrid in general, though it does not always have sliding layers (in which case Hybrid and Static caches are equivalent) |
aefca28
to
c5d9f34
Compare
@Cyrilvallez Looks like there are additional work needed in order to treat HybridCache and StaticCache (hybrid w/o sliding window) in a unified way. If I just removed the mentioned checking as you suggested, the export test still fails due to missing sliding window config.
I guess my main motivation for this PR is to restore the behavior to unbreak the downstream work in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, there is a check in the Cache directly as well
1501526
to
c894b9e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, the change works for me! However, I'm just concerned about having several different APIs (the function and the class) to do seamingly the same thing. IMO we should choose either and standardize, especially since a lot is redundant - could be a future PR though WDYT?
c894b9e
to
76fd034
Compare
Fixed linter. @Cyrilvallez let me know if it's good to go. |
76fd034
to
bb2cff8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, last small detail, we don't need to add the view op! Let's remove it then I'll merge 🤗 Sorry for being annoying on this one 😬
Updated the PR. Should be good to go. |
What does this PR do?
Revert minimal changes made from #37866 that breaks export to ExecuTorch in huggingface/optimum-executorch when developing from latest
transformers
trunkTODO: Will update with tests shortly
Before submitting
Pull Request section?
to it if that's the case. I surfaced the issue in Slack
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@ArthurZucker @Cyrilvallez @ydshieh