
Unbreak optimum-executorch #38646


Open · guangy10 wants to merge 3 commits into main from unbreak_optimum_et

Conversation

@guangy10 (Contributor) commented Jun 6, 2025

What does this PR do?

Revert the minimal changes from #37866 that break export to ExecuTorch in huggingface/optimum-executorch when developing against the latest transformers trunk.

TODO: Will update with tests shortly

Before submitting

  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case. I surfaced the issue in Slack.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@ArthurZucker @Cyrilvallez @ydshieh

@guangy10 (Contributor Author) commented Jun 6, 2025

cc @kimishpatel to unblock the work in optimum-et

@ydshieh (Collaborator) commented Jun 6, 2025

For the ONNX job, rebasing on main will work.

@ydshieh (Collaborator) commented Jun 6, 2025

It might be better to wait until @Cyrilvallez is back.

Some explanation would be nice for them.

@guangy10 (Contributor Author) commented:

It might be better to wait until @Cyrilvallez is back.
Some explanation would be nice for them.

@ydshieh The blamed PR messed up which recipe is used to export the model. For example, due to the changes, models with a static cache are exported using the recipe for a hybrid cache. This PR makes minimal changes to revert only the code that selects the recipe based on the cache type. Can we prioritize getting this PR reviewed? We need this fix to unblock downstream work in optimum-executorch.

@Cyrilvallez (Member) commented:

Hey @guangy10! Sorry for the delay, I was on vacation! At a quick glance, checking the layer_types attribute should be correct, no? Which model does not export with the correct cache?

@guangy10 (Contributor Author) commented Jun 10, 2025

Hey @Cyrilvallez, adding layer_types to some models makes it impossible to reach the StaticCache branch in the following block:

if not hasattr(model.config, "layer_types"):
    # If `layer_types` is not specified explicitly in the config, there is only 1 type of layers, so
    # export will use `StaticCache` by default.
    logging.info("Using `StaticCache` for export as `layer_types` is not specified in the config.")
    self.model = TorchExportableModuleWithStaticCache(model)
else:
    self.model = TorchExportableModuleWithHybridCache(model, max_batch_size, max_cache_len)

I think it's because you added layer_types for Qwen3 here:

if self.layer_types is None:
    self.layer_types = [
        "sliding_attention"
        if self.sliding_window is not None and i >= self.max_window_layers
        else "full_attention"
        for i in range(self.num_hidden_layers)
    ]
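
For a Qwen3 checkpoint whose sliding_window is None (as in the config pasted further down), that default tags every layer as "full_attention", so the hasattr check above can never take the StaticCache branch. A small self-contained sketch, using assumed values that mirror that config:

from types import SimpleNamespace

# Assumed values mirroring the Qwen3 config shown later in this thread.
sliding_window = None
max_window_layers = 28
num_hidden_layers = 28

# Same comprehension as the Qwen3Config default above: with no sliding window,
# every layer ends up tagged "full_attention".
layer_types = [
    "sliding_attention"
    if sliding_window is not None and i >= max_window_layers
    else "full_attention"
    for i in range(num_hidden_layers)
]
assert layer_types == ["full_attention"] * num_hidden_layers

# Stand-in for model.config: because `layer_types` now always exists,
# `not hasattr(model.config, "layer_types")` is always False, so export
# falls through to the TorchExportableModuleWithHybridCache branch.
config = SimpleNamespace(layer_types=layer_types, sliding_window=sliding_window)
assert hasattr(config, "layer_types")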

So downstream, when I call the export to ExecuTorch in optimum-executorch, you can see it goes down the wrong path. Here is the call stack:

Traceback (most recent call last):
  File "/opt/anaconda3/envs/huggingface/bin/optimum-cli", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/opt/anaconda3/envs/huggingface/lib/python3.11/site-packages/optimum/commands/optimum_cli.py", line 208, in main
    service.run()
  File "/opt/anaconda3/envs/huggingface/lib/python3.11/site-packages/optimum/commands/export/executorch.py", line 104, in run
    main_export(
  File "/opt/anaconda3/envs/huggingface/lib/python3.11/site-packages/optimum/exporters/executorch/__main__.py", line 138, in main_export
    return export_to_executorch(
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/envs/huggingface/lib/python3.11/site-packages/optimum/exporters/executorch/convert.py", line 77, in export_to_executorch
    executorch_progs = recipe_func(model, **kwargs)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/envs/huggingface/lib/python3.11/site-packages/optimum/exporters/executorch/recipes/xnnpack.py", line 98, in export_to_executorch_with_xnnpack
    exported_progs = model.export()
                     ^^^^^^^^^^^^^^
  File "/opt/anaconda3/envs/huggingface/lib/python3.11/site-packages/optimum/exporters/executorch/integrations.py", line 58, in export
    exportable_module = TorchExportableModuleForDecoderOnlyLM(self.model, max_batch_size, max_cache_len)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/guangyang/transformers/src/transformers/integrations/executorch.py", line 65, in __init__
    self.model = TorchExportableModuleWithHybridCache(model, max_batch_size, max_cache_len)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/guangyang/transformers/src/transformers/integrations/executorch.py", line 407, in __init__
    raise AssertionError("Model must use 'hybrid' cache implementation")
AssertionError: Model must use 'hybrid' cache implementation
FAILED

Pretty much all models that use a static cache will fail during export due to this issue.

@guangy10 force-pushed the unbreak_optimum_et branch from 8f11b60 to 7046b2a on June 10, 2025 18:37
@guangy10 (Contributor Author) commented:

@Cyrilvallez I updated the PR with enhanced tests. That is, without reverting the changes in executorch.py, running test_export for these models would fail in CI. If CI is green, can we get this fix merged to unblock the downstream optimum-executorch work that @kimishpatel and I are working on?

@guangy10 force-pushed the unbreak_optimum_et branch from 7046b2a to aefca28 on June 10, 2025 18:53
@guangy10 (Contributor Author) commented:

The failure in test_onnx is unrelated to this PR.

@Cyrilvallez (Member) commented Jun 10, 2025

Hmm, but models with layer_types should always be hybrid, and the others should not. What you are experiencing is that we removed the default cache_implementation="hybrid" from the config (to default back to DynamicCache), not that we export with the wrong cache.
So we should just remove this check IMO.

@guangy10 (Contributor Author) commented Jun 10, 2025

Hmm, but models with layer_types should always be hybrid, and the others should not.

Is Qwen3 hybrid? Some models could work with both hybrid and static caches; I think the check, together with the added layer_types, will force export to always use the hybrid cache.

What you are experiencing is that we removed the default cache_implementation="hybrid" from the config (to default back to DynamicCache), not that we export with the wrong cache. So we should just remove this check IMO.

Which check are you suggesting to remove? Maybe it would be clearer if you commented on the code inline?

@Cyrilvallez (Member) commented Jun 10, 2025

Well I'm simply talking about the check in your stacktrace here https://github.com/huggingface/transformers/blob/main/src/transformers/integrations/executorch.py#L403-L407

And yes, Qwen3 is hybrid in general, though it does not always have sliding layers (in which case Hybrid and Static caches are equivalent)

@guangy10 force-pushed the unbreak_optimum_et branch from aefca28 to c5d9f34 on June 10, 2025 22:39
@guangy10 (Contributor Author) commented:

Well I'm simply talking about the check in your stacktrace here https://github.com/huggingface/transformers/blob/main/src/transformers/integrations/executorch.py#L403-L407

And yes, Qwen3 is hybrid in general, though it does not always have sliding layers (in which case Hybrid and Static caches are equivalent)

@Cyrilvallez Looks like additional work is needed to treat HybridCache and StaticCache (hybrid without a sliding window) in a unified way. If I just remove the check you mentioned, the export test still fails due to the missing sliding window config.

        from transformers.integrations.executorch import TorchExportableModuleForDecoderOnlyLM

>       exportable_module = TorchExportableModuleForDecoderOnlyLM(model)

tests/models/qwen3/test_modeling_qwen3.py:288:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
src/transformers/integrations/executorch.py:65: in __init__
    self.model = TorchExportableModuleWithHybridCache(model, max_batch_size, max_cache_len)
src/transformers/integrations/executorch.py:404: in __init__
    self.cache = HybridCache(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <transformers.cache_utils.HybridCache object at 0x31ae42310>
config = Qwen3Config {
  "architectures": [
    "Qwen3ForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151643,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 1024,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_types": [
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention"
  ],
  "max_position_embeddings": 32768,
  "max_window_layers": 28,
  "model_type": "qwen3",
  "num_attention_heads": 16,
  "num_hidden_layers": 28,
  "num_key_value_heads": 8,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000000,
  "sliding_window": null,
  "tie_word_embeddings": true,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.53.0.dev0",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 151936
}

max_batch_size = 1, max_cache_len = 4096, device = device(type='cpu'), dtype = torch.bfloat16, layer_device_map = None

    def __init__(
        self,
        config: PretrainedConfig,
        max_batch_size: int,
        max_cache_len: Optional[int] = None,
        device: Union[torch.device, str, None] = None,
        dtype: torch.dtype = torch.float32,
        layer_device_map: Optional[Dict[int, Union[str, torch.device, int]]] = None,
    ) -> None:
        super().__init__()
        if not hasattr(config, "sliding_window") or config.sliding_window is None:
>           raise ValueError(
                "Setting `cache_implementation` to 'hybrid' requires the model config supporting "
                "sliding window attention, please check if there is a `sliding_window` field in the model "
                "config and it's not set to None."
            )
E           ValueError: Setting `cache_implementation` to 'hybrid' requires the model config supporting sliding window attention, please check if there is a `sliding_window` field in the model config and it's not set to None.

src/transformers/cache_utils.py:1610: ValueError
================================================================================================================ warnings summary ================================================================================================================
../../../opt/anaconda3/envs/huggingface/lib/python3.11/site-packages/_pytest/config/__init__.py:1441
  /opt/anaconda3/envs/huggingface/lib/python3.11/site-packages/_pytest/config/__init__.py:1441: PytestConfigWarning: Unknown config option: asyncio_default_fixture_loop_scope

    self._warn_or_fail_if_strict(f"Unknown config option: {key}\n")

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============================================================================================================ short test summary info =============================================================================================================
FAILED tests/models/qwen3/test_modeling_qwen3.py::Qwen3IntegrationTest::test_export_static_cache - ValueError: Setting `cache_implementation` to 'hybrid' requires the model config supporting sliding window attention, please check if there is a `sliding_window` field in the model config and it's not set to None.

I guess my main motivation for this PR is to restore the previous behavior and unbreak the downstream work in optimum. We have two export recipes: one uses StaticCache and the other uses HybridCache. The check on layer_types in the export recipe here https://github.com/guangy10/transformers/blob/1094dd34f73dae1d9a91a6632635934516612490/src/transformers/integrations/executorch.py#L59
will now always evaluate to false, forcing Qwen3 (and similar models) to use the hybrid cache as a "static cache" without a sliding window, which does not work, as shown above. If that's easy to fix, I'm happy to get it corrected in this PR 😄
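
For what it's worth, here is a minimal sketch (not what this PR implements, just for discussion) of routing on the content of layer_types rather than its mere presence. The names mirror the snippet quoted earlier in this thread, and it assumes the config exposes both layer_types and sliding_window:

# Hypothetical selection logic, sketched for discussion only.
layer_types = getattr(model.config, "layer_types", None)
has_sliding_layers = (
    layer_types is not None
    and "sliding_attention" in layer_types
    and getattr(model.config, "sliding_window", None) is not None
)

if has_sliding_layers:
    # Only genuinely hybrid models (with some sliding-window layers) take the hybrid path.
    self.model = TorchExportableModuleWithHybridCache(model, max_batch_size, max_cache_len)
else:
    # Qwen3-style checkpoints with `sliding_window: null` keep using the static path.
    self.model = TorchExportableModuleWithStaticCache(model)

With something like that, models whose layer_types contains only "full_attention" would keep exporting with StaticCache, and only configs that actually declare sliding layers would go through HybridCache.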

@Cyrilvallez (Member) left a comment:

Indeed, there is a check in the Cache directly as well

@guangy10 force-pushed the unbreak_optimum_et branch from 1501526 to c894b9e on June 11, 2025 17:31
@Cyrilvallez (Member) left a comment:

Nice, the change works for me! However, I'm just concerned about having several different APIs (the function and the class) to do seemingly the same thing. IMO we should pick one and standardize, especially since a lot is redundant; that could be a future PR though, WDYT?

@guangy10 force-pushed the unbreak_optimum_et branch from c894b9e to 76fd034 on June 11, 2025 18:23
@guangy10 (Contributor Author) commented:

Fixed the linter issues. @Cyrilvallez let me know if it's good to go.

@guangy10 requested a review from Cyrilvallez on June 11, 2025 18:24
@guangy10 force-pushed the unbreak_optimum_et branch from 76fd034 to bb2cff8 on June 12, 2025 00:32
@Cyrilvallez (Member) left a comment:

Alright, last small detail: we don't need to add the view op! Let's remove it, then I'll merge 🤗 Sorry for being annoying on this one 😬

@guangy10 requested a review from Cyrilvallez on June 12, 2025 17:22
@guangy10 (Contributor Author) commented:

Alright, last small detail: we don't need to add the view op! Let's remove it, then I'll merge 🤗 Sorry for being annoying on this one 😬

Updated the PR. Should be good to go.
