Description
System Info
- transformers version: 4.54.0.dev0
- Platform: macOS-15.5-arm64-arm-64bit
- Python version: 3.11.11
- Huggingface_hub version: 0.30.2
- Safetensors version: 0.5.3
- Accelerate version: 1.6.0
- Accelerate config: not found
- DeepSpeed version: not installed
- PyTorch version (accelerator?): 2.9.0.dev20250706 (NA)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using distributed or parallel set-up in script?:
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
I quickly tried LiquidAI/LFM2-350M in optimum-executorch (with the latest transformers
installed from trunk) by running:
optimum-cli export executorch --model LiquidAI/LFM2-350M --task text-generation --recipe xnnpack --use_custom_sdpa --use_custom_kv_cache --qlinear --qembedding --output_dir lfm2
The export failed because of data-dependent control flow in slow_forward:
E File "/Users/guangyang/transformers/src/transformers/models/lfm2/modeling_lfm2.py", line 485, in forward
E return self.slow_forward(hidden_states, past_key_value, cache_position, attention_mask)
E File "/Users/guangyang/transformers/src/transformers/models/lfm2/modeling_lfm2.py", line 453, in slow_forward
E if past_key_value is not None and cache_position[0] > 0:
This Python-level branch on the value of cache_position[0] needs to be rewritten to make the model exportable to ExecuTorch.
You can reproduce this export issue without optimum-executorch by modifying test_static_cache_exportability
in tests/utils/test_cache_utils.py. Simply replace the model_id with "LiquidAI/LFM2-350M", then run the test:
RUN_SLOW=1 pytest tests/utils/test_cache_utils.py -vvv -s -k test_static_cache_exportability
Expected behavior
The model should export successfully once the data-dependent control flow is rewritten.