CI fails: ImportError: FlashAttention2 #4175

@albertvillanova

Description

CI fails for Slow tests: https://github.com/huggingface/trl/actions/runs/18106208226/job/51521315792

ImportError: FlashAttention2 has been toggled on, but it cannot be used due to the following error: the package flash_attn seems to be not installed. Please refer to the documentation of https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-2 to install Flash Attention 2.
FAILED tests/slow/test_grpo_slow.py::GRPOTrainerSlowTester::test_vlm_training_0_HuggingFaceTB_SmolVLM_Instruct - ImportError: FlashAttention2 has been toggled on, but it cannot be used due to the following error: the package flash_attn seems to be not installed. Please refer to the documentation of https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-2 to install Flash Attention 2.
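The failing test loads the VLM with FlashAttention2 requested, so `from_pretrained` raises before any weights are loaded whenever the `flash_attn` package is missing on the runner. A minimal repro sketch (the `attn_implementation` kwarg is an assumption inferred from the traceback below; the checkpoint name comes from the parametrized test id):

```python
from transformers import AutoModelForImageTextToText

# Assumed minimal repro: requesting flash_attention_2 on a machine without
# the flash_attn package makes the dispatch check in __init__ raise the
# ImportError shown above, before any weights are downloaded or loaded.
model = AutoModelForImageTextToText.from_pretrained(
    "HuggingFaceTB/SmolVLM-Instruct",
    attn_implementation="flash_attention_2",  # assumption: mirrors what the test requests
)
```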

Traceback:

tests/slow/test_grpo_slow.py:267: in test_vlm_training
    model = AutoModelForImageTextToText.from_pretrained(
.venv/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py:604: in from_pretrained
    return model_class.from_pretrained(
.venv/lib/python3.11/site-packages/transformers/modeling_utils.py:288: in _wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
.venv/lib/python3.11/site-packages/transformers/modeling_utils.py:5106: in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.venv/lib/python3.11/site-packages/transformers/models/idefics3/modeling_idefics3.py:840: in __init__
    super().__init__(config)
.venv/lib/python3.11/site-packages/transformers/modeling_utils.py:2197: in __init__
    self.config._attn_implementation_internal = self._check_and_adjust_attn_implementation(
.venv/lib/python3.11/site-packages/transformers/modeling_utils.py:2807: in _check_and_adjust_attn_implementation
    applicable_attn_implementation = self.get_correct_attn_implementation(
.venv/lib/python3.11/site-packages/transformers/modeling_utils.py:2835: in get_correct_attn_implementation
    self._flash_attn_2_can_dispatch(is_init_check)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = Idefics3ForConditionalGeneration(), is_init_check = True

    def _flash_attn_2_can_dispatch(self, is_init_check: bool = False) -> bool:
        """
        Check the availability of Flash Attention 2 for a given model.
    
        Args:
            is_init_check (`bool`, *optional*):
                Whether this check is performed early, i.e. at __init__ time, or later when the model and its weights are
                fully instantiated. This is needed as we also check the devices of the weights, and/or if the model uses
                BetterTransformer, which are only available later after __init__. This allows to raise proper exceptions early
                before instantiating the full models if we know that the model does not support the requested attention.
        """
        dtype = self.config.dtype
    
        # check `supports_flash_attn_2` for BC with custom code. TODO: remove after a few releases
        if not (self._supports_flash_attn or getattr(self, "_supports_flash_attn_2", False)):
            raise ValueError(
                f"{self.__class__.__name__} does not support Flash Attention 2.0 yet. Please request to add support where"
                f" the model is hosted, on its model hub page: [https://huggingface.co/{self.config._name_or_path}/discussions/new](https://huggingface.co/%7Bself.config._name_or_path%7D/discussions/new)"
                " or in the Transformers GitHub repo: https://github.com/huggingface/transformers/issues/new"
            )
    
        if not is_flash_attn_2_available():
            preface = "FlashAttention2 has been toggled on, but it cannot be used due to the following error:"
            install_message = "Please refer to the documentation of https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-2 to install Flash Attention 2."
    
            # package `flash-attn` can not be installed on Ascend NPU, following validation logics can be ignored.
            if is_torch_npu_available():
                logger.info("Detect using FlashAttention2 on Ascend NPU.")
                return True
    
            if importlib.util.find_spec("flash_attn") is None:
>               raise ImportError(f"{preface} the package flash_attn seems to be not installed. {install_message}")
E               ImportError: FlashAttention2 has been toggled on, but it cannot be used due to the following error: the package flash_attn seems to be not installed. Please refer to the documentation of https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-2 to install Flash Attention 2.

.venv/lib/python3.11/site-packages/transformers/modeling_utils.py:2547: ImportError
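
Two plausible fixes: install flash-attn in the slow-tests CI image (the linked docs use `pip install flash-attn --no-build-isolation`), or gate FA2-dependent tests so they skip rather than error when the package is absent. A sketch of the skip guard, assuming transformers' public `is_flash_attn_2_available` helper and pytest:

```python
import pytest
from transformers.utils import is_flash_attn_2_available

# Sketch of a skip guard: FA2-dependent tests are skipped instead of
# erroring when the CI runner has no flash_attn build. Applying it to
# the GRPO VLM test is an assumption; any test that toggles
# attn_implementation="flash_attention_2" would need the same marker.
requires_flash_attn = pytest.mark.skipif(
    not is_flash_attn_2_available(),
    reason="flash_attn is not installed, so FlashAttention2 cannot be dispatched",
)


@requires_flash_attn
def test_vlm_training_smolvlm():
    ...  # hypothetical placeholder for the existing GRPO VLM training test body
```

transformers.testing_utils also ships a `require_flash_attn` decorator that, if present in the pinned version, would serve the same purpose.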

Labels

🏋 GRPO (Related to GRPO), 🐛 bug (Something isn't working)
