CI fails for the Slow tests job: https://github.com/huggingface/trl/actions/runs/18106208226/job/51521315792
ImportError: FlashAttention2 has been toggled on, but it cannot be used due to the following error: the package flash_attn seems to be not installed. Please refer to the documentation of https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-2 to install Flash Attention 2.
FAILED tests/slow/test_grpo_slow.py::GRPOTrainerSlowTester::test_vlm_training_0_HuggingFaceTB_SmolVLM_Instruct - ImportError: FlashAttention2 has been toggled on, but it cannot be used due to the following error: the package flash_attn seems to be not installed. Please refer to the documentation of https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-2 to install Flash Attention 2.
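For reference, the error can be reproduced outside CI with roughly the following snippet (a minimal sketch: the model id is inferred from the parametrized test name, and `attn_implementation="flash_attention_2"` is assumed from the dispatch path in the traceback below; run it in an environment without flash-attn installed). The full traceback from the job follows.

```python
import torch
from transformers import AutoModelForImageTextToText

# Requesting FlashAttention2 on a machine where the flash-attn package is
# missing raises the ImportError at model construction time.
model = AutoModelForImageTextToText.from_pretrained(
    "HuggingFaceTB/SmolVLM-Instruct",
    attn_implementation="flash_attention_2",
    torch_dtype=torch.bfloat16,
)
```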
Traceback:
tests/slow/test_grpo_slow.py:267: in test_vlm_training
model = AutoModelForImageTextToText.from_pretrained(
.venv/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py:604: in from_pretrained
return model_class.from_pretrained(
.venv/lib/python3.11/site-packages/transformers/modeling_utils.py:288: in _wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
.venv/lib/python3.11/site-packages/transformers/modeling_utils.py:5106: in from_pretrained
model = cls(config, *model_args, **model_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.venv/lib/python3.11/site-packages/transformers/models/idefics3/modeling_idefics3.py:840: in __init__
super().__init__(config)
.venv/lib/python3.11/site-packages/transformers/modeling_utils.py:2197: in __init__
self.config._attn_implementation_internal = self._check_and_adjust_attn_implementation(
.venv/lib/python3.11/site-packages/transformers/modeling_utils.py:2807: in _check_and_adjust_attn_implementation
applicable_attn_implementation = self.get_correct_attn_implementation(
.venv/lib/python3.11/site-packages/transformers/modeling_utils.py:2835: in get_correct_attn_implementation
self._flash_attn_2_can_dispatch(is_init_check)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = Idefics3ForConditionalGeneration(), is_init_check = True
def _flash_attn_2_can_dispatch(self, is_init_check: bool = False) -> bool:
"""
Check the availability of Flash Attention 2 for a given model.
Args:
is_init_check (`bool`, *optional*):
Whether this check is performed early, i.e. at __init__ time, or later when the model and its weights are
fully instantiated. This is needed as we also check the devices of the weights, and/or if the model uses
BetterTransformer, which are only available later after __init__. This allows to raise proper exceptions early
before instantiating the full models if we know that the model does not support the requested attention.
"""
dtype = self.config.dtype
# check `supports_flash_attn_2` for BC with custom code. TODO: remove after a few releases
if not (self._supports_flash_attn or getattr(self, "_supports_flash_attn_2", False)):
raise ValueError(
f"{self.__class__.__name__} does not support Flash Attention 2.0 yet. Please request to add support where"
f" the model is hosted, on its model hub page: [https://huggingface.co/{self.config._name_or_path}/discussions/new](https://huggingface.co/%7Bself.config._name_or_path%7D/discussions/new)"
" or in the Transformers GitHub repo: https://github.com/huggingface/transformers/issues/new"
)
if not is_flash_attn_2_available():
preface = "FlashAttention2 has been toggled on, but it cannot be used due to the following error:"
install_message = "Please refer to the documentation of https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-2 to install Flash Attention 2."
# The `flash-attn` package cannot be installed on Ascend NPU, so the following validation logic can be skipped.
if is_torch_npu_available():
logger.info("Detect using FlashAttention2 on Ascend NPU.")
return True
if importlib.util.find_spec("flash_attn") is None:
> raise ImportError(f"{preface} the package flash_attn seems to be not installed. {install_message}")
E ImportError: FlashAttention2 has been toggled on, but it cannot be used due to the following error: the package flash_attn seems to be not installed. Please refer to the documentation of https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-2 to install Flash Attention 2.
.venv/lib/python3.11/site-packages/transformers/modeling_utils.py:2547: ImportError
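Two possible directions, both sketches rather than a confirmed fix: install flash-attn in the slow-CI image (e.g. `pip install flash-attn --no-build-isolation`), or guard the test so it skips instead of erroring when the package is absent. The guard could look roughly like this, reusing helpers that transformers already ships (`require_flash_attn`, `is_flash_attn_2_available`); the class and test names are taken from the failing job, the body is illustrative and omits the original parametrization.

```python
import unittest

from transformers.testing_utils import require_flash_attn
from transformers.utils import is_flash_attn_2_available


class GRPOTrainerSlowTester(unittest.TestCase):
    @require_flash_attn  # skipped, not errored, when flash-attn is not importable
    def test_vlm_training(self):
        # Alternatively, fall back to SDPA instead of hard-requiring FlashAttention2:
        attn_implementation = (
            "flash_attention_2" if is_flash_attn_2_available() else "sdpa"
        )
        self.assertIn(attn_implementation, {"flash_attention_2", "sdpa"})
```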