
Conversation

zheliuyu (Contributor) commented Oct 13, 2025

What does this PR do?

Avoid installing kernels-community/flash-attn and kernels-community/vllm-flash-attn3 when attn_implementation="flash_attention_2" is specified on NPU, so the kernels library does not attempt unnecessary downloads. The Hub kernel has no CANN build, so the fetch currently fails with a FileNotFoundError (see the "Before" log below).
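
For context, here is a minimal sketch of the kind of device guard this change introduces. It is illustrative only, not the actual diff: should_fetch_hub_flash_attn is a hypothetical helper name, while is_torch_npu_available (from transformers.utils) and kernels.get_kernel are existing APIs.

from transformers.utils import is_torch_npu_available


def should_fetch_hub_flash_attn() -> bool:
    # Hypothetical helper: decide whether to download kernels-community/flash-attn.
    # On NPU there is no CANN build of the Hub kernel (see the "Before" traceback),
    # so skip the download and rely on the native NPU flash-attention path instead.
    return not is_torch_npu_available()


if should_fetch_hub_flash_attn():
    from kernels import get_kernel

    flash_attn = get_kernel("kernels-community/flash-attn")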

Test script

# Repro: load a model with attn_implementation="flash_attention_2" on NPU.
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-0.6B",
    device_map="auto",
    torch_dtype="auto",
    attn_implementation="flash_attention_2",
).eval()
print("Operation successful")

Before

`torch_dtype` is deprecated! Use `dtype` instead!
Fetching 0 files: 0it [00:00, ?it/s]
Traceback (most recent call last):
  File "/root/kernels-main/src/kernels/utils.py", line 144, in install_kernel
    return _load_kernel_from_path(repo_path, package_name, variant_locks)
  File "/root/kernels-main/src/kernels/utils.py", line 177, in _load_kernel_from_path
    raise FileNotFoundError(
FileNotFoundError: Kernel at path `/root/.cache/huggingface/hub/models--kernels-community--flash-attn/snapshots/90b3e941627659b28ff001c08b218315e1b7183b` does not have build: torch27-cxx11-cann81-aarch64-linux

After

`torch_dtype` is deprecated! Use `dtype` instead!
Operation successful

zheliuyu (Contributor, Author) commented

@ArthurZucker @MekkCyber This PR is now ready for review.

MekkCyber (Contributor) left a comment


Indeed! Thanks for fixing.
