
Conversation

YangKai0616

The main contributions of this PR:

  1. Implement flash_attn.fwd and flash_attn.varlen_fwd on XPU.
  2. Modify test_flash_attn.py to support XPU testing.

This PR passed local pip install, Nix compilation, and unit tests on XPU. It also passed pip install and unit tests on a CUDA GPU.
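For reference, here is a minimal sketch of how the new forward path could be exercised, assuming the kernel is fetched with the Hugging Face `kernels` library; the exact `fwd` argument names and defaults are assumptions based on the upstream flash-attn convention, not something this PR defines.

```python
# Minimal sketch (not part of this PR): calling the new fwd path on XPU.
# Assumptions: the kernel is loaded via the Hugging Face `kernels` library,
# and `fwd` accepts q/k/v plus an is_causal flag as in upstream flash-attn;
# the keyword names below are illustrative, not guaranteed.
import torch
from kernels import get_kernel

flash_attn = get_kernel("kernels-community/flash-attn")

device = "xpu" if torch.xpu.is_available() else "cuda"
batch, seqlen, nheads, headdim = 2, 128, 8, 64

q = torch.randn(batch, seqlen, nheads, headdim, device=device, dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# fwd returns the attention output first (plus auxiliary tensors such as the
# softmax LSE); the output shape should match (batch, seqlen, nheads, headdim).
out = flash_attn.fwd(q, k, v, is_causal=True)[0]
print(out.shape)
```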

Additional note:
On CUDA, test_flash_attn.py::test_flash_attn_kvcache fails with RuntimeError: out must have shape (batch_size, seqlen_q, num_heads, head_size_og). I see the same error with the original flash-attn kernel, so it is unrelated to this PR.

@YangKai0616
Author

@danieldk Could you please review? Thanks!

@danieldk
Member

CUDA flash-attn2 has been merged: https://github.com/huggingface/kernels-community/tree/main/flash-attn2. Could you rebase the PR on main?
