@Aya-ZIbra

Summary:
Extends the xformers CUTLASS Blackwell FwOpDecode operator to support BlockDiagonalLocalAttentionPaddedKeysMask, enabling sliding window attention for decode workloads. This builds on the gen kernel sliding window implementation to provide end-to-end local attention support through the xformers API.

For consistent output shapes, this also changes the default split_k_size in the interface from 1024 to 0, i.e. split-K is disabled by default.
TODO: add a merge_attentions step to the CUTLASS op when the number of splits is > 1.
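For context on the TODO, here is a sketch of the merge such a step performs: each split produces an output normalized over its own K/V slice plus a log-sum-exp (LSE), and the partial outputs are combined with LSE-derived weights. Names and shapes are illustrative, not the actual CUTLASS op internals.

```python
import torch

def merge_partial_attention(outs: torch.Tensor, lses: torch.Tensor) -> torch.Tensor:
    # outs: [S, B, M, H, D] partial outputs, each softmax-normalized over
    #       its own K/V split; lses: [S, B, H, M] per-split log-sum-exp.
    lse_max = lses.max(dim=0, keepdim=True).values
    weights = torch.exp(lses - lse_max)                   # [S, B, H, M]
    weights = weights / weights.sum(dim=0, keepdim=True)  # exp(lse_i - LSE_total)
    # Broadcast the per-(split, query, head) weight over the head dim.
    w = weights.permute(0, 1, 3, 2).unsqueeze(-1)         # [S, B, M, H, 1]
    return (w * outs).sum(dim=0)                          # [B, M, H, D]
```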

Differential Revision: D89192917

@meta-codesync

meta-codesync bot commented Dec 17, 2025

@Aya-ZIbra has exported this pull request. If you are a Meta employee, you can view the originating Diff in D89192917.

