(Note: this is not SageAttention3. I currently cannot build fully working wheels for SageAttention3; see #42 (comment). You can still use the SageAttention2 wheels here.)
Fix the GQA case for smooth_k; see thu-ml#252
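For context, smooth_k subtracts a per-head mean from K before quantization. Under GQA the number of query heads exceeds the number of K/V heads, so the mean must be computed per K/V head over that head's own tokens rather than assuming a one-to-one head mapping. A minimal pure-Python sketch of the idea (illustrative only; this is not the kernel code, and the function name is just for this example):

```python
# Illustrative sketch of smooth_k under GQA (hypothetical helper, not the
# actual SageAttention kernel). In GQA, K has fewer heads than Q; each K/V
# head is shared by a group of query heads, so the smoothing mean is taken
# per K/V head, over that head's tokens.

def smooth_k(k_heads):
    """k_heads: list of K/V heads; each head is a list of token vectors
    (lists of floats). Returns K with each head's mean vector subtracted
    from every token of that head."""
    smoothed = []
    for head in k_heads:
        dim = len(head[0])
        # Componentwise mean over this head's tokens.
        mean = [sum(tok[d] for tok in head) / len(head) for d in range(dim)]
        smoothed.append([[tok[d] - mean[d] for d in range(dim)] for tok in head])
    return smoothed

# Example: 2 K/V heads (shared by, say, 8 query heads), 2 tokens, dim 2.
k = [[[1.0, 2.0], [3.0, 4.0]],
     [[0.0, 0.0], [2.0, 2.0]]]
print(smooth_k(k))
# After smoothing, each head's tokens sum to zero componentwise.
```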
Previously, the SageAttention2++ kernels might not fall back correctly to the older SageAttention2 kernels on RTX 40xx GPUs with CUDA < 12.8. This is now fixed; see #46
Wheels for PyTorch 2.9 are now published. CUDA 13.0 is supported as of PyTorch 2.9, but more testing is needed to confirm that Triton works with CUDA 13.0.