Releases: woct0rdho/SageAttention
v2.2.0-windows.post3
(This is not SageAttention3. Currently I cannot make wheels for SageAttention3 that fully work; see #42 (comment). You can still use the SageAttention2 wheels here.)
Fix the GQA case for smooth_k, see thu-ml#252
Previously, the SageAttention2++ kernels might not correctly fall back to the old SageAttention2 kernels on RTX 40xx with CUDA < 12.8. This is now fixed, see #46
The wheel for PyTorch 2.9 is published. CUDA 13.0 is supported since PyTorch 2.9. We still need more tests to see whether Triton supports CUDA 13.0.
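The fallback behavior fixed above can be sketched as follows. This is only an illustration of the dispatch rules described in these notes, not the actual dispatch code; the function name and the version tuples are assumptions.

```python
def pick_kernel(sm: tuple[int, int], cuda: tuple[int, int]) -> str:
    """Sketch of which kernel family a GPU/CUDA combination should get.

    sm is the compute capability: (8, 9) for RTX 40xx, (12, 0) for
    RTX 50xx, (7, 5) for RTX 20xx. cuda is the CUDA version as a tuple.
    """
    if sm == (7, 5):
        # sm75 has no working CUDA kernels in these wheels; use Triton.
        return "triton"
    if sm in {(8, 9), (12, 0)} and cuda >= (12, 8):
        # SageAttention2++ (sv_f8_accum_f16) needs sm89/sm120 and CUDA >= 12.8.
        return "sageattention2++"
    # Everything else falls back to the old SageAttention2 CUDA kernels.
    return "sageattention2"
```

For example, RTX 40xx with CUDA 12.6 should take the old-kernel path, which is exactly the case that previously failed to fall back.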
v2.2.0-windows.post2
Some improvements are added to the Triton kernels, including better precision of the quantized qk (thu-ml#224) and attn_mask (thu-ml#227).
The CUDA kernels for sm75 are removed. In post1 they caused errors when dispatching bf16, see #29. We may need to build a separate binary for sm75 if we add them back.
RTX 20xx (sm75) is still supported and will run the Triton kernels.
v2.2.0-windows.post1
Now the wheels are built with the Python Stable ABI (also known as ABI3). This means we no longer need to build a different wheel for every Python version (although in principle we still need to build for every PyTorch and CUDA version). Later I'll make a PR to the official SageAttention repo.
The wheel filenames contain cp39-abi3, so they support Python >= 3.9.
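As a sketch of why a single cp39-abi3 wheel covers every later CPython: the interpreter tag encodes the minimum version, and the abi3 tag says the wheel uses only the Stable ABI. The function below is a minimal illustration (not how pip actually resolves tags), and the wheel filename in the comment is hypothetical.

```python
def abi3_compatible(wheel_tag: str, python_version: tuple[int, int]) -> bool:
    """Check whether a CPython interpreter can use a cpXY-abi3 wheel.

    A cp39-abi3 wheel targets the Stable ABI as of CPython 3.9,
    so any CPython >= 3.9 can load it.
    """
    interp, abi = wheel_tag.split("-")
    if abi != "abi3" or not interp.startswith("cp3"):
        return False
    minimum = (3, int(interp[3:]))  # "cp39" -> (3, 9), "cp310" -> (3, 10)
    return python_version >= minimum

# Hypothetical filename: sageattention-2.2.0-cp39-abi3-win_amd64.whl
# -> interpreter/ABI tag "cp39-abi3"
```

In contrast, a non-abi3 wheel tagged e.g. cp310-cp310 matches only that exact CPython version, which is why a separate build per Python version was needed before.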
For RTX 20xx (sm75), this wheel will run the Triton kernels. The CUDA kernels for sm75 are bundled for testing, but they do not fully work yet.
The wheel for PyTorch 2.8 is stable now.
v2.2.0-windows
SageAttention2++ kernels (sv_f8_accum_f16) are added. Compared to SageAttention 2.1, they improve speed with almost no quality loss. They only support RTX 40xx (sm89) and 50xx (sm120) GPUs, and require CUDA >= 12.8 (therefore PyTorch >= 2.7).
The SageAttention 2.2 wheels contain both the SageAttention2++ kernels and the old SageAttention2 kernels. On older GPUs or CUDA versions, the old SageAttention2 kernels are used, so it will still run, just without the speedup.
For PyTorch 2.8, the nightly wheels are unstable, so the SageAttention wheels here may not work with the torch nightly wheel from an arbitrary day. They are only tested with torch 2.8.0.dev20250627.
v2.1.1-windows
The installation instructions are moved to the README: https://github.com/woct0rdho/SageAttention . Now the release page only contains changelogs.
For PyTorch 2.8, the nightly wheels are unstable, so the SageAttention wheels here may not work with the torch nightly wheel from an arbitrary day. They are only tested with torch 2.8.0.dev20250415.