Releases: woct0rdho/SageAttention
v2.2.0-windows.post3
(This is not SageAttention3. Currently I cannot make wheels for SageAttention3 that fully work; see #42 (comment). You can still use the SageAttention2 wheels here.)
Fix the GQA case for smooth_k, see thu-ml#252
Previously, the SageAttention2++ kernels might not correctly fall back to the old SageAttention2 kernels on RTX 40xx with CUDA < 12.8. This is now fixed, see #46
The wheel for PyTorch 2.9 is published. CUDA 13.0 is supported since PyTorch 2.9. We still need more tests to see whether Triton supports CUDA 13.0.
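The fallback behavior fixed above can be sketched as follows. This is only an illustration of the dispatch rules described in these notes, not the actual dispatch code; the function name and the version tuples are assumptions.

```python
def pick_kernel(sm: tuple[int, int], cuda: tuple[int, int]) -> str:
    """Sketch of which kernel family a GPU/CUDA combination should get.

    sm is the compute capability: (8, 9) for RTX 40xx, (12, 0) for
    RTX 50xx, (7, 5) for RTX 20xx. cuda is the CUDA version as a tuple.
    """
    if sm == (7, 5):
        # sm75 has no working CUDA kernels in these wheels; use Triton.
        return "triton"
    if sm in {(8, 9), (12, 0)} and cuda >= (12, 8):
        # SageAttention2++ (sv_f8_accum_f16) needs sm89/sm120 and CUDA >= 12.8.
        return "sageattention2++"
    # Everything else falls back to the old SageAttention2 CUDA kernels.
    return "sageattention2"
```

For example, RTX 40xx with CUDA 12.6 should take the old-kernel path, which is exactly the case that previously failed to fall back.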
v2.2.0-windows.post2
Some improvements are added to the Triton kernels, including better precision of the quantized qk (thu-ml#224) and attn_mask (thu-ml#227).
The CUDA kernels for sm75 are removed. In post1 they caused errors when dispatching bf16, see #29. We may need to build a separate binary for sm75 if we add them back.
RTX 20xx (sm75) is still supported and will run the Triton kernels.
v2.2.0-windows.post1
Now the wheels are built with the Python Stable ABI (also known as ABI3). This means we no longer need to build a different wheel for every Python version (although in principle we still need to build for every PyTorch and CUDA version). Later I'll make a PR to the official SageAttention repo.
The wheel filenames contain cp39-abi3, so they support Python >= 3.9.
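As a sketch of why a single cp39-abi3 wheel covers every later CPython: the interpreter tag encodes the minimum version, and the abi3 tag says the wheel uses only the Stable ABI. The function below is a minimal illustration (not how pip actually resolves tags), and the wheel filename in the comment is hypothetical.

```python
def abi3_compatible(wheel_tag: str, python_version: tuple[int, int]) -> bool:
    """Check whether a CPython interpreter can use a cpXY-abi3 wheel.

    A cp39-abi3 wheel targets the Stable ABI as of CPython 3.9,
    so any CPython >= 3.9 can load it.
    """
    interp, abi = wheel_tag.split("-")
    if abi != "abi3" or not interp.startswith("cp3"):
        return False
    minimum = (3, int(interp[3:]))  # "cp39" -> (3, 9), "cp310" -> (3, 10)
    return python_version >= minimum

# Hypothetical filename: sageattention-2.2.0-cp39-abi3-win_amd64.whl
# -> interpreter/ABI tag "cp39-abi3"
```

In contrast, a non-abi3 wheel tagged e.g. cp310-cp310 matches only that exact CPython version, which is why a separate build per Python version was needed before.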
For RTX 20xx (sm75), this wheel will run the Triton kernels. The CUDA kernels for sm75 are bundled for testing, but they do not fully work yet.
The wheel for PyTorch 2.8 is stable now.
v2.2.0-windows
SageAttention2++ kernels (sv_f8_accum_f16) are added. Compared to SageAttention 2.1, they improve speed with almost no quality loss. They only support RTX 40xx (sm89) and 50xx (sm120) GPUs, and require CUDA >= 12.8 (therefore PyTorch >= 2.7).
The SageAttention 2.2 wheels contain both the SageAttention2++ kernels and the old SageAttention2 kernels. On older GPUs or CUDA versions, the old SageAttention2 kernels are used, so it will still run, just without the speedup.
For PyTorch 2.8, the nightly wheels are unstable, so the SageAttention wheels here may not work with the torch nightly wheel from an arbitrary day. They are only tested with torch 2.8.0.dev20250627.
v2.1.1-windows
The installation instructions are moved to the README: https://github.com/woct0rdho/SageAttention . Now the release page only contains changelogs.
For PyTorch 2.8, the nightly wheels are unstable, so the SageAttention wheels here may not work with the torch nightly wheel from an arbitrary day. They are only tested with torch 2.8.0.dev20250415.