(Note: this is not SageAttention3. I currently cannot build fully working wheels for SageAttention3; see #42 (comment). You can still use the SageAttention2 wheels here.)
Fix the GQA case for smooth_k; see thu-ml#252
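For context, smooth_k subtracts a per-head mean from K before quantization. Under GQA the number of query heads exceeds the number of K/V heads, so the mean must be computed per K/V head over that head's own tokens rather than assuming a one-to-one head mapping. A minimal pure-Python sketch of the idea (illustrative only; this is not the kernel code, and the function name is just for this example):

```python
# Illustrative sketch of smooth_k under GQA (hypothetical helper, not the
# actual SageAttention kernel). In GQA, K has fewer heads than Q; each K/V
# head is shared by a group of query heads, so the smoothing mean is taken
# per K/V head, over that head's tokens.

def smooth_k(k_heads):
    """k_heads: list of K/V heads; each head is a list of token vectors
    (lists of floats). Returns K with each head's mean vector subtracted
    from every token of that head."""
    smoothed = []
    for head in k_heads:
        dim = len(head[0])
        # Componentwise mean over this head's tokens.
        mean = [sum(tok[d] for tok in head) / len(head) for d in range(dim)]
        smoothed.append([[tok[d] - mean[d] for d in range(dim)] for tok in head])
    return smoothed

# Example: 2 K/V heads (shared by, say, 8 query heads), 2 tokens, dim 2.
k = [[[1.0, 2.0], [3.0, 4.0]],
     [[0.0, 0.0], [2.0, 2.0]]]
print(smooth_k(k))
# After smoothing, each head's tokens sum to zero componentwise.
```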
Previously, the SageAttention2++ kernels might not fall back correctly to the older SageAttention2 kernels on RTX 40xx GPUs with CUDA < 12.8. This is now fixed; see #46
Wheels for PyTorch 2.9 are now published. CUDA 13.0 is supported as of PyTorch 2.9, but more testing is needed to confirm that Triton works with CUDA 13.0.