
v2.2.0-windows.post3

@woct0rdho woct0rdho released this 29 Sep 04:52
· 42 commits to main since this release

(This is not SageAttention3. Currently I cannot build SageAttention3 wheels that fully work; see #42 (comment). You can still use the SageAttention2 wheels here.)

Fix GQA case for smooth_k, see thu-ml#252

Previously, the SageAttention2++ kernels might not correctly fall back to the old SageAttention2 kernels on RTX 40xx with CUDA < 12.8. This is now fixed; see #46.
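To check whether a given machine is in the configuration described above, a minimal sketch follows. `expects_fallback` and `parse_version` are hypothetical helpers written for illustration, not part of SageAttention, and the sketch assumes that "RTX 40xx" corresponds to compute capability 8.9. It only classifies the environment; it does not inspect which kernel SageAttention actually dispatches to.

```python
def parse_version(text):
    """Turn a version string such as '12.8' into a comparable tuple (12, 8)."""
    return tuple(int(part) for part in text.split(".")[:2])


def expects_fallback(cuda_version, capability):
    """Return True for the case in this note: RTX 40xx with CUDA < 12.8.

    Hypothetical helper, not a SageAttention API.
    Assumption: RTX 40xx (Ada Lovelace) means compute capability (8, 9).
    """
    is_rtx_40xx = tuple(capability) == (8, 9)
    return is_rtx_40xx and parse_version(cuda_version) < (12, 8)
```

With PyTorch installed, the inputs could come from `torch.version.cuda` and `torch.cuda.get_device_capability()`. Note that `torch.version.cuda` is the CUDA version PyTorch was built with, which may or may not be the version the condition above refers to.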

The wheel for PyTorch 2.9 is published. CUDA 13.0 has been supported since PyTorch 2.9. We still need more tests to see whether Triton supports CUDA 13.0.
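If you test CUDA 13.0 and report results, it helps to include the installed package versions. The sketch below uses only the Python standard library. `installed_version` is a hypothetical helper, and the distribution names `torch`, `triton-windows`, and `sageattention` are assumptions about how the packages are installed in your environment.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Assumed distribution names; adjust them to match your environment.
for name in ("torch", "triton-windows", "sageattention"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```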