v2.2.0-windows.post2
Several improvements have been made to the Triton kernels, including better precision for quantized qk (thu-ml#224) and attn_mask (thu-ml#227).
The CUDA kernels for sm75 have been removed. In post1 they caused errors when dispatching bf16 (see #29). If we add them back, we may need to build a separate binary for sm75.
RTX 20xx (sm75) GPUs are still supported and will run the Triton kernels.
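A minimal sketch of the fallback described above, assuming dispatch is keyed on the GPU's compute capability. The function name `pick_kernel_backend` is hypothetical, and the rule that sm80 and newer use compiled CUDA kernels is an illustrative assumption, not SageAttention's actual dispatcher. The only behavior taken from these notes is that sm75 goes to the Triton kernels.

```python
# Hypothetical sketch: pick_kernel_backend and the ">= sm80 uses CUDA" rule
# are illustrative assumptions, not SageAttention's real API or logic.

def pick_kernel_backend(major: int, minor: int) -> str:
    """Return which kernel family to use for a given compute capability."""
    arch = f"sm{major}{minor}"
    # This release ships no CUDA kernels for sm75 (RTX 20xx), so use Triton.
    if arch == "sm75":
        return "triton"
    # Assumption: architectures with compiled CUDA kernels use them.
    if (major, minor) >= (8, 0):
        return "cuda"
    # Anything older has no supported path in this sketch.
    raise RuntimeError(f"unsupported GPU architecture: {arch}")
```

With PyTorch, the `(major, minor)` tuple can be obtained from `torch.cuda.get_device_capability()`, e.g. `pick_kernel_backend(*torch.cuda.get_device_capability())`.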