v2.2.0-windows.post2

@woct0rdho woct0rdho released this 10 Aug 04:03
· 56 commits to main since this release

This release adds some improvements to the Triton kernels, including better precision of quantized qk (thu-ml#224) and attn_mask (thu-ml#227).

The CUDA kernels for sm75 are removed. In post1 they caused errors when dispatching bf16, see #29. If we add them back, we may need to build a separate binary for sm75.

RTX 20xx (sm75) is still supported: it runs the Triton kernels instead.
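
To illustrate the fallback described above, here is a minimal sketch of choosing a kernel family from a GPU's compute capability. It is not the package's actual dispatch code: the function name `pick_backend` and the "sm80 and newer use CUDA kernels" threshold are assumptions made for illustration. The only thing taken from these notes is that sm75 runs the Triton kernels.

```python
def pick_backend(capability):
    """Hypothetical helper: pick a kernel family from a (major, minor) tuple.

    The tuple has the same shape as the value returned by
    torch.cuda.get_device_capability(), e.g. (7, 5) for RTX 20xx.
    """
    major, minor = capability
    sm = major * 10 + minor  # (7, 5) -> 75, (8, 9) -> 89
    # Assumption for this sketch: anything older than sm80 has no
    # prebuilt CUDA kernels and falls back to the Triton kernels.
    if sm < 80:
        return "triton"
    return "cuda"


print(pick_backend((7, 5)))  # triton
print(pick_backend((8, 9)))  # cuda
```

On a real machine, the tuple could come from `torch.cuda.get_device_capability()` rather than being hard-coded.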