v2.2.0-windows.post2

woct0rdho released this 10 Aug 04:03

· 56 commits to main since this release

04c2321

This release adds some improvements to the Triton kernels, including better precision of quantized qk (thu-ml#224), and attn_mask (thu-ml#227).

The CUDA kernels for sm75 are removed. In post1 they caused errors when dispatching bf16, see #29. If we add them back, we may need to build a separate binary for sm75.

RTX 20xx (sm75) is still supported and it will run the Triton kernels.
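
To check whether your GPU is affected, here is a minimal sketch. It is not part of this package: `sm_name`, `runs_triton_only` and `SM_WITHOUT_CUDA_KERNELS` are hypothetical helper names, and the set only encodes what is stated above (sm75 has no CUDA kernels in this build). It says nothing about which kernels other architectures use. The only library call is PyTorch's `torch.cuda.get_device_capability()`.

```python
# Hypothetical helpers for illustration; they are not part of this package.

# Architectures whose CUDA kernels are absent from this build, per the notes above.
SM_WITHOUT_CUDA_KERNELS = {"sm75"}


def sm_name(major: int, minor: int) -> str:
    """Format a CUDA compute capability as an sm string, e.g. (7, 5) -> 'sm75'."""
    return f"sm{major}{minor}"


def runs_triton_only(major: int, minor: int) -> bool:
    """Return True if this build ships no CUDA kernels for the given capability."""
    return sm_name(major, minor) in SM_WITHOUT_CUDA_KERNELS


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None

    if torch is not None and torch.cuda.is_available():
        # Query the compute capability of the current CUDA device.
        major, minor = torch.cuda.get_device_capability()
        arch = sm_name(major, minor)
        if runs_triton_only(major, minor):
            print(f"{arch}: no CUDA kernels in this build, Triton kernels will run")
        else:
            print(f"{arch}: not affected by the sm75 kernel removal")
    else:
        print("PyTorch with CUDA is not available; cannot detect the GPU")
```

On an RTX 20xx card the script should report sm75 and the Triton path.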
