What's Changed
- [Misc] Automated submodule update by @DefTruth in #261
- Update README.md by @tpoisonooo in #264
- Update README.md by @DefTruth in #265
- bugfix: only export per token softmax kernels by @DefTruth in #266
- misc: update vllm latest slides by @DefTruth in #267
- feat: add triton vector_add kernel by @DefTruth in #268
- feat: add triton merge_attn_states kernel by @DefTruth in #269
- feat: add cuda merge_attn_states kernel by @DefTruth in #270
- feat: update cuda merge_attn_states kernel by @DefTruth in #271
- misc: dispatch CUDA merge_attn_states by @DefTruth in #273
- misc: add triton kernel index by @DefTruth in #274
- Fix grid initialization mistake in 2D matrix transpose (mat trans 2d) by @bear-zd in #275
- misc: update cuda merge_attn_states kernel by @DefTruth in #276
- kernel: optimize merge_attn_states CUDA kernel dispatch by @DefTruth in #278
- feat: optimize merge_attn_states thread block dispatch by @DefTruth in #279
New Contributors
- @tpoisonooo made their first contribution in #264
Full Changelog: v3.0.4...v3.0.5