Releases: xlite-dev/Awesome-LLM-Inference
v2.6.20
What's Changed
- Add 4 papers by @woominsong in #148
- Update new paper (KVzip) by @Janghyun1230 in #149
- Add a new paper (GuidedQuant) by @jusjinuk in #150
- Add STAND by @woominsong in #151
- Add Inference-Time Hyper-Scaling by @CStanKonrad in #152
New Contributors
- @woominsong made their first contribution in #148
- @jusjinuk made their first contribution in #150
- @CStanKonrad made their first contribution in #152
Full Changelog: v2.6.19...v2.6.20
v2.6.19
v2.6.18
v2.6.17
What's Changed
- 🔥[BitNet v2] Native 4-bit Activations with Hadamard Transformation for 1-bit LLMs by @DefTruth in #144
- Add The Sparse Frontier: Sparse Attention Trade-offs in Transformer LLMs by @PiotrNawrot in #145
New Contributors
- @PiotrNawrot made their first contribution in #145
Full Changelog: v2.6.16...v2.6.17
v2.6.16
What's Changed
- Add PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters by @Lizonghang in #137
- 🔥🔥[SGLang] Efficiently Programming Large Language Models using SGLang by @DefTruth in #138
- 🔥[FSDP 1/2] PyTorch FSDP: Getting Started with Fully Sharded Data Parallel(FSDP) by @DefTruth in #139
- 🔥[MMInference] MMInference: Accelerating Pre-filling for Long-Context Visual Language Models via Modality-Aware Permutation Sparse Attention by @DefTruth in #140
- Update Multi-GPUs/Multi-Nodes Parallelism by @DefTruth in #141
- 🔥[Triton-distributed] TileLink: Generating Efficient Compute-Communication Overlapping Kernels using Tile-Centric Primitives by @DefTruth in #142
New Contributors
- @Lizonghang made their first contribution in #137
Full Changelog: v2.6.15...v2.6.16
v2.6.15
What's Changed
- MegaScale-Infer: Serving Mixture-of-Experts at Scale with Disaggregated Expert Parallelism by @DefTruth in #131
- TritonBench: Benchmarking Large Language Model Capabilities for Generating Triton Operators by @DefTruth in #132
- 🔥[KV Cache Prefetch] Accelerating LLM Inference Throughput via Asynchronous KV Cache Prefetching by @DefTruth in #133
- Add SeerAttention and SlimAttention Paper by @sunshinemyson in #135
New Contributors
- @sunshinemyson made their first contribution in #135
Full Changelog: v2.6.14...v2.6.15
v2.6.14
What's Changed
- [feat] add deepseek FlashMLA by @shaoyuyoung in #120
- Add our ICLR2025 work Dynamic-LLaVA by @Blank-z0 in #121
- 🔥[MHA2MLA] Towards Economical Inference: Enabling DeepSeek’s Multi-Head Latent Attention in Any Transformer-based LLMs by @DefTruth in #122
- update the title of SageAttention2 and add SpargeAttn by @jt-zhang in #123
- Add DeepSeek Open Sources modules by @DefTruth in #124
- Update DeepSeek/MLA Topics by @DefTruth in #125
- Request to Add CacheCraft: A Relevant Work on Chunk-Aware KV Cache Reuse by @skejriwal44 in #126
- 🔥[X-EcoMLA] Upcycling Pre-Trained Attention into MLA for Efficient and Extreme KV Compression by @DefTruth in #127
- Add download_pdfs.py by @DefTruth in #128
- Update README.md by @DefTruth in #129
- Update Mooncake-v3 paper link by @DefTruth in #130
New Contributors
- @Blank-z0 made their first contribution in #121
- @jt-zhang made their first contribution in #123
- @skejriwal44 made their first contribution in #126
Full Changelog: v2.6.13...v2.6.14
v2.6.13
What's Changed
- 🔥[DeepSeek-NSA] Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/119
Full Changelog: DefTruth/Awesome-LLM-Inference@v2.6.12...v2.6.13
v2.6.12
What's Changed
- Add Multi-head Latent Attention(MLA) topic by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/118
Full Changelog: DefTruth/Awesome-LLM-Inference@v2.6.11...v2.6.12
v2.6.11
What's Changed
- Add MiniMax-01 in Trending LLM/VLM Topics and Long Context Attention by @shaoyuyoung in https://github.com/DefTruth/Awesome-LLM-Inference/pull/112
- [feat] add deepseek-r1 by @shaoyuyoung in https://github.com/DefTruth/Awesome-LLM-Inference/pull/113
- 🔥🔥[DistServe] DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/114
- 🔥🔥[KVDirect] KVDirect: Distributed Disaggregated LLM Inference by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/115
- 🔥🔥[DeServe] DeServe: Towards Affordable Offline LLM Inference via Decentralization by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/116
- 🔥🔥[Mooncake] Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving by @DefTruth in https://github.com/DefTruth/Awesome-LLM-Inference/pull/117
New Contributors
- @shaoyuyoung made their first contribution in https://github.com/DefTruth/Awesome-LLM-Inference/pull/112
Full Changelog: DefTruth/Awesome-LLM-Inference@v2.6.10...v2.6.11