
xlite-dev

Develops ML/AI toolkits and ML/AI/CUDA learning resources.

Pinned Loading

  1. lite.ai.toolkit (Public)

    🛠 A lite C++ toolkit: deploy 100+ AI models (Stable-Diffusion, Face-Fusion, the YOLO series, detection, segmentation, etc.) via MNN, ORT, and TRT. 🎉🎉

    C++ · 4k stars · 737 forks

  2. Awesome-LLM-Inference (Public)

    📖 A curated list of awesome LLM/VLM inference papers with code: WINT8/4, FlashAttention, PagedAttention, MLA, parallelism, etc. 🎉🎉

    Python · 3.8k stars · 267 forks

  3. CUDA-Learn-Notes (Public)

    📚 Modern CUDA learning notes: 200+ Tensor/CUDA Core kernels 🎉, HGEMM and FA2 via MMA and CuTe, reaching 98–100% of cuBLAS/FA2 TFLOPS.

    Cuda · 3.3k stars · 345 forks

  4. statistic-learning-R-note (Public)

    📒 Notes on Statistical Learning Methods (Li Hang): from theory to implementation. A 200-page PDF of notes with detailed explanations of the math formulas, implemented in R. 🎉

    444 stars · 55 forks

  5. torchlm (Public)

    💎 A high-level pipeline for face-landmark detection: training, evaluation, inference (Python/C++), and 100+ data augmentations.

    Python · 255 stars · 24 forks

  6. ffpa-attn-mma (Public)

    📚 FFPA (Split-D): yet another faster FlashAttention with O(1) GPU SRAM complexity for large headdim, 1.8x–3x↑🎉 faster than SDPA EA.

    Cuda · 161 stars · 7 forks
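Several of the pinned projects (CUDA-Learn-Notes, ffpa-attn-mma, SageAttention) accelerate scaled dot-product attention. For context, the operation they optimize can be sketched in a few lines of plain Python; this is an illustrative reference only, not code from any of these repositories, which implement the same math as fused CUDA kernels.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def sdpa(Q, K, V):
    """O = softmax(Q K^T / sqrt(d)) V for one head; Q, K, V are lists of row vectors."""
    d = len(Q[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in Q:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in K]
        probs = softmax(scores)
        # each output row is a probability-weighted mix of the V rows
        out.append([sum(p * v[j] for p, v in zip(probs, V))
                    for j in range(len(V[0]))])
    return out

Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
O = sdpa(Q, K, V)
```

FlashAttention-style kernels compute exactly this result, but tile the softmax so the full score matrix never materializes in GPU memory; FFPA's "O(1) SRAM complexity" claim refers to how its shared-memory footprint scales with headdim.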

Repositories

Showing 10 of 22 repositories
  • CUDA-Learn-Notes (Public)

    📚 Modern CUDA learning notes: 200+ Tensor/CUDA Core kernels 🎉, HGEMM and FA2 via MMA and CuTe, reaching 98–100% of cuBLAS/FA2 TFLOPS.

    Cuda · 3,265 stars · GPL-3.0 · 345 forks · 6 open issues · 0 open PRs · Updated Apr 6, 2025
  • ffpa-attn-mma (Public)

    📚 FFPA (Split-D): yet another faster FlashAttention with O(1) GPU SRAM complexity for large headdim, 1.8x–3x↑🎉 faster than SDPA EA.

    Cuda · 161 stars · GPL-3.0 · 7 forks · 2 open issues · 0 open PRs · Updated Apr 6, 2025
  • Awesome-LLM-Inference (Public)

    📖 A curated list of awesome LLM/VLM inference papers with code: WINT8/4, FlashAttention, PagedAttention, MLA, parallelism, etc. 🎉🎉

    Python · 3,785 stars · GPL-3.0 · 267 forks · 0 open issues · 0 open PRs · Updated Apr 6, 2025
  • xlite-cli (Public)

    The CLI version of lite.ai.toolkit.

    C++ · 1 star · 0 forks · 0 open issues · 0 open PRs · Updated Apr 3, 2025
  • hgemm-tensorcores-mma (Public)

    ⚡️ Write HGEMM from scratch using the Tensor Core WMMA, MMA, and CuTe APIs, achieving peak⚡️ performance.

    Cuda · 66 stars · GPL-3.0 · 3 forks · 0 open issues · 0 open PRs · Updated Mar 30, 2025
  • .github (Public)

    1 star · 0 forks · 0 open issues · 0 open PRs · Updated Mar 30, 2025
  • lite.ai.toolkit (Public)

    🛠 A lite C++ toolkit: deploy 100+ AI models (Stable-Diffusion, Face-Fusion, the YOLO series, detection, segmentation, etc.) via MNN, ORT, and TRT. 🎉🎉

    C++ · 4,011 stars · GPL-3.0 · 737 forks · 0 open issues · 0 open PRs · Updated Mar 29, 2025
  • Awesome-Diffusion-Inference (Public)

    📖 A curated list of awesome diffusion inference papers with code: sampling, caching, multi-GPU, etc. 🎉🎉

    206 stars · GPL-3.0 · 13 forks · 0 open issues · 0 open PRs · Updated Mar 23, 2025
  • SageAttention (Public, forked from thu-ml/SageAttention)

    Quantized attention that achieves speedups of 2.1–3.1x and 2.7–5.1x over FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.

    Cuda · 0 stars · Apache-2.0 · 87 forks · 0 open issues · 0 open PRs · Updated Mar 23, 2025
  • flashinfer (Public, forked from flashinfer-ai/flashinfer)

    FlashInfer: a kernel library for LLM serving.

    Cuda · 0 stars · Apache-2.0 · 268 forks · 0 open issues · 0 open PRs · Updated Mar 23, 2025
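The hgemm-tensorcores-mma repository above writes half-precision GEMM kernels from scratch with Tensor Core APIs. As a point of reference, the operation those kernels compute (and the correctness baseline they are checked against) can be sketched naively in plain Python; this is an illustrative sketch only, not code from the repository, which is CUDA.

```python
def gemm(A, B):
    """Naive C = A @ B over lists of row vectors (no tiling, no Tensor Cores)."""
    n, k, m = len(A), len(B), len(B[0])
    assert all(len(row) == k for row in A)  # inner dimensions must match
    C = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for p in range(k):       # i-p-j loop order streams B's rows sequentially
            a = A[i][p]
            for j in range(m):
                C[i][j] += a * B[p][j]
    return C

C = gemm([[1.0, 2.0], [3.0, 4.0]],
         [[5.0, 6.0], [7.0, 8.0]])
# C == [[19.0, 22.0], [43.0, 50.0]]
```

Optimized HGEMM kernels produce the same result but tile the loops across warps and issue the inner products as WMMA/MMA matrix-multiply-accumulate instructions on half-precision fragments, which is where the "peak performance" versus cuBLAS comparison comes from.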