xlite-dev

Develop ML/AI toolkits and ML/AI/CUDA Learning resources.

43 followers
China
https://github.com/xlite-dev

README.md

🛠Creator @DefTruth | Main Contributor @wangzijian1010 | All Team Members📚

Pinned

lite.ai.toolkit Public

🛠 A lite C++ toolkit: Deploy 100+ AI models (Stable-Diffusion, Face-Fusion, YOLO series, Det, Seg, etc) via MNN, ORT and TRT. 🎉🎉

C++ 4k 737
Awesome-LLM-Inference Public

📖A curated list of Awesome LLM/VLM Inference Papers with codes: WINT8/4, FlashAttention, PagedAttention, MLA, Parallelism etc. 🎉🎉

Python 3.8k 267
CUDA-Learn-Notes Public

📚Modern CUDA Learn Notes: 200+ Tensor/CUDA Cores Kernels🎉, HGEMM, FA2 via MMA and CuTe, 98~100% TFLOPS of cuBLAS/FA2.

Cuda 3.3k 345
statistic-learning-R-note Public

📒Statistical Learning Methods (Li Hang): notes from principles to implementation, 200-page PDF Notes with detailed explanations of various math formulas, implemented in R.🎉

444 55
torchlm Public

💎A high level pipeline for face landmarks detection: train, eval, inference (Python/C++) and 100+ data augmentations.

Python 255 24
ffpa-attn-mma Public

📚FFPA(Split-D): Yet another Faster Flash Attention with O(1) GPU SRAM complexity for large headdim, 1.8x~3x↑🎉 faster than SDPA EA.

Cuda 161 7

Repositories

Showing 10 of 22 repositories

CUDA-Learn-Notes Public
📚Modern CUDA Learn Notes: 200+ Tensor/CUDA Cores Kernels🎉, HGEMM, FA2 via MMA and CuTe, 98~100% TFLOPS of cuBLAS/FA2.

Cuda 3,265 GPL-3.0 345 6 0 Updated Apr 6, 2025
ffpa-attn-mma Public
📚FFPA(Split-D): Yet another Faster Flash Attention with O(1) GPU SRAM complexity for large headdim, 1.8x~3x↑🎉 faster than SDPA EA.

Cuda 161 GPL-3.0 7 2 0 Updated Apr 6, 2025
Awesome-LLM-Inference Public
📖A curated list of Awesome LLM/VLM Inference Papers with codes: WINT8/4, FlashAttention, PagedAttention, MLA, Parallelism etc. 🎉🎉

Python 3,785 GPL-3.0 267 0 0 Updated Apr 6, 2025
xlite-cli Public
The cli version of lite.ai.toolkit

C++ 1 0 0 0 Updated Apr 3, 2025
hgemm-tensorcores-mma Public
⚡️Write HGEMM from scratch using Tensor Cores with WMMA, MMA and CuTe API, Achieve Peak⚡️ Performance.

Cuda 66 GPL-3.0 3 0 0 Updated Mar 30, 2025
.github Public

1 0 0 0 Updated Mar 30, 2025
lite.ai.toolkit Public
🛠 A lite C++ toolkit: Deploy 100+ AI models (Stable-Diffusion, Face-Fusion, YOLO series, Det, Seg, etc) via MNN, ORT and TRT. 🎉🎉

C++ 4,011 GPL-3.0 737 0 0 Updated Mar 29, 2025
Awesome-Diffusion-Inference Public
📖A curated list of Awesome Diffusion Inference Papers with codes: Sampling, Caching, Multi-GPUs, etc. 🎉🎉

206 GPL-3.0 13 0 0 Updated Mar 23, 2025
SageAttention Public Forked from thu-ml/SageAttention
Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.

Cuda 0 Apache-2.0 87 0 0 Updated Mar 23, 2025
flashinfer Public Forked from flashinfer-ai/flashinfer
FlashInfer: Kernel Library for LLM Serving

Cuda 0 Apache-2.0 268 0 0 Updated Mar 23, 2025

Top languages

C++ Cuda Python TypeScript

Most used topics

mnn onnxruntime tnn ncnn cpp
