    Repositories list

    • ci-infra

      Public
      This repo hosts code for vLLM CI & Performance Benchmark infrastructure.
      HCL
      5327026 Updated Dec 28, 2025
    • vllm

      Public
      A high-throughput and memory-efficient inference and serving engine for LLMs
      Python
      12k66k1.8k1.3k Updated Dec 28, 2025
    • vllm-ascend

      Public
      Community maintained hardware plugin for vLLM on Ascend
      Python
      6841.5k817295 Updated Dec 28, 2025
    • vllm-metal

      Public
      Community maintained hardware plugin for vLLM on Apple Silicon
      Python
      1212843 Updated Dec 28, 2025
    • guidellm

      Public
      Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs
      Python
      1087694619 Updated Dec 28, 2025
    • tpu-inference

      Public
      TPU inference for vLLM, with unified JAX and PyTorch support.
      Python
      642021679 Updated Dec 27, 2025
    • vllm-omni

      Public
      A framework for efficient model inference with omni-modality models
      Python
      2251.8k10441 Updated Dec 27, 2025
    • vllm-project.github.io

      Public
      JavaScript
      572710 Updated Dec 27, 2025
    • vllm-daily

      Public
      vLLM Daily Summarization of Merged PRs
      12200 Updated Dec 27, 2025
    • semantic-router

      Public
      System Level Intelligent Router for Mixture-of-Models
      Go
      3682.6k9538 Updated Dec 27, 2025
    • vllm-xpu-kernels

      Public
      The vLLM XPU kernels for Intel GPU
      C++
      161413 Updated Dec 26, 2025
    • vllm-spyre

      Public
      Community maintained hardware plugin for vLLM on Spyre
      Python
      3138413 Updated Dec 26, 2025
    • router

      Public
      A high-performance and light-weight router for vLLM large scale deployment
      Rust
      136644 Updated Dec 26, 2025
    • vllm-gaudi

      Public
      Community maintained hardware plugin for vLLM on Intel Gaudi
      Python
      8721168 Updated Dec 24, 2025
    • vllm-neuron

      Public
      Community maintained hardware plugin for vLLM on AWS Neuron
      Python
      41621 Updated Dec 23, 2025
    • recipes

      Public
      Common recipes to run vLLM
      Jupyter Notebook
      112306930 Updated Dec 23, 2025
    • compressed-tensors

      Public
      A safetensors extension to efficiently store sparse quantized tensors on disk
      Python
      48225317 Updated Dec 23, 2025
    • aibrix

      Public
      Cost-efficient and pluggable Infrastructure components for GenAI inference
      Go
      5034.5k26829 Updated Dec 23, 2025
    • llm-compressor

      Public
      Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
      Python
      3402.5k7549 Updated Dec 22, 2025
    • production-stack

      Public
      vLLM’s reference system for K8S-native cluster-wide deployment with community-driven performance optimization
      Python
      3442.1k9255 Updated Dec 20, 2025
    • speculators

      Public
      A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM
      Python
      22176711 Updated Dec 19, 2025
    • flash-attention

      Public
      Fast and memory-efficient exact attention
      Python
      2.3k105016 Updated Dec 17, 2025
    • FlashMLA

      Public
      C++
      923903 Updated Dec 15, 2025
    • 11100 Updated Dec 14, 2025
    • Python
      112920 Updated Dec 4, 2025
    • media-kit

      Public
      vLLM Logo Assets
      4600 Updated Oct 22, 2025
    • DeepGEMM

      Public
      DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
      Cuda
      784000 Updated Sep 29, 2025
    • rfcs

      Public
      0100 Updated Jun 3, 2025
    • vllm-project.github.io-static

      Public archive
      HTML
      7901 Updated Feb 7, 2025
    • vllm-nccl

      Public archive
      Manages vllm-nccl dependency
      Python
      31720 Updated Jun 3, 2024