An awesome & curated list of the best LLMOps tools.
Contributions are welcome: simply open an issue to add a new project.
- NVIDIA GPU Operator: NVIDIA GPU Operator creates, configures, and manages GPUs in Kubernetes.
- KAI Scheduler: KAI Scheduler is an open source Kubernetes Native scheduler for AI workloads at large scale.
- Project-HAMi: Heterogeneous AI Computing Virtualization Middleware.
- Cortex.cpp: Local AI API Platform.
- DeepSpeed-MII: MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
- llama-box: LM inference server implementation based on *.cpp.
- Nvidia Dynamo: A Datacenter Scale Distributed Inference Serving Framework.
- ipex-llm: Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, DeepSeek, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, DeepSpeed, Axolotl, etc.
- LMDeploy: LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
- LoRAX: Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs.
- llama.cpp: LLM inference in C/C++.
- Llumnix: Efficient and easy multi-instance LLM serving.
- MInference: [NeurIPS'24 Spotlight, ICLR'25] Speeds up long-context LLM inference with approximate, dynamic sparse attention, reducing pre-filling latency by up to 10x on an A100 while maintaining accuracy.
- MLC LLM: Universal LLM Deployment Engine with ML Compilation.
- MLServer: An inference server for your machine learning models, including support for multiple frameworks, multi-model serving and more.
- Ollama: Get up and running with Llama 3.3, DeepSeek-R1, Phi-4, Gemma 3, and other large language models.
- OpenLLM: Run any open-source LLMs, such as DeepSeek and Llama, as OpenAI compatible API endpoint in the cloud.
- OpenVINO: OpenVINO™ is an open source toolkit for optimizing and deploying AI inference.
- Petals: Run LLMs at home, BitTorrent-style; fine-tuning and inference up to 10x faster than offloading.
- Ratchet: A cross-platform browser ML framework.
- SGLang: SGLang is a fast serving framework for large language models and vision language models.
- TinyGrad: You like PyTorch? You like micrograd? You love tinygrad!
- transformers.js: State-of-the-art machine learning for the web. Run 🤗 Transformers directly in your browser, with no need for a server!
- Triton Inference Server: The Triton Inference Server provides an optimized cloud and edge inferencing solution.
- Text Generation Inference: Large Language Model Text Generation Inference.
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs.
- web-llm: High-performance In-browser LLM Inference Engine.
- Xinference: Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
- zml: Any model. Any hardware. Zero compromise. Built with @ziglang / @openxla / MLIR / @bazelbuild.
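Many of the servers above (vLLM, SGLang, OpenLLM, Ollama, Text Generation Inference, and others) expose an OpenAI-compatible HTTP API, so swapping engines mostly means changing a base URL. A minimal sketch of the standard chat-completions request body; the model name is a placeholder, and real clients would POST this to the server's `/v1/chat/completions` endpoint:

```python
import json

def chat_request(model: str, prompt: str, temperature: float = 0.2) -> str:
    """Build an OpenAI-style chat-completions request body as JSON."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }
    return json.dumps(body)

payload = json.loads(chat_request("my-local-model", "Hello!"))
print(payload["messages"][0]["role"])  # -> user
```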
- AIBrix: Cost-efficient and pluggable Infrastructure components for GenAI inference.
- Kaito: Kubernetes operator for large-model inference and fine-tuning, with GPU auto-provisioning, container-based hosting, and CRD-based orchestration.
- Kserve: Standardized Serverless ML Inference Platform on Kubernetes.
- KubeAI: AI Inference Operator for Kubernetes. The easiest way to serve ML models in production. Supports VLMs, LLMs, embeddings, and speech-to-text.
- llm-d: llm-d is a Kubernetes-native, high-performance distributed LLM inference framework.
- llmaz: Easy, advanced inference platform for large language models on Kubernetes.
- LMCache: 10x Faster Long-Context LLM By Smart KV Cache Optimizations.
- Mooncake: Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
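Several of these systems (LMCache, Mooncake, AIBrix) revolve around managing the KV cache, whose size is easy to estimate: each token stores one key and one value vector per layer per KV head. A back-of-the-envelope sketch, assuming Llama-2-7B-like dimensions (32 layers, 32 KV heads, head dimension 128, fp16):

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    # The factor 2 covers the key and the value stored for every token.
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem

# A 4096-token context at these assumed dimensions occupies 2 GiB.
gib = kv_cache_bytes(32, 32, 128, 4096) / 2**30
print(gib)  # -> 2.0
```

Numbers like this are why KV-cache offloading and sharing across instances pay off at scale.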
- AI Gateway: A blazing fast AI Gateway with integrated guardrails. Route to 200+ LLMs, 50+ AI Guardrails with 1 fast & friendly API.
- LiteLLM: Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq].
- RouteLLM: A framework for serving and evaluating LLM routers - save LLM costs without compromising quality.
- APISIX: The Cloud-Native API Gateway and AI Gateway with extensive plugin system and AI capabilities.
- Envoy AI Gateway: Envoy AI Gateway is an open source project for using Envoy Gateway to handle request traffic from application clients to Generative AI services.
- Higress: AI Gateway | AI-native API Gateway.
- kgateway: The Cloud-Native API Gateway and AI Gateway.
- Kong: The Cloud-Native API Gateway and AI Gateway.
- gateway-api-inference-extension: Gateway API Inference Extension.
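Routers and gateways in this group ultimately pick a backend model per request to trade cost against quality. A toy heuristic router in that spirit (this is not RouteLLM's actual learned router, and the model names are placeholders):

```python
def route(prompt: str, cheap: str = "small-model",
          strong: str = "large-model") -> str:
    """Send short, simple prompts to the cheap model and long or
    code-heavy prompts to the strong one."""
    hard = (len(prompt.split()) > 50
            or "```" in prompt
            or "prove" in prompt.lower())
    return strong if hard else cheap

print(route("What is the capital of France?"))  # -> small-model
```

Learned routers replace the `hard` heuristic with a classifier trained on preference data, but the serving-side contract is the same: prompt in, model name out.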
- genai-bench: Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serving systems.
- Inference Benchmark: A model-server-agnostic inference benchmarking tool that can benchmark LLMs running on different infrastructure, such as GPUs and TPUs. It can also be run on a GKE cluster as a container.
- Inference Perf: GenAI inference performance benchmarking tool.
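Benchmark tools like these typically report time-to-first-token (TTFT) and decode throughput; both fall out of three timestamps and a token count. A minimal sketch of that arithmetic:

```python
def serving_metrics(request_start: float, first_token_time: float,
                    end_time: float, completion_tokens: int):
    """Return (TTFT in seconds, output tokens per second during decode)."""
    ttft = first_token_time - request_start
    decode_s = end_time - first_token_time
    tps = completion_tokens / decode_s if decode_s > 0 else float("inf")
    return ttft, tps

ttft, tps = serving_metrics(0.0, 0.25, 4.25, 200)
print(ttft, tps)  # -> 0.25 50.0
```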
- Instructor: Structured outputs for LLMs.
- Outlines: Structured Text Generation.
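Libraries like Outlines enforce structure by masking, at every decoding step, the tokens that would violate the target grammar. A minimal sketch of that idea using a fixed choice set instead of a real grammar or tokenizer:

```python
def allowed_next(prefix: str, vocabulary: list[str],
                 choices: list[str]) -> list[str]:
    """Keep only tokens that leave the output a prefix of some allowed
    choice; a decoder restricted this way can never emit invalid text."""
    ok = []
    for tok in vocabulary:
        candidate = prefix + tok
        if any(choice.startswith(candidate) for choice in choices):
            ok.append(tok)
    return ok

# After emitting "y", only "e" keeps the output on track toward "yes".
print(allowed_next("y", ["y", "e", "s", "n", "o", "maybe"], ["yes", "no"]))
# -> ['e']
```

Real implementations compile a regex or JSON schema into a finite-state machine over token IDs, but the per-step masking logic is the same shape.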
- Dify: Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
- FastGPT: FastGPT is a knowledge-based platform built on LLMs that offers a comprehensive suite of out-of-the-box capabilities, such as data processing, RAG retrieval, and visual AI workflow orchestration, letting you easily develop and deploy complex question-answering systems without extensive setup or configuration.
- Flowise: Drag & drop UI to build your customized LLM flow.
- Haystack: AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
- Inference: Turn any computer or edge device into a command center for your computer vision projects.
- Agent Development Kit (ADK): An open-source, code-first Python toolkit for building, evaluating, and deploying sophisticated AI agents with flexibility and control.
- Agno: Build Multimodal AI Agents with memory, knowledge and tools. Simple, fast and model-agnostic.
- autogen: A programming framework for agentic AI. PyPI: autogen-agentchat. Discord: https://aka.ms/autogen-discord. Office hour: https://aka.ms/autogen-officehour.
- AutoGPT: AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.
- CAMEL: The first and the best multi-agent framework. Finding the scaling law of agents.
- Codex: Lightweight coding agent that runs in your terminal.
- fast-agent: Define, prompt, and test MCP-enabled agents and workflows.
- Gemini CLI: An open-source AI agent that brings the power of Gemini directly into your terminal.
- kagent: kagent is a kubernetes native framework for building AI agents.
- LangChain: 🦜🔗 Build context-aware reasoning applications.
- LangGraph: Build resilient language agents as graphs.
- LlamaIndex: LlamaIndex is the leading framework for building LLM-powered agents over your data.
- Magentic-UI: A research prototype of a human-centered web agent.
- MetaGPT: The Multi-Agent Framework: First AI Software Company, Towards Natural Language Programming.
- OpenAI Agents SDK: A lightweight, powerful framework for multi-agent workflows.
- OpenManus: No fortress, purely open ground. OpenManus is Coming.
- PydanticAI: Agent Framework / shim to use Pydantic with LLMs.
- Qwen-Agent: Agent framework and applications built upon Qwen>=3.0, featuring Function Calling, MCP, Code Interpreter, RAG, Chrome extension, etc.
- Semantic Kernel: Integrate cutting-edge LLM technology quickly and easily into your apps.
- Suna: Open-source generalist AI agent.
- Swarm: Educational framework exploring ergonomic, lightweight multi-agent orchestration. Managed by OpenAI Solution team.
- crewAI: Framework for orchestrating role-playing, autonomous AI agents. By fostering collaborative intelligence, CrewAI empowers agents to work together seamlessly, tackling complex tasks.
- SWE-agent: SWE-agent takes a GitHub issue and tries to automatically fix it, using your LM of choice. It can also be employed for offensive cybersecurity or competitive coding challenges. [NeurIPS 2024]
- Browser Use: Make websites accessible for AI agents.
- Graphiti: Build Real-Time Knowledge Graphs for AI Agents.
- Mem0: The Memory layer for AI Agents.
- OpenAI CUA: Computer-Using Agent sample app.
- GraphRAG: A modular graph-based Retrieval-Augmented Generation (RAG) system.
- LightRAG: Simple and fast Retrieval-Augmented Generation.
- quivr: Opinionated RAG for integrating GenAI in your apps. Focus on your product rather than the RAG. Easy integration into existing products with customization! Any LLM: GPT-4, Groq, Llama. Any vector store: PGVector, Faiss. Any files. Any way you want.
- RAGFlow: RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
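All of these RAG engines share the same core loop: retrieve the chunks most relevant to the query, then pack them into the prompt. A minimal sketch using word overlap as a stand-in for the embedding similarity these systems actually use:

```python
def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Rank documents by how many query words they share (a toy scorer)."""
    q = set(query.lower().split())
    scored = sorted(documents,
                    key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

docs = ["Milvus is a vector database.",
        "Paris is the capital of France."]
context = retrieve("what is the capital of France", docs)[0]
prompt = f"Answer using only this context:\n{context}\n\nQ: capital of France?"
print(context)  # -> Paris is the capital of France.
```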
- 5ire: 5ire is a cross-platform desktop AI assistant and MCP client. It is compatible with major service providers and supports local knowledge bases and tools via Model Context Protocol servers.
- AnythingLLM: The all-in-one Desktop & Docker AI application with built-in RAG, AI agents, No-code agent builder, MCP compatibility, and more.
- Chat SDK: A full-featured, hackable Next.js AI chatbot built by Vercel.
- Chatbot UI: AI chat for any model.
- Cherry Studio: Cherry Studio is a desktop client that supports multiple LLM providers, including DeepSeek-R1.
- FastChat: An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
- Gradio: Build and share delightful machine learning apps, all in Python.
- Jan: Jan is an open source alternative to ChatGPT that runs 100% offline on your computer.
- kubectl-ai: AI-powered Kubernetes assistant.
- LLM: Access large language models from the command line.
- Lobe Chat: Lobe Chat is an open-source, modern-design AI chat framework. Supports multiple AI providers (OpenAI / Claude 3 / Gemini / Ollama / DeepSeek / Qwen), knowledge base (file upload / knowledge management / RAG), multi-modal plugins/artifacts, and thinking. One-click free deployment of your private ChatGPT / Claude / DeepSeek application.
- NextChat: Light and fast AI assistant. Supports Web, iOS, macOS, Android, Linux, and Windows.
- Open WebUI: User-friendly AI Interface (Supports Ollama, OpenAI API, ...).
- PrivateGPT: Interact with your documents using the power of GPT, 100% privately, no data leaks.
- Auto-dev: AutoDev: the AI-powered coding wizard (AI-driven programming assistant) with multilingual support, auto code generation, and a helpful bug-slaying assistant! Customizable prompts and a magic Auto Dev/Testing/Document/Agent feature included!
- Codefuse-chatbot: An intelligent assistant serving the entire software development lifecycle, powered by a Multi-Agent Framework, working with DevOps Toolkits, Code&Doc Repo RAG, etc.
- Cody: Type less, code more: Cody is an AI code assistant that uses advanced search and codebase context to help you write and fix code.
- Continue: β© Create, share, and use custom AI code assistants with our open-source IDE extensions and hub of models, rules, prompts, docs, and other building blocks.
- Sweep: AI coding assistant for JetBrains.
- Tabby: Self-hosted AI coding assistant.
- chroma: The AI-native open-source embedding database.
- deeplake: Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow.
- Faiss: A library for efficient similarity search and clustering of dense vectors.
- milvus: Milvus is a high-performance, cloud-native vector database built for scalable vector ANN search.
- weaviate: Weaviate is an open-source vector database that stores both objects and vectors, combining vector search with structured filtering and the fault tolerance and scalability of a cloud-native database.
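Under the hood, these databases answer nearest-neighbor queries over embedding vectors. The brute-force version is a few lines; ANN indexes (HNSW, IVF, etc.) exist precisely to avoid this linear scan at scale:

```python
import math

def cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def nearest(query, vectors) -> int:
    """Return the index of the vector most similar to the query."""
    return max(range(len(vectors)), key=lambda i: cosine(query, vectors[i]))

index = [(1.0, 0.0), (0.0, 1.0), (0.7, 0.7)]
print(nearest((0.9, 0.1), index))  # -> 0
```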
- Daytona: Daytona is a Secure and Elastic Infrastructure for Running AI-Generated Code.
- E2B: Secure open source cloud runtime for AI apps & AI agents.
- ragas: Supercharge your LLM application evaluations.
- Langfuse: Open source LLM engineering platform: LLM observability, metrics, evals, prompt management, playground, datasets. Integrates with OpenTelemetry, LangChain, OpenAI SDK, LiteLLM, and more. YC W23.
- OpenLLMetry: Open-source observability for your LLM application, based on OpenTelemetry.
- Helicone: Open source LLM observability platform. One line of code to monitor, evaluate, and experiment. YC W23.
- phoenix: AI Observability & Evaluation.
- wandb: The AI developer platform. Use Weights & Biases to train and fine-tune models, and manage models from experimentation to production.
- OpenLIT: OpenTelemetry-native LLM observability and GPU monitoring.
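These observability tools wrap each LLM call to record a span (name, latency, token counts) and export it, typically over OpenTelemetry. A minimal in-memory sketch of the wrapping step; the `SPANS` list stands in for a real exporter:

```python
import time
import functools

SPANS = []  # in-memory trace sink; real tools export via OpenTelemetry

def traced(fn):
    """Record name and latency for every call, even when it raises."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            SPANS.append({"name": fn.__name__,
                          "latency_s": time.perf_counter() - start})
    return wrapper

@traced
def fake_llm_call(prompt: str) -> str:
    return "stub completion for: " + prompt

fake_llm_call("hello")
print(SPANS[0]["name"])  # -> fake_llm_call
```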
- AXLearn: An extensible deep learning library.
- Candle: Minimalist ML framework for Rust.
- ColossalAI: Making large AI models cheaper, faster and more accessible.
- DLRover: An automatic distributed deep learning system.
- Ludwig: Low-code framework for building custom LLMs, neural networks, and other AI models.
- MaxText: A simple, performant and scalable Jax LLM!
- MLX: MLX: An array framework for Apple silicon.
- Axolotl: Go ahead and axolotl questions.
- EasyLM: Large language models (LLMs) made easy, EasyLM is a one stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Flax.
- LLaMa-Factory: Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024).
- LMFlow: An Extensible Toolkit for Finetuning and Inference of Large Foundation Models. Large Models for All.
- maestro: streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL.
- MLX-VLM: MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX.
- Swift: Use PEFT or Full-parameter to finetune 450+ LLMs (Qwen2.5, InternLM3, GLM4, Llama3.3, Mistral, Yi1.5, Baichuan2, DeepSeek-R1, ...) and 150+ MLLMs (Qwen2.5-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2.5, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL2, Phi3.5-Vision, GOT-OCR2, ...).
- torchtune: PyTorch native post-training library.
- Transformer Lab: Open Source Application for Advanced LLM Engineering: interact, train, fine-tune, and evaluate large language models on your own computer.
- unsloth: Finetune Llama 3.3, DeepSeek-R1 & reasoning LLMs 2x faster with 70% less memory!
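Much of the memory saving behind these fine-tuning tools comes from parameter-efficient methods like LoRA, which freezes a d_out x d_in weight and trains only two low-rank factors B (d_out x r) and A (r x d_in). The parameter arithmetic, assuming an illustrative 4096x4096 projection at rank 8:

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters for one LoRA-adapted weight: r*(d_in + d_out)."""
    return rank * (d_in + d_out)

full = 4096 * 4096                 # frozen base weight
lora = lora_params(4096, 4096, 8)  # trainable adapter
print(lora, round(100 * lora / full, 2))  # -> 65536 0.39
```

Training under 1% of the parameters per adapted matrix is why multi-LoRA servers like LoRAX can hold thousands of fine-tunes on one base model.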
- OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention & RFT).
- Self-RLHF: Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback.
- AgentBench: A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24).
- LiveBench: A challenging, contamination-free LLM benchmark.
- lm-evaluation-harness: A framework for few-shot evaluation of language models.
- LongBench: LongBench v2 and LongBench (ACL 2024).
- OpenCompass: OpenCompass is an LLM evaluation platform supporting a wide range of models (Llama 3, Mistral, InternLM2, GPT-4, LLaMA 2, Qwen, GLM, Claude, etc.) over 100+ datasets.
- opik: Debug, evaluate, and monitor your LLM applications, RAG systems, and agentic workflows with comprehensive tracing, automated evaluations, and production-ready dashboards.
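At their core, harnesses like these score model outputs against references with simple metrics; normalized exact match is the canonical one. A minimal sketch:

```python
def exact_match_accuracy(predictions: list[str],
                         references: list[str]) -> float:
    """Fraction of predictions that equal the reference after lowercasing
    and whitespace normalization."""
    def norm(s: str) -> str:
        return " ".join(s.lower().split())
    hits = sum(norm(p) == norm(r) for p, r in zip(predictions, references))
    return hits / len(references)

acc = exact_match_accuracy(["Paris", " berlin "], ["paris", "Berlin"])
print(acc)  # -> 1.0
```

Real harnesses add prompt templating, few-shot sampling, and many more metrics, but every benchmark run bottoms out in comparisons like this.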
- BentoML: The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!
- Flyte: Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.
- Kubeflow: Machine Learning Toolkit for Kubernetes.
- Metaflow: Build, Deploy and Manage AI/ML Systems.
- MLflow: Open source platform for the machine learning lifecycle.
- Polyaxon: MLOps Tools For Managing & Orchestrating The Machine Learning LifeCycle.
- Ray: Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
- Seldon-Core: An MLOps framework to package, deploy, monitor and manage thousands of production machine learning models.
- ZenML: The bridge between ML and Ops. https://zenml.io.
- awesome-mcp-servers: A curated list of awesome Model Context Protocol (MCP) servers.
- BaiLian MCP: The full lifecycle MCP service hosted on the BaiLian platform.
- Cline MCP Marketplace: This is the official repository for submitting MCP servers to be included in Cline's MCP Marketplace.
- Docker MCP Catalog: Explore a curated collection of 100+ secure, high-quality MCP servers as Docker Images, spanning database solutions, developer tools, productivity platforms, and API integrations.
- Higress MCP Marketplace: API as MCP, connecting AI with the real world at lower cost, faster and more safely.
- mcp-directory: A directory for Awesome MCP Servers.
- MCPMarket: MCPMarket.com hosts more than 12k MCP servers; explore the collection to connect AI to your favorite tools.
- ModelScope MCP: Try online conversations with various MCP Servers hosted on the ModelScope platform.
- Smithery: Smithery is a platform to help developers find and ship language model extensions compatible with the Model Context Protocol Specification.
- awesome-mcp-clients: A curated list of awesome Model Context Protocol (MCP) clients.