An awesome & curated list of the best LLMOps tools.
Contributions are welcome: simply open an issue to add a new project.
- NVIDIA GPU Operator: NVIDIA GPU Operator creates, configures, and manages GPUs in Kubernetes.
- KAI Scheduler: KAI Scheduler is an open source Kubernetes Native scheduler for AI workloads at large scale.
- Project-HAMi: Heterogeneous AI Computing Virtualization Middleware.
- Cortex.cpp: Local AI API Platform.
- DeepSpeed-MII: MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
- llama-box: LM inference server implementation based on *.cpp.
- Nvidia Dynamo: A Datacenter Scale Distributed Inference Serving Framework.
- ipex-llm: Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, DeepSeek, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, DeepSpeed, Axolotl, etc.
- LMDeploy: LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
- LoRAX: Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs.
- llama.cpp: LLM inference in C/C++.
- Llumnix: Efficient and easy multi-instance LLM serving.
- MInference: [NeurIPS'24 Spotlight, ICLR'25] Speeds up long-context LLM inference with approximate, dynamic sparse attention, reducing pre-filling latency by up to 10x on an A100 while maintaining accuracy.
- MLC LLM: Universal LLM Deployment Engine with ML Compilation.
- MLServer: An inference server for your machine learning models, including support for multiple frameworks, multi-model serving and more.
- Ollama: Get up and running with Llama 3.3, DeepSeek-R1, Phi-4, Gemma 3, and other large language models.
- OpenLLM: Run any open-source LLMs, such as DeepSeek and Llama, as OpenAI compatible API endpoint in the cloud.
- OpenVINO: OpenVINO™ is an open source toolkit for optimizing and deploying AI inference.
- Petals: Run LLMs at home, BitTorrent-style; fine-tuning and inference up to 10x faster than offloading.
- Ratchet: A cross-platform browser ML framework.
- SGLang: SGLang is a fast serving framework for large language models and vision language models.
- TinyGrad: You like PyTorch? You like micrograd? You love tinygrad!
- transformers.js: State-of-the-art machine learning for the web. Run 🤗 Transformers directly in your browser, with no need for a server!
- Triton Inference Server: The Triton Inference Server provides an optimized cloud and edge inferencing solution.
- Text Generation Inference: Large Language Model Text Generation Inference.
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs.
- web-llm: High-performance In-browser LLM Inference Engine.
- Xinference: Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
- zml: Any model. Any hardware. Zero compromise. Built with @ziglang / @openxla / MLIR / @bazelbuild.
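Many of the servers above (vLLM, SGLang, OpenLLM, Ollama, Text Generation Inference, and others) expose an OpenAI-compatible HTTP API, so swapping engines mostly means changing a base URL. A minimal sketch of the standard chat-completions request body; the model name is a placeholder, and real clients would POST this to the server's `/v1/chat/completions` endpoint:

```python
import json

def chat_request(model: str, prompt: str, temperature: float = 0.2) -> str:
    """Build an OpenAI-style chat-completions request body as JSON."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }
    return json.dumps(body)

payload = json.loads(chat_request("my-local-model", "Hello!"))
print(payload["messages"][0]["role"])  # -> user
```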
- AIBrix: Cost-efficient and pluggable Infrastructure components for GenAI inference.
- Kaito: Kubernetes operator for large-model inference and fine-tuning, with GPU auto-provisioning, container-based hosting, and CRD-based orchestration.
- Kserve: Standardized Serverless ML Inference Platform on Kubernetes.
- KubeAI: AI Inference Operator for Kubernetes. The easiest way to serve ML models in production. Supports VLMs, LLMs, embeddings, and speech-to-text.
- llm-d: llm-d is a Kubernetes-native, high-performance distributed LLM inference framework.
- llmaz: Easy, advanced inference platform for large language models on Kubernetes.
- LMCache: 10x Faster Long-Context LLM By Smart KV Cache Optimizations.
- Mooncake: Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
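Several of these systems (LMCache, Mooncake, AIBrix) revolve around managing the KV cache, whose size is easy to estimate: each token stores one key and one value vector per layer per KV head. A back-of-the-envelope sketch, assuming Llama-2-7B-like dimensions (32 layers, 32 KV heads, head dimension 128, fp16):

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    # The factor 2 covers the key and the value stored for every token.
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem

# A 4096-token context at these assumed dimensions occupies 2 GiB.
gib = kv_cache_bytes(32, 32, 128, 4096) / 2**30
print(gib)  # -> 2.0
```

Numbers like this are why KV-cache offloading and sharing across instances pay off at scale.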
- AI Gateway: A blazing fast AI Gateway with integrated guardrails. Route to 200+ LLMs, 50+ AI Guardrails with 1 fast & friendly API.
- LiteLLM: Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq].
- RouteLLM: A framework for serving and evaluating LLM routers - save LLM costs without compromising quality.
- APISIX: The Cloud-Native API Gateway and AI Gateway with extensive plugin system and AI capabilities.
- Envoy AI Gateway: Envoy AI Gateway is an open source project for using Envoy Gateway to handle request traffic from application clients to Generative AI services.
- Higress: AI Gateway | AI-native API Gateway.
- kgateway: The Cloud-Native API Gateway and AI Gateway.
- Kong: The Cloud-Native API Gateway and AI Gateway.
- gateway-api-inference-extension: Gateway API Inference Extension.
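Routers and gateways in this group ultimately pick a backend model per request to trade cost against quality. A toy heuristic router in that spirit (this is not RouteLLM's actual learned router, and the model names are placeholders):

```python
def route(prompt: str, cheap: str = "small-model",
          strong: str = "large-model") -> str:
    """Send short, simple prompts to the cheap model and long or
    code-heavy prompts to the strong one."""
    hard = (len(prompt.split()) > 50
            or "```" in prompt
            or "prove" in prompt.lower())
    return strong if hard else cheap

print(route("What is the capital of France?"))  # -> small-model
```

Learned routers replace the `hard` heuristic with a classifier trained on preference data, but the serving-side contract is the same: prompt in, model name out.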
- genai-bench: Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serving systems.
- Inference Benchmark: A model-server-agnostic inference benchmarking tool that can benchmark LLMs running on different infrastructure, such as GPUs and TPUs. It can also be run on a GKE cluster as a container.
- Inference Perf: GenAI inference performance benchmarking tool.
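Benchmark tools like these typically report time-to-first-token (TTFT) and decode throughput; both fall out of three timestamps and a token count. A minimal sketch of that arithmetic:

```python
def serving_metrics(request_start: float, first_token_time: float,
                    end_time: float, completion_tokens: int):
    """Return (TTFT in seconds, output tokens per second during decode)."""
    ttft = first_token_time - request_start
    decode_s = end_time - first_token_time
    tps = completion_tokens / decode_s if decode_s > 0 else float("inf")
    return ttft, tps

ttft, tps = serving_metrics(0.0, 0.25, 4.25, 200)
print(ttft, tps)  # -> 0.25 50.0
```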
- Instructor: Structured outputs for LLMs.
- Outlines: Structured Text Generation.
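Libraries like Outlines enforce structure by masking, at every decoding step, the tokens that would violate the target grammar. A minimal sketch of that idea using a fixed choice set instead of a real grammar or tokenizer:

```python
def allowed_next(prefix: str, vocabulary: list[str],
                 choices: list[str]) -> list[str]:
    """Keep only tokens that leave the output a prefix of some allowed
    choice; a decoder restricted this way can never emit invalid text."""
    ok = []
    for tok in vocabulary:
        candidate = prefix + tok
        if any(choice.startswith(candidate) for choice in choices):
            ok.append(tok)
    return ok

# After emitting "y", only "e" keeps the output on track toward "yes".
print(allowed_next("y", ["y", "e", "s", "n", "o", "maybe"], ["yes", "no"]))
# -> ['e']
```

Real implementations compile a regex or JSON schema into a finite-state machine over token IDs, but the per-step masking logic is the same shape.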
- Dify: Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
- FastGPT: FastGPT is a knowledge-based platform built on LLMs that offers a comprehensive suite of out-of-the-box capabilities, such as data processing, RAG retrieval, and visual AI workflow orchestration, letting you easily develop and deploy complex question-answering systems without extensive setup or configuration.
- Flowise: Drag & drop UI to build your customized LLM flow.
- Haystack: AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
- Inference: Turn any computer or edge device into a command center for your computer vision projects.
- Agent Development Kit (ADK): An open-source, code-first Python toolkit for building, evaluating, and deploying sophisticated AI agents with flexibility and control.
- Agno: Build Multimodal AI Agents with memory, knowledge and tools. Simple, fast and model-agnostic.
- autogen: A programming framework for agentic AI. PyPI: autogen-agentchat. Discord: https://aka.ms/autogen-discord. Office hour: https://aka.ms/autogen-officehour.
- AutoGPT: AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.
- CAMEL: The first and the best multi-agent framework. Finding the scaling law of agents.
- Codex: Lightweight coding agent that runs in your terminal.
- fast-agent: Define, prompt, and test MCP-enabled agents and workflows.
- Gemini CLI: An open-source AI agent that brings the power of Gemini directly into your terminal.
- kagent: kagent is a kubernetes native framework for building AI agents.
- LangChain: 🦜🔗 Build context-aware reasoning applications.
- LangGraph: Build resilient language agents as graphs.
- LlamaIndex: LlamaIndex is the leading framework for building LLM-powered agents over your data.
- Magentic-UI: A research prototype of a human-centered web agent.
- MetaGPT: The Multi-Agent Framework: First AI Software Company, Towards Natural Language Programming.
- OpenAI Agents SDK: A lightweight, powerful framework for multi-agent workflows.
- OpenManus: No fortress, purely open ground. OpenManus is Coming.
- PydanticAI: Agent Framework / shim to use Pydantic with LLMs.
- Qwen-Agent: Agent framework and applications built upon Qwen>=3.0, featuring Function Calling, MCP, Code Interpreter, RAG, Chrome extension, etc.
- Semantic Kernel: Integrate cutting-edge LLM technology quickly and easily into your apps.
- Suna: Open-source generalist AI agent.
- Swarm: Educational framework exploring ergonomic, lightweight multi-agent orchestration. Managed by OpenAI Solution team.
- crewAI: Framework for orchestrating role-playing, autonomous AI agents. By fostering collaborative intelligence, CrewAI empowers agents to work together seamlessly, tackling complex tasks.
- SWE-agent: SWE-agent takes a GitHub issue and tries to automatically fix it, using your LM of choice. It can also be employed for offensive cybersecurity or competitive coding challenges. [NeurIPS 2024]
- Browser Use: Make websites accessible for AI agents.
- Graphiti: Build Real-Time Knowledge Graphs for AI Agents.
- Mem0: The Memory layer for AI Agents.
- OpenAI CUA: Computer-Using Agent sample app.
- GraphRAG: A modular graph-based Retrieval-Augmented Generation (RAG) system.
- LightRAG: Simple and fast Retrieval-Augmented Generation.
- quivr: Opinionated RAG for integrating GenAI in your apps. Focus on your product rather than the RAG. Easy integration into existing products with customization! Any LLM: GPT-4, Groq, Llama. Any vector store: PGVector, Faiss. Any files. Any way you want.
- RAGFlow: RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
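All of these RAG engines share the same core loop: retrieve the chunks most relevant to the query, then pack them into the prompt. A minimal sketch using word overlap as a stand-in for the embedding similarity these systems actually use:

```python
def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Rank documents by how many query words they share (a toy scorer)."""
    q = set(query.lower().split())
    scored = sorted(documents,
                    key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

docs = ["Milvus is a vector database.",
        "Paris is the capital of France."]
context = retrieve("what is the capital of France", docs)[0]
prompt = f"Answer using only this context:\n{context}\n\nQ: capital of France?"
print(context)  # -> Paris is the capital of France.
```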
- 5ire: 5ire is a cross-platform desktop AI assistant and MCP client. It is compatible with major service providers and supports local knowledge bases and tools via Model Context Protocol servers.
- AnythingLLM: The all-in-one Desktop & Docker AI application with built-in RAG, AI agents, No-code agent builder, MCP compatibility, and more.
- Chat SDK: A full-featured, hackable Next.js AI chatbot built by Vercel.
- Chatbot UI: AI chat for any model.
- Cherry Studio: Cherry Studio is a desktop client that supports multiple LLM providers, including DeepSeek-R1.
- FastChat: An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
- Gradio: Build and share delightful machine learning apps, all in Python.
- Jan: Jan is an open source alternative to ChatGPT that runs 100% offline on your computer.
- kubectl-ai: AI-powered Kubernetes assistant.
- LLM: Access large language models from the command line.
- Lobe Chat: Lobe Chat is an open-source, modern-design AI chat framework. Supports multiple AI providers (OpenAI / Claude 3 / Gemini / Ollama / DeepSeek / Qwen), knowledge base (file upload / knowledge management / RAG), multi-modal plugins/artifacts, and thinking. One-click free deployment of your private ChatGPT / Claude / DeepSeek application.
- NextChat: Light and fast AI assistant. Supports Web, iOS, macOS, Android, Linux, and Windows.
- Open WebUI: User-friendly AI Interface (Supports Ollama, OpenAI API, ...).
- PrivateGPT: Interact with your documents using the power of GPT, 100% privately, no data leaks.
- Auto-dev: AutoDev: the AI-powered coding wizard (AI-driven programming assistant) with multilingual support, auto code generation, and a helpful bug-slaying assistant! Customizable prompts and a magic Auto Dev/Testing/Document/Agent feature included!
- Codefuse-chatbot: An intelligent assistant serving the entire software development lifecycle, powered by a Multi-Agent Framework, working with DevOps Toolkits, Code&Doc Repo RAG, etc.
- Cody: Type less, code more: Cody is an AI code assistant that uses advanced search and codebase context to help you write and fix code.
- Continue: β© Create, share, and use custom AI code assistants with our open-source IDE extensions and hub of models, rules, prompts, docs, and other building blocks.
- Sweep: AI coding assistant for JetBrains.
- Tabby: Self-hosted AI coding assistant.
- chroma: The AI-native open-source embedding database.
- deeplake: Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow.
- Faiss: A library for efficient similarity search and clustering of dense vectors.
- milvus: Milvus is a high-performance, cloud-native vector database built for scalable vector ANN search.
- weaviate: Weaviate is an open-source vector database that stores both objects and vectors, combining vector search with structured filtering and the fault tolerance and scalability of a cloud-native database.
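Under the hood, these databases answer nearest-neighbor queries over embedding vectors. The brute-force version is a few lines; ANN indexes (HNSW, IVF, etc.) exist precisely to avoid this linear scan at scale:

```python
import math

def cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def nearest(query, vectors) -> int:
    """Return the index of the vector most similar to the query."""
    return max(range(len(vectors)), key=lambda i: cosine(query, vectors[i]))

index = [(1.0, 0.0), (0.0, 1.0), (0.7, 0.7)]
print(nearest((0.9, 0.1), index))  # -> 0
```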
- Daytona: Daytona is a Secure and Elastic Infrastructure for Running AI-Generated Code.
- E2B: Secure open source cloud runtime for AI apps & AI agents.
- ragas: Supercharge your LLM application evaluations.
- Langfuse: Open source LLM engineering platform: LLM observability, metrics, evals, prompt management, playground, datasets. Integrates with OpenTelemetry, LangChain, OpenAI SDK, LiteLLM, and more. YC W23.
- OpenLLMetry: Open-source observability for your LLM application, based on OpenTelemetry.
- Helicone: Open source LLM observability platform. One line of code to monitor, evaluate, and experiment. YC W23.
- phoenix: AI Observability & Evaluation.
- wandb: The AI developer platform. Use Weights & Biases to train and fine-tune models, and manage models from experimentation to production.
- OpenLIT: OpenTelemetry-native LLM observability and GPU monitoring.
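These observability tools wrap each LLM call to record a span (name, latency, token counts) and export it, typically over OpenTelemetry. A minimal in-memory sketch of the wrapping step; the `SPANS` list stands in for a real exporter:

```python
import time
import functools

SPANS = []  # in-memory trace sink; real tools export via OpenTelemetry

def traced(fn):
    """Record name and latency for every call, even when it raises."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            SPANS.append({"name": fn.__name__,
                          "latency_s": time.perf_counter() - start})
    return wrapper

@traced
def fake_llm_call(prompt: str) -> str:
    return "stub completion for: " + prompt

fake_llm_call("hello")
print(SPANS[0]["name"])  # -> fake_llm_call
```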
- AXLearn: An extensible deep learning library.
- Candle: Minimalist ML framework for Rust.
- ColossalAI: Making large AI models cheaper, faster and more accessible.
- DLRover: An automatic distributed deep learning system.
- Ludwig: Low-code framework for building custom LLMs, neural networks, and other AI models.
- MaxText: A simple, performant and scalable Jax LLM!
- MLX: MLX: An array framework for Apple silicon.
- Axolotl: Go ahead and axolotl questions.
- EasyLM: Large language models (LLMs) made easy, EasyLM is a one stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Flax.
- LLaMa-Factory: Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024).
- LMFlow: An Extensible Toolkit for Finetuning and Inference of Large Foundation Models. Large Models for All.
- maestro: streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL.
- MLX-VLM: MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX.
- Swift: Use PEFT or Full-parameter to finetune 450+ LLMs (Qwen2.5, InternLM3, GLM4, Llama3.3, Mistral, Yi1.5, Baichuan2, DeepSeek-R1, ...) and 150+ MLLMs (Qwen2.5-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2.5, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL2, Phi3.5-Vision, GOT-OCR2, ...).
- torchtune: PyTorch native post-training library.
- Transformer Lab: Open Source Application for Advanced LLM Engineering: interact, train, fine-tune, and evaluate large language models on your own computer.
- unsloth: Finetune Llama 3.3, DeepSeek-R1 & reasoning LLMs 2x faster with 70% less memory!
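Much of the memory saving behind these fine-tuning tools comes from parameter-efficient methods like LoRA, which freezes a d_out x d_in weight and trains only two low-rank factors B (d_out x r) and A (r x d_in). The parameter arithmetic, assuming an illustrative 4096x4096 projection at rank 8:

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters for one LoRA-adapted weight: r*(d_in + d_out)."""
    return rank * (d_in + d_out)

full = 4096 * 4096                 # frozen base weight
lora = lora_params(4096, 4096, 8)  # trainable adapter
print(lora, round(100 * lora / full, 2))  # -> 65536 0.39
```

Training under 1% of the parameters per adapted matrix is why multi-LoRA servers like LoRAX can hold thousands of fine-tunes on one base model.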
- OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention & RFT).
- Self-RLHF: Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback.
- AgentBench: A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24).
- LiveBench: A challenging, contamination-free LLM benchmark.
- lm-evaluation-harness: A framework for few-shot evaluation of language models.
- LongBench: LongBench v2 and LongBench (ACL 2024).
- OpenCompass: OpenCompass is an LLM evaluation platform supporting a wide range of models (Llama 3, Mistral, InternLM2, GPT-4, LLaMA 2, Qwen, GLM, Claude, etc.) over 100+ datasets.
- opik: Debug, evaluate, and monitor your LLM applications, RAG systems, and agentic workflows with comprehensive tracing, automated evaluations, and production-ready dashboards.
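At their core, harnesses like these score model outputs against references with simple metrics; normalized exact match is the canonical one. A minimal sketch:

```python
def exact_match_accuracy(predictions: list[str],
                         references: list[str]) -> float:
    """Fraction of predictions that equal the reference after lowercasing
    and whitespace normalization."""
    def norm(s: str) -> str:
        return " ".join(s.lower().split())
    hits = sum(norm(p) == norm(r) for p, r in zip(predictions, references))
    return hits / len(references)

acc = exact_match_accuracy(["Paris", " berlin "], ["paris", "Berlin"])
print(acc)  # -> 1.0
```

Real harnesses add prompt templating, few-shot sampling, and many more metrics, but every benchmark run bottoms out in comparisons like this.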
- BentoML: The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!
- Flyte: Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.
- Kubeflow: Machine Learning Toolkit for Kubernetes.
- Metaflow: Build, Deploy and Manage AI/ML Systems.
- MLflow: Open source platform for the machine learning lifecycle.
- Polyaxon: MLOps Tools For Managing & Orchestrating The Machine Learning LifeCycle.
- Ray: Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
- Seldon-Core: An MLOps framework to package, deploy, monitor and manage thousands of production machine learning models.
- ZenML: The bridge between ML and Ops. https://zenml.io.
- awesome-mcp-servers: A curated list of awesome Model Context Protocol (MCP) servers.
- BaiLian MCP: The full lifecycle MCP service hosted on the BaiLian platform.
- Cline MCP Marketplace: This is the official repository for submitting MCP servers to be included in Cline's MCP Marketplace.
- Docker MCP Catalog: Explore a curated collection of 100+ secure, high-quality MCP servers as Docker Images, spanning database solutions, developer tools, productivity platforms, and API integrations.
- Higress MCP Marketplace: API as MCP, connecting AI with the real world at lower cost, faster and more safely.
- mcp-directory: A directory for Awesome MCP Servers.
- MCPMarket: MCPMarket.com hosts more than 12k MCP servers; explore the collection to connect AI to your favorite tools.
- ModelScope MCP: Try online conversations with various MCP Servers hosted on the ModelScope platform.
- Smithery: Smithery is a platform to help developers find and ship language model extensions compatible with the Model Context Protocol Specification.
- awesome-mcp-clients: A curated list of awesome Model Context Protocol (MCP) clients.