Commit
Signed-off-by: kerthcet <[email protected]>
Showing 6 changed files with 47 additions and 1 deletion.
@@ -24,3 +24,4 @@ Dockerfile.cross
 *.swp
 *.swo
 *~
+.DS_Store
@@ -1,3 +1,48 @@
# llmaz

☸️ Effortlessly operate LLMs on Kubernetes, e.g. serving.
[![stability-wip](https://img.shields.io/badge/stability-wip-lightgrey.svg)](https://github.com/mkenney/software-guides/blob/master/STABILITY-BADGES.md#work-in-progress)
[![GoReport Widget]][GoReport Status]
[![Latest Release](https://img.shields.io/github/v/release/inftyai/llmaz?include_prereleases)](https://github.com/inftyai/llmaz/releases/latest)

[GoReport Widget]: https://goreportcard.com/badge/github.com/inftyai/llmaz
[GoReport Status]: https://goreportcard.com/report/github.com/inftyai/llmaz

llmaz, pronounced `/lima:z/`, aims to provide a production-ready inference platform for various LLMs on Kubernetes. It integrates tightly with state-of-the-art inference backends, such as [vLLM](https://github.com/vllm-project/vllm).
## Concept

![image](./docs/assets/overview.png)
## Feature Overview

- **Easy to use**: Users can deploy a production-ready LLM service with minimal configuration.
- **High performance**: llmaz integrates with vLLM by default for high-performance inference; support for other backends is on the way.
- **Efficient autoscaling**: llmaz works smoothly with autoscaling components like [cluster-autoscaler](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler) and [Karpenter](https://github.com/kubernetes-sigs/karpenter) to support elastic scenarios.
- **Model as a first-class citizen**: Cloud providers and model providers can manage models with ease.
- **Accelerator fungibility**: llmaz can serve LLMs on different accelerators to balance cost and performance.
- **SOTA inference technologies**: llmaz supports the latest SOTA techniques, such as [speculative decoding](https://arxiv.org/abs/2211.17192) and [Splitwise](https://arxiv.org/abs/2311.18677).
## Quick Start

Refer to the [samples](/config/samples/) for quick deployment.
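Since the project scaffolding looks kubebuilder-based (note `Dockerfile.cross` in the `.gitignore` hunk above), a minimal deployment sketch might look like the shell session below. The `make install` and `make deploy` targets are assumed from common kubebuilder conventions rather than documented by this commit, so adjust them to the project's actual Makefile.

```sh
# Install the CRDs into the current cluster (assumed kubebuilder target).
make install

# Build and deploy the controller manager (assumed kubebuilder target).
make deploy

# Create a serving workload from the bundled sample manifests.
kubectl apply -f config/samples/
```

`kubectl apply -f` on a directory applies every manifest inside it; point it at a single file instead to try one sample at a time.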
## Roadmap

- Metrics support
- Autoscaling support
- Gateway support
- Serverless support
- CLI tool
- Model training and fine-tuning in the long term
## Contributions

🚀 All kinds of contributions are welcome! Please follow [Contributing](https://github.com/InftyAI/community/blob/main/CONTRIBUTING.md).

## Contributors

🎉 Thanks to all our contributors.

<a href="https://github.com/InftyAI/llmaz/graphs/contributors">
  <img src="https://contrib.rocks/image?repo=InftyAI/llmaz" />
</a>
Binary file not shown.
Binary file not shown.