Update Readme.md
Signed-off-by: kerthcet <[email protected]>
kerthcet committed Jul 13, 2024
1 parent 38022f3 commit 1d73e73
Showing 5 changed files with 47 additions and 1 deletion.
1 change: 1 addition & 0 deletions .gitignore
@@ -24,3 +24,4 @@ Dockerfile.cross
*.swp
*.swo
*~
.DS_Store
47 changes: 46 additions & 1 deletion README.md
@@ -1,3 +1,48 @@
# llmaz

☸️ Effortlessly operate LLMs on Kubernetes, e.g. for serving.
[![stability-wip](https://img.shields.io/badge/stability-wip-lightgrey.svg)](https://github.com/mkenney/software-guides/blob/master/STABILITY-BADGES.md#work-in-progress)
[![GoReport Widget]][GoReport Status]
[![Latest Release](https://img.shields.io/github/v/release/inftyai/llmaz?include_prereleases)](https://github.com/inftyai/llmaz/releases/latest)

[GoReport Widget]: https://goreportcard.com/badge/github.com/inftyai/llmaz
[GoReport Status]: https://goreportcard.com/report/github.com/inftyai/llmaz

llmaz, pronounced as `/lima:z/`, aims to provide a production-ready inference platform for various LLMs on Kubernetes. It tightly integrates with state-of-the-art inference backends, such as [vLLM](https://github.com/vllm-project/vllm).

## Concept

![image](./docs/assets/overview.png)

## Feature Overview

- **Easy to use**: Users can deploy a production-ready LLM service with minimal configuration.
- **High performance**: llmaz integrates with vLLM by default for high-performance inference. Support for other backends is on the way.
- **Efficient autoscaling**: llmaz works smoothly with autoscaling components like [cluster-autoscaler](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler) and [Karpenter](https://github.com/kubernetes-sigs/karpenter) to support elastic scenarios.
- **Model as a first-class citizen**: Cloud providers and model providers can manage models with ease.
- **Accelerator fungibility**: llmaz supports serving LLMs on different accelerators to balance cost and performance.
- **SOTA inference technologies**: llmaz supports the latest inference techniques, such as [speculative decoding](https://arxiv.org/abs/2211.17192) and [Splitwise](https://arxiv.org/abs/2311.18677).

## Quick Start

Refer to the [samples](/config/samples/) for quick deployment.
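
For a sense of what a deployment looks like, a minimal setup might pair a model definition with a serving workload along the lines below. This is a hedged sketch only: the `Model` and `Playground` kinds, API groups, and field names are assumptions modeled on the sample layout, so treat the manifests under [/config/samples/](/config/samples/) as the authoritative reference.

```yaml
# Illustrative manifests only -- kinds, API groups, and field names here are
# assumptions; see /config/samples/ for the authoritative schema.
apiVersion: llmaz.io/v1alpha1
kind: Model                        # registers the model as a first-class object
metadata:
  name: opt-125m
spec:
  familyName: opt
  source:
    modelHub:
      modelID: facebook/opt-125m   # fetched from the model hub at load time
---
apiVersion: inference.llmaz.io/v1alpha1
kind: Playground                   # a minimal inference service, backed by vLLM by default
metadata:
  name: opt-125m
spec:
  replicas: 1
  modelClaim:
    modelName: opt-125m            # claims the Model defined above
```

Applying both manifests with `kubectl apply -f <file>` would then bring up a vLLM-backed inference service, assuming the llmaz controllers are already installed in the cluster.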

## Roadmap

- Metrics support
- Autoscaling support
- Gateway support
- Serverless support
- CLI tool
- Model training and fine-tuning in the long term

## Contributions

🚀 All kinds of contributions are welcome! Please follow [Contributing](https://github.com/InftyAI/community/blob/main/CONTRIBUTING.md).

## Contributors

🎉 Thanks to all these contributors.

<a href="https://github.com/InftyAI/llmaz/graphs/contributors">
<img src="https://contrib.rocks/image?repo=InftyAI/llmaz" />
</a>
Binary file added docs/.DS_Store
Binary file not shown.
Binary file added docs/assets/.DS_Store
Binary file not shown.
Binary file added docs/assets/overview.png
Binary file not shown.
