
Commit 2e7d44d: update docs
Signed-off-by: kerthcet <[email protected]>
kerthcet committed Nov 12, 2024
1 parent 70f9da0 commit 2e7d44d
Showing 5 changed files with 10 additions and 4 deletions.
11 changes: 8 additions & 3 deletions README.md
@@ -22,17 +22,22 @@ Easy, advanced inference platform for large language models on Kubernetes
## Architecture

-![image](./docs/assets/arch.png)
+<p align="center">
+  <picture>
+    <img alt="architecture" src="https://raw.githubusercontent.com/inftyai/llmaz/main/docs/assets/arch.png" width=100%>
+  </picture>
+</p>

## Features Overview

- **Ease of Use**: Users can quickly deploy an LLM service with minimal configuration.
-- **Broad Backend Support**: llmaz supports a wide range of advanced inference backends for different scenarios, like [vLLM](https://github.com/vllm-project/vllm), [Text-Generation-Inference](https://github.com/huggingface/text-generation-inference), [SGLang](https://github.com/sgl-project/sglang), [llama.cpp](https://github.com/ggerganov/llama.cpp), [ollama](https://github.com/ollama/ollama). Find the full list of supported backends [here](./docs/support-backends.md).
+- **Broad Backend Support**: llmaz supports a wide range of advanced inference backends for different scenarios, like [vLLM](https://github.com/vllm-project/vllm), [Text-Generation-Inference](https://github.com/huggingface/text-generation-inference), [SGLang](https://github.com/sgl-project/sglang), [llama.cpp](https://github.com/ggerganov/llama.cpp). Find the full list of supported backends [here](./docs/support-backends.md).
- **Model Distribution**: Out-of-the-box model cache system with [Manta](https://github.com/InftyAI/Manta).
- **Scaling Efficiency (WIP)**: llmaz works smoothly with autoscaling components like [Cluster-Autoscaler](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler) or [Karpenter](https://github.com/kubernetes-sigs/karpenter) to support elastic scenarios.
- **Accelerator Fungibility (WIP)**: llmaz supports serving the same LLM with various accelerators to optimize cost and performance.
- **SOTA Inference**: llmaz supports the latest cutting-edge research, like [Speculative Decoding](https://arxiv.org/abs/2211.17192) or [Splitwise](https://arxiv.org/abs/2311.18677) (WIP), on Kubernetes.
- **Various Model Providers**: llmaz supports a wide range of model providers, such as [HuggingFace](https://huggingface.co/), [ModelScope](https://www.modelscope.cn), and ObjectStores (Aliyun OSS, more on the way). llmaz handles model loading automatically, requiring no effort from users.
-- **Multi-host Support**: llmaz supports both single-host and multi-host scenarios with [LWS](https://github.com/kubernetes-sigs/lws) from day 1.
+- **Multi-host Support**: llmaz supports both single-host and multi-host scenarios with [LWS](https://github.com/kubernetes-sigs/lws) from day 0.

## Quick Start

Binary file modified docs/assets/arch.png
Binary file removed docs/assets/overview.png
1 change: 1 addition & 0 deletions pkg/controller_helper/backendruntime.go
@@ -46,6 +46,7 @@ func (p *BackendRuntimeParser) Envs() []corev1.EnvVar {
}

func (p *BackendRuntimeParser) Args(mode InferenceMode, models []*coreapi.OpenModel) ([]string, error) {
+	// TODO: add validation in webhook.
	if mode == SpeculativeDecodingInferenceMode && len(models) != 2 {
		return nil, fmt.Errorf("unexpected number of models: want 2, got %d", len(models))
	}
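The guard above exists because speculative decoding pairs a small draft model with the larger target model, so exactly two models must be configured. Below is a minimal, self-contained sketch of the same check; `InferenceMode` and the constant's string value are stand-ins for the real types in `pkg/controller_helper`, not the project's confirmed definitions:

```go
package main

import "fmt"

// Stand-in for the real InferenceMode type; the actual definition
// lives alongside BackendRuntimeParser.
type InferenceMode string

// The string value here is an assumption for illustration only.
const SpeculativeDecodingInferenceMode InferenceMode = "SpeculativeDecoding"

// validateModelCount mirrors the guard in Args: speculative decoding
// needs exactly two models, the draft model and the target model.
func validateModelCount(mode InferenceMode, modelCount int) error {
	if mode == SpeculativeDecodingInferenceMode && modelCount != 2 {
		return fmt.Errorf("unexpected number of models: want 2, got %d", modelCount)
	}
	return nil
}

func main() {
	fmt.Println(validateModelCount(SpeculativeDecodingInferenceMode, 1)) // unexpected number of models: want 2, got 1
	fmt.Println(validateModelCount(SpeculativeDecodingInferenceMode, 2)) // <nil>
}
```

Moving this check into a validating webhook, as the new TODO suggests, would reject a misconfigured resource at admission time instead of at argument-parsing time.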
2 changes: 1 addition & 1 deletion test/e2e/suit_test.go
@@ -104,7 +104,7 @@ func readyForTesting(client client.Client) {
	}, timeout, interval).Should(Succeed())

	// Delete this model before beginning tests.
-	Expect(client.Delete(ctx, model))
+	Expect(client.Delete(ctx, model)).To(Succeed())
	Eventually(func() error {
		return client.Get(ctx, types.NamespacedName{Name: model.Name}, &coreapi.OpenModel{})
	}).ShouldNot(Succeed())
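This one-line fix matters more than it looks: a Gomega `Expect(...)` with no chained matcher builds an assertion but never evaluates it, so a failed `Delete` would pass silently. A small standalone example of the difference (a hypothetical test, not code from the repo):

```go
package e2e

import (
	"errors"
	"testing"

	. "github.com/onsi/gomega"
)

func TestExpectNeedsAMatcher(t *testing.T) {
	g := NewWithT(t)

	err := errors.New("delete failed")

	// g.Expect(err)               // compiles, but asserts nothing
	g.Expect(err).ToNot(Succeed()) // chaining a matcher performs the check

	var nilErr error
	g.Expect(nilErr).To(Succeed()) // a nil error satisfies Succeed()
}
```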
