[pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
pre-commit-ci[bot] committed Nov 8, 2024
1 parent f315a09 commit d6be0fd
Showing 4 changed files with 21 additions and 6 deletions.
17 changes: 14 additions & 3 deletions comps/llms/text-generation/tgi/llama_stack/README.md
@@ -16,11 +16,15 @@ export LLM_MODEL_ID="meta-llama/Llama-3.1-8B-Instruct" # change to your llama model
export TGI_LLM_ENDPOINT="http://${your_ip}:8008"
export LLAMA_STACK_ENDPOINT="http://${your_ip}:5000"
```

Insert `TGI_LLM_ENDPOINT` into the Llama Stack configuration YAML. You can use the `envsubst` command, or simply replace `${TGI_LLM_ENDPOINT}` with the actual value manually.

```bash
envsubst < ./dependency/llama_stack_run_template.yaml > ./dependency/llama_stack_run.yaml
```
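
If you prefer the manual route, a `sed` one-liner gives the same result (a sketch, assuming the template only contains the `${TGI_LLM_ENDPOINT}` placeholder to substitute):

```bash
# Substitute the placeholder with the exported value (alternative to envsubst)
sed "s|\${TGI_LLM_ENDPOINT}|${TGI_LLM_ENDPOINT}|g" \
  ./dependency/llama_stack_run_template.yaml > ./dependency/llama_stack_run.yaml
```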

Make sure you end up with a `llama_stack_run.yaml` file in which the inference provider points to the correct TGI server endpoint, e.g.

```bash
inference:
- provider_id: tgi0
@@ -40,9 +44,11 @@ pip install -r requirements.txt
```

### 2.2 Start TGI Service

First we start a TGI endpoint for your LLM model on Gaudi.

```bash
volume="./data"
volume="./data"
docker run -p 8008:80 \
--name tgi_service \
-v $volume:/data \
@@ -63,7 +69,9 @@ docker run -p 8008:80 \
```
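
Model download and warm-up can take a while on first start, so it helps to wait for the endpoint before moving on (a minimal sketch; TGI exposes a `/health` route that returns 200 once the model is loaded):

```bash
# Poll the TGI health route until the server reports ready
until curl -sf "http://${your_ip}:8008/health" > /dev/null; do
  echo "Waiting for TGI to become ready..."
  sleep 10
done
```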

### 2.3 Start Llama Stack Server

Then we start the Llama Stack server on top of the TGI endpoint.

```bash
docker run \
--name llamastack-service \
@@ -74,9 +82,11 @@ docker run \
```
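
If the server does not come up, the container logs are the first place to look:

```bash
# Follow the Llama Stack container logs (container name from the command above)
docker logs -f llamastack-service
```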

### 2.4 Start Microservice with Python Script

```bash
python llm.py
```

## 🚀3. Start Microservice with Docker (Option 2)

If you start the LLM microservice with Docker, the `docker_compose_llm.yaml` file will automatically start both the TGI and Llama Stack services.
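
A typical way to bring everything up with that file is shown below (a sketch; adjust the path to wherever `docker_compose_llm.yaml` lives in your checkout, and use `docker-compose` if you are on the older standalone binary):

```bash
# Bring up the services defined in the compose file, in detached mode
docker compose -f docker_compose_llm.yaml up -d
```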
@@ -119,8 +129,8 @@ curl http://${your_ip}:9000/v1/health_check\

### 4.2 Consume the Services


Verify the TGI Service

```bash
curl http://${your_ip}:8008/generate \
-X POST \
@@ -129,6 +139,7 @@ curl http://${your_ip}:8008/generate \
```

Verify the Llama Stack Service

```bash
curl http://${your_ip}:5000/inference/chat_completion \
-H "Content-Type: application/json" \
@@ -156,4 +167,4 @@ curl http://${your_ip}:9000/v1/chat/completions \
-X POST \
-d '{"query":"What is Deep Learning?","max_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
-H 'Content-Type: application/json'
```
```
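
The request above asks for a streamed response; to receive a single JSON payload instead, the same call can be made with `"streaming":false` (an assumption based on how other OPEA LLM microservices handle this flag; verify against your version):

```bash
# Non-streaming variant of the chat completion request (assumed to be supported)
curl http://${your_ip}:9000/v1/chat/completions \
  -X POST \
  -d '{"query":"What is Deep Learning?","max_tokens":17,"streaming":false}' \
  -H 'Content-Type: application/json'
```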
3 changes: 3 additions & 0 deletions comps/llms/text-generation/tgi/llama_stack/dependency/llama_stack_run_template.yaml
@@ -1,3 +1,6 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

version: '2'
built_at: '2024-10-08T17:40:45.325529'
image_name: local
3 changes: 2 additions & 1 deletion comps/llms/text-generation/tgi/llama_stack/llm.py
@@ -20,6 +20,7 @@
logger = CustomLogger("llm_tgi_llama_stack")
logflag = os.getenv("LOGFLAG", False)


@register_microservice(
name="opea_service@llm_tgi_llama_stack",
service_type=ServiceType.LLM,
@@ -70,4 +71,4 @@ async def stream_generator():


if __name__ == "__main__":
opea_microservices["opea_service@llm_tgi_llama_stack"].start()
opea_microservices["opea_service@llm_tgi_llama_stack"].start()
4 changes: 2 additions & 2 deletions comps/llms/text-generation/tgi/llama_stack/requirements.txt
@@ -3,12 +3,12 @@ docarray[full]
fastapi
httpx
huggingface_hub
llama-stack
llama-stack-client
opentelemetry-api
opentelemetry-exporter-otlp
opentelemetry-sdk
prometheus-fastapi-instrumentator
shortuuid
transformers
uvicorn
llama-stack-client
llama-stack
