Commit 0614fc2

Vllm and vllm-ray bug fix (add opea for vllm, update setuptools version) (#437)
* add opea/ for vllm and vllm-ray docker
* modify setuptools version
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* fix ut
* refine readme

Signed-off-by: Xinyao Wang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
1 parent 8f0f2b0 commit 0614fc2

12 files changed: +21 −15 lines

comps/llms/README.md

Lines changed: 1 addition & 1 deletion
@@ -32,7 +32,7 @@ docker run -p 8008:80 -v ./data:/data --name tgi_service --shm-size 1g ghcr.io/h
 
 ```bash
 export HUGGINGFACEHUB_API_TOKEN=${your_hf_api_token}
-docker run -it --name vllm_service -p 8008:80 -e HF_TOKEN=${HUGGINGFACEHUB_API_TOKEN} -v ./data:/data vllm:cpu /bin/bash -c "cd / && export VLLM_CPU_KVCACHE_SPACE=40 && python3 -m vllm.entrypoints.openai.api_server --model ${your_hf_llm_model} --port 80"
+docker run -it --name vllm_service -p 8008:80 -e HF_TOKEN=${HUGGINGFACEHUB_API_TOKEN} -v ./data:/data opea/vllm:cpu /bin/bash -c "cd / && export VLLM_CPU_KVCACHE_SPACE=40 && python3 -m vllm.entrypoints.openai.api_server --model ${your_hf_llm_model} --port 80"
 ```
 
 ## 1.2.3 Start Ray Service
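Not part of the diff, but for reviewers who want to sanity-check the renamed `opea/vllm:cpu` image: once the container from the snippet above is up, the OpenAI-compatible endpoint on host port 8008 can be probed roughly as follows (the model must match `${your_hf_llm_model}`; the prompt is arbitrary):

```bash
# Smoke-test the vLLM OpenAI-compatible completions endpoint on the mapped host port.
curl http://localhost:8008/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "'"${your_hf_llm_model}"'", "prompt": "What is deep learning?", "max_tokens": 32}'
```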

comps/llms/text-generation/vllm-ray/build_docker_vllmray.sh

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@ cd ../../../../
 
 docker build \
     -f comps/llms/text-generation/vllm-ray/docker/Dockerfile.vllmray \
-    -t vllm_ray:habana \
+    -t opea/vllm_ray:habana \
     --network=host \
     --build-arg http_proxy=${http_proxy} \
     --build-arg https_proxy=${https_proxy} \

comps/llms/text-generation/vllm-ray/docker_compose_llm.yaml

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@ version: "3.8"
 
 services:
   vllm-ray-service:
-    image: vllm_ray:habana
+    image: opea/vllm_ray:habana
     container_name: vllm-ray-gaudi-server
     ports:
       - "8006:8000"

comps/llms/text-generation/vllm-ray/launch_vllmray.sh

Lines changed: 1 addition & 1 deletion
@@ -39,5 +39,5 @@ docker run -d --rm \
     -e HTTPS_PROXY=$https_proxy \
     -e HTTP_PROXY=$https_proxy \
     -e HUGGINGFACEHUB_API_TOKEN=$HUGGINGFACEHUB_API_TOKEN \
-    vllm_ray:habana \
+    opea/vllm_ray:habana \
     /bin/bash -c "ray start --head && python vllm_ray_openai.py --port_number 8000 --model_id_or_path $model_name --tensor_parallel_size $parallel_number --enforce_eager $enforce_eager"
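After `launch_vllmray.sh` starts the container, a hedged way to confirm that Ray and the OpenAI server came up is to follow the container logs, filtering by the new image tag (the container name is whatever the script assigns):

```bash
# Follow logs of the container started from the renamed image.
docker logs -f "$(docker ps -q --filter ancestor=opea/vllm_ray:habana)"
```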

comps/llms/text-generation/vllm-ray/requirements.txt

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@ opentelemetry-exporter-otlp
 opentelemetry-sdk
 prometheus-fastapi-instrumentator
 ray[serve]>=2.10
-setuptools==69.5.1
+setuptools
 shortuuid
 transformers
 uvicorn
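A quick local check (not in the commit) that the now-unpinned `setuptools` still resolves alongside these requirements might look like:

```bash
# Hypothetical verification in a throwaway virtualenv.
python3 -m venv /tmp/vllm-ray-check && source /tmp/vllm-ray-check/bin/activate
pip install -r comps/llms/text-generation/vllm-ray/requirements.txt
python -c "import setuptools; print(setuptools.__version__)"
```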

comps/llms/text-generation/vllm/README.md

Lines changed: 6 additions & 0 deletions
@@ -50,6 +50,12 @@ bash ./build_docker_vllm.sh hpu
 
 Set `hw_mode` to `hpu`.
 
+Note: To enable tensor parallelism, pin `setuptools==69.5.1` in Dockerfile.hpu before building the Docker image, using the following command.
+
+```
+sed -i "s/RUN pip install setuptools/RUN pip install setuptools==69.5.1/g" docker/Dockerfile.hpu
+```
+
 #### Launch vLLM service on single node
 
 For small model, we can just use single node.
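Putting the new note together with the build step above, the tensor-parallel-ready HPU image would be produced roughly like this (paths per the README; the `sed` line is the one the note adds):

```bash
cd comps/llms/text-generation/vllm
# Re-pin setuptools only when tensor parallel is needed, then build.
sed -i "s/RUN pip install setuptools/RUN pip install setuptools==69.5.1/g" docker/Dockerfile.hpu
bash ./build_docker_vllm.sh hpu
```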

comps/llms/text-generation/vllm/build_docker_vllm.sh

Lines changed: 2 additions & 2 deletions
@@ -30,9 +30,9 @@ fi
 
 # Build the docker image for vLLM based on the hardware mode
 if [ "$hw_mode" = "hpu" ]; then
-    docker build -f docker/Dockerfile.hpu -t vllm:hpu --shm-size=128g . --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy
+    docker build -f docker/Dockerfile.hpu -t opea/vllm:hpu --shm-size=128g . --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy
 else
     git clone https://github.com/vllm-project/vllm.git
     cd ./vllm/
-    docker build -f Dockerfile.cpu -t vllm:cpu --shm-size=128g . --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy
+    docker build -f Dockerfile.cpu -t opea/vllm:cpu --shm-size=128g . --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy
 fi
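A trivial follow-up check (not in the commit) that the build landed under the new namespace:

```bash
# Whichever variant was built should show up under opea/vllm.
docker images opea/vllm
```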

comps/llms/text-generation/vllm/docker/Dockerfile.hpu

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@ RUN pip install --upgrade-strategy eager optimum[habana]
 
 RUN pip install -v git+https://github.com/HabanaAI/vllm-fork.git@cf6952d
 
-RUN pip install setuptools==69.5.1
+RUN pip install setuptools
 
 RUN sed -i 's/#PermitRootLogin prohibit-password/PermitRootLogin yes/' /etc/ssh/sshd_config && \
     service ssh restart

comps/llms/text-generation/vllm/docker_compose_llm.yaml

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@ version: "3.8"
 
 services:
   vllm-service:
-    image: vllm:hpu
+    image: opea/vllm:hpu
     container_name: vllm-gaudi-server
     ports:
       - "8008:80"

comps/llms/text-generation/vllm/launch_vllm_service.sh

Lines changed: 2 additions & 2 deletions
@@ -38,7 +38,7 @@ volume=$PWD/data
 
 # Build the Docker run command based on hardware mode
 if [ "$hw_mode" = "hpu" ]; then
-    docker run -d --rm --runtime=habana --name="vllm-service" -p $port_number:80 -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --ipc=host -e HTTPS_PROXY=$https_proxy -e HTTP_PROXY=$https_proxy -e HF_TOKEN=${HUGGINGFACEHUB_API_TOKEN} vllm:hpu /bin/bash -c "export VLLM_CPU_KVCACHE_SPACE=40 && python3 -m vllm.entrypoints.openai.api_server --enforce-eager --model $model_name --tensor-parallel-size $parallel_number --host 0.0.0.0 --port 80 --block-size $block_size --max-num-seqs $max_num_seqs --max-seq_len-to-capture $max_seq_len_to_capture "
+    docker run -d --rm --runtime=habana --name="vllm-service" -p $port_number:80 -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --ipc=host -e HTTPS_PROXY=$https_proxy -e HTTP_PROXY=$https_proxy -e HF_TOKEN=${HUGGINGFACEHUB_API_TOKEN} opea/vllm:hpu /bin/bash -c "export VLLM_CPU_KVCACHE_SPACE=40 && python3 -m vllm.entrypoints.openai.api_server --enforce-eager --model $model_name --tensor-parallel-size $parallel_number --host 0.0.0.0 --port 80 --block-size $block_size --max-num-seqs $max_num_seqs --max-seq_len-to-capture $max_seq_len_to_capture "
 else
-    docker run -d --rm --name="vllm-service" -p $port_number:80 --network=host -v $volume:/data -e HTTPS_PROXY=$https_proxy -e HTTP_PROXY=$https_proxy -e HF_TOKEN=${HUGGINGFACEHUB_API_TOKEN} -e VLLM_CPU_KVCACHE_SPACE=40 vllm:cpu --model $model_name --host 0.0.0.0 --port 80
+    docker run -d --rm --name="vllm-service" -p $port_number:80 --network=host -v $volume:/data -e HTTPS_PROXY=$https_proxy -e HTTP_PROXY=$https_proxy -e HF_TOKEN=${HUGGINGFACEHUB_API_TOKEN} -e VLLM_CPU_KVCACHE_SPACE=40 opea/vllm:cpu --model $model_name --host 0.0.0.0 --port 80
 fi
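As with the CPU example in the README, the relaunched service can be probed on whichever `$port_number` was passed to the script; a minimal sketch (8008 below is a placeholder):

```bash
# List the models the vLLM OpenAI-compatible server is actually serving.
curl http://localhost:8008/v1/models
```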
