Commit

Vllm and vllm-ray bug fix (add opea for vllm, update setuptools version) (#437)

* add opea/ for vllm and vllm-ray docker

Signed-off-by: Xinyao Wang <[email protected]>

* modify setuptools version

Signed-off-by: Xinyao Wang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix ut

Signed-off-by: Xinyao Wang <[email protected]>

* refine readme

Signed-off-by: Xinyao Wang <[email protected]>

---------

Signed-off-by: Xinyao Wang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
XinyaoWa and pre-commit-ci[bot] authored Aug 10, 2024
1 parent 8f0f2b0 commit 0614fc2
Showing 12 changed files with 21 additions and 15 deletions.
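Since the retagging touches every place the images are referenced (build scripts, compose files, launch scripts, and CI tests), a quick way to confirm that a rebuild picked up the new names is to list the local images. This is only an illustrative sketch, not part of the commit; the grep pattern simply matches the tags introduced in the diff below.

```bash
# List locally built images under the new opea/ namespace
# (this commit retags vllm:cpu, vllm:hpu and vllm_ray:habana as
#  opea/vllm:cpu, opea/vllm:hpu and opea/vllm_ray:habana)
docker images | grep -E "^opea/(vllm|vllm_ray)"
```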
2 changes: 1 addition & 1 deletion comps/llms/README.md
@@ -32,7 +32,7 @@ docker run -p 8008:80 -v ./data:/data --name tgi_service --shm-size 1g ghcr.io/h

```bash
export HUGGINGFACEHUB_API_TOKEN=${your_hf_api_token}
docker run -it --name vllm_service -p 8008:80 -e HF_TOKEN=${HUGGINGFACEHUB_API_TOKEN} -v ./data:/data vllm:cpu /bin/bash -c "cd / && export VLLM_CPU_KVCACHE_SPACE=40 && python3 -m vllm.entrypoints.openai.api_server --model ${your_hf_llm_model} --port 80"
docker run -it --name vllm_service -p 8008:80 -e HF_TOKEN=${HUGGINGFACEHUB_API_TOKEN} -v ./data:/data opea/vllm:cpu /bin/bash -c "cd / && export VLLM_CPU_KVCACHE_SPACE=40 && python3 -m vllm.entrypoints.openai.api_server --model ${your_hf_llm_model} --port 80"
```

## 1.2.3 Start Ray Service
@@ -5,7 +5,7 @@ cd ../../../../

docker build \
-f comps/llms/text-generation/vllm-ray/docker/Dockerfile.vllmray \
-t vllm_ray:habana \
-t opea/vllm_ray:habana \
--network=host \
--build-arg http_proxy=${http_proxy} \
--build-arg https_proxy=${https_proxy} \
@@ -5,7 +5,7 @@ version: "3.8"

services:
vllm-ray-service:
image: vllm_ray:habana
image: opea/vllm_ray:habana
container_name: vllm-ray-gaudi-server
ports:
- "8006:8000"
2 changes: 1 addition & 1 deletion comps/llms/text-generation/vllm-ray/launch_vllmray.sh
@@ -39,5 +39,5 @@ docker run -d --rm \
-e HTTPS_PROXY=$https_proxy \
-e HTTP_PROXY=$https_proxy \
-e HUGGINGFACEHUB_API_TOKEN=$HUGGINGFACEHUB_API_TOKEN \
vllm_ray:habana \
opea/vllm_ray:habana \
/bin/bash -c "ray start --head && python vllm_ray_openai.py --port_number 8000 --model_id_or_path $model_name --tensor_parallel_size $parallel_number --enforce_eager $enforce_eager"
2 changes: 1 addition & 1 deletion comps/llms/text-generation/vllm-ray/requirements.txt
@@ -11,7 +11,7 @@ opentelemetry-exporter-otlp
opentelemetry-sdk
prometheus-fastapi-instrumentator
ray[serve]>=2.10
setuptools==69.5.1
setuptools
shortuuid
transformers
uvicorn
6 changes: 6 additions & 0 deletions comps/llms/text-generation/vllm/README.md
@@ -50,6 +50,12 @@ bash ./build_docker_vllm.sh hpu

Set `hw_mode` to `hpu`.

Note: If you want to enable tensor parallelism, pin `setuptools==69.5.1` in Dockerfile.hpu before building the Docker image, e.g. with the following command.

```bash
sed -i "s/RUN pip install setuptools/RUN pip install setuptools==69.5.1/g" docker/Dockerfile.hpu
```

#### Launch vLLM service on single node

For a small model, a single node is sufficient.
4 changes: 2 additions & 2 deletions comps/llms/text-generation/vllm/build_docker_vllm.sh
@@ -30,9 +30,9 @@ fi

# Build the docker image for vLLM based on the hardware mode
if [ "$hw_mode" = "hpu" ]; then
docker build -f docker/Dockerfile.hpu -t vllm:hpu --shm-size=128g . --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy
docker build -f docker/Dockerfile.hpu -t opea/vllm:hpu --shm-size=128g . --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy
else
git clone https://github.com/vllm-project/vllm.git
cd ./vllm/
docker build -f Dockerfile.cpu -t vllm:cpu --shm-size=128g . --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy
docker build -f Dockerfile.cpu -t opea/vllm:cpu --shm-size=128g . --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy
fi
2 changes: 1 addition & 1 deletion comps/llms/text-generation/vllm/docker/Dockerfile.hpu
@@ -9,7 +9,7 @@ RUN pip install --upgrade-strategy eager optimum[habana]

RUN pip install -v git+https://github.com/HabanaAI/vllm-fork.git@cf6952d

RUN pip install setuptools==69.5.1
RUN pip install setuptools

RUN sed -i 's/#PermitRootLogin prohibit-password/PermitRootLogin yes/' /etc/ssh/sshd_config && \
service ssh restart
2 changes: 1 addition & 1 deletion comps/llms/text-generation/vllm/docker_compose_llm.yaml
@@ -5,7 +5,7 @@ version: "3.8"

services:
vllm-service:
image: vllm:hpu
image: opea/vllm:hpu
container_name: vllm-gaudi-server
ports:
- "8008:80"
4 changes: 2 additions & 2 deletions comps/llms/text-generation/vllm/launch_vllm_service.sh
@@ -38,7 +38,7 @@ volume=$PWD/data

# Build the Docker run command based on hardware mode
if [ "$hw_mode" = "hpu" ]; then
docker run -d --rm --runtime=habana --name="vllm-service" -p $port_number:80 -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --ipc=host -e HTTPS_PROXY=$https_proxy -e HTTP_PROXY=$https_proxy -e HF_TOKEN=${HUGGINGFACEHUB_API_TOKEN} vllm:hpu /bin/bash -c "export VLLM_CPU_KVCACHE_SPACE=40 && python3 -m vllm.entrypoints.openai.api_server --enforce-eager --model $model_name --tensor-parallel-size $parallel_number --host 0.0.0.0 --port 80 --block-size $block_size --max-num-seqs $max_num_seqs --max-seq_len-to-capture $max_seq_len_to_capture "
docker run -d --rm --runtime=habana --name="vllm-service" -p $port_number:80 -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --ipc=host -e HTTPS_PROXY=$https_proxy -e HTTP_PROXY=$https_proxy -e HF_TOKEN=${HUGGINGFACEHUB_API_TOKEN} opea/vllm:hpu /bin/bash -c "export VLLM_CPU_KVCACHE_SPACE=40 && python3 -m vllm.entrypoints.openai.api_server --enforce-eager --model $model_name --tensor-parallel-size $parallel_number --host 0.0.0.0 --port 80 --block-size $block_size --max-num-seqs $max_num_seqs --max-seq_len-to-capture $max_seq_len_to_capture "
else
docker run -d --rm --name="vllm-service" -p $port_number:80 --network=host -v $volume:/data -e HTTPS_PROXY=$https_proxy -e HTTP_PROXY=$https_proxy -e HF_TOKEN=${HUGGINGFACEHUB_API_TOKEN} -e VLLM_CPU_KVCACHE_SPACE=40 vllm:cpu --model $model_name --host 0.0.0.0 --port 80
docker run -d --rm --name="vllm-service" -p $port_number:80 --network=host -v $volume:/data -e HTTPS_PROXY=$https_proxy -e HTTP_PROXY=$https_proxy -e HF_TOKEN=${HUGGINGFACEHUB_API_TOKEN} -e VLLM_CPU_KVCACHE_SPACE=40 opea/vllm:cpu --model $model_name --host 0.0.0.0 --port 80
fi
4 changes: 2 additions & 2 deletions tests/test_llms_text-generation_vllm-ray.sh
@@ -12,7 +12,7 @@ function build_docker_images() {
cd $WORKPATH
docker build \
-f comps/llms/text-generation/vllm-ray/docker/Dockerfile.vllmray \
-t vllm_ray:habana --network=host .
-t opea/vllm_ray:habana --network=host .

## Build OPEA microservice docker
cd $WORKPATH
@@ -34,7 +34,7 @@ function start_service() {
--ipc=host \
-e HUGGINGFACEHUB_API_TOKEN=$HUGGINGFACEHUB_API_TOKEN \
-p $port_number:8000 \
vllm_ray:habana \
opea/vllm_ray:habana \
/bin/bash -c "ray start --head && python vllm_ray_openai.py --port_number 8000 --model_id_or_path $LLM_MODEL --tensor_parallel_size 2 --enforce_eager False"

export vLLM_RAY_ENDPOINT="http://${ip_address}:${port_number}"
4 changes: 2 additions & 2 deletions tests/test_llms_text-generation_vllm.sh
@@ -12,7 +12,7 @@ function build_docker_images() {
cd $WORKPATH/comps/llms/text-generation/vllm
docker build \
-f docker/Dockerfile.hpu \
-t vllm:hpu \
-t opea/vllm:hpu \
--shm-size=128g .

## Build OPEA microservice docker
@@ -35,7 +35,7 @@ function start_service() {
--cap-add=sys_nice \
--ipc=host \
-e HF_TOKEN=${HUGGINGFACEHUB_API_TOKEN} \
vllm:hpu \
opea/vllm:hpu \
/bin/bash -c "export VLLM_CPU_KVCACHE_SPACE=40 && python3 -m vllm.entrypoints.openai.api_server --enforce-eager --model $LLM_MODEL --tensor-parallel-size 1 --host 0.0.0.0 --port 80 --block-size 128 --max-num-seqs 256 --max-seq_len-to-capture 2048"

export vLLM_ENDPOINT="http://${ip_address}:${port_number}"
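For completeness, a minimal smoke test of a launched service might look like the sketch below. It is not part of the commit; it assumes the CPU container from the README hunk above is running with port 80 published on the host's port 8008, and that `your_hf_llm_model` matches the model passed to the server.

```bash
# Query the OpenAI-compatible completions endpoint served by vLLM on port 8008
curl http://localhost:8008/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "'"${your_hf_llm_model}"'",
        "prompt": "Deep learning is",
        "max_tokens": 16
      }'
```

A JSON response containing a `choices` array indicates the renamed image serves requests exactly as the old tag did.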