Add first-pass at stability tritonserver-based imagegen comp #227
base: main
Changes from all commits
de78a3d
78b6b26
16f32a8
195f700
0cf5965
4ea1511
acfeb8e
45ac958
6f238cd
d111033
2f47d34
6bae349
58d14ca
268d182
2c78a6a
89ffd51
7cfda4d
0670bec
e4634ec
da80ad5
221fae3
@@ -0,0 +1,17 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

FROM python:3.11-slim

ENV LANG C.UTF-8

COPY comps /home/comps

RUN pip install --no-cache-dir --upgrade pip && \
    pip install --no-cache-dir -r /home/comps/imagegen/requirements.txt

ENV PYTHONPATH=$PYTHONPATH:/home

WORKDIR /home/comps/imagegen

ENTRYPOINT ["python", "imagegen.py"]
@@ -0,0 +1,36 @@
# ImageGen Microservice

The ImageGen microservice generates images from text input. It takes a Triton endpoint that serves the actual text-to-image model and, in turn, exposes a solution endpoint consumable by users.

# 1. Instructions to Launch This Solution

This solution requires one backing container to operate: a Triton-based inference server that executes the diffusion model. The sections below walk through deploying both the model server image and the solution server image.

## 1.1 Build Model Server Docker Image

```bash
cd triton && make build
```

## 1.2 Build Solution Server Docker Image

```bash
docker build -t opea/image-gen:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile .
```

## 1.3 Run Docker with CLI

```bash
docker run -p 18000:8000 -e http_proxy=$http_proxy -e https_proxy=$https_proxy -e HUGGINGFACE_API_TOKEN=${HUGGINGFACE_API_TOKEN} -e HABANA_VISIBLE_DEVICES=0 -v /opt/intel/huggingface/hub:/root/.cache/huggingface/hub ohio-image-triton:latest
docker run -p 9765:9765 -e IMAGE_GEN_TRITON_ENDPOINT=http://localhost:18000 opea/image-gen:latest
```

# 2. Consume Solution Service

You can use the following `curl` command to test whether the service is up. Note that the first request can be slow because the model weights must be downloaded.

```bash
curl http://localhost:9765/v1/images/generate \
    -H "Content-Type: application/json" \
    -d '{"text":"A cat holding a fish skeleton"}'
```
@@ -0,0 +1,2 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
@@ -0,0 +1,78 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

import base64
import os
import sys
import time
from io import BytesIO

import numpy as np
import tritonclient.http as httpclient
from PIL import Image
from tritonclient.utils import *

from comps import Base64ByteStrDoc, ServiceType, TextDoc, opea_microservices, opea_telemetry, register_microservice


@opea_telemetry
def generate_image(*, text, triton_endpoint):
    start = time.time()

    network_timeout = 1000 * 300
    with httpclient.InferenceServerClient(triton_endpoint, network_timeout=network_timeout) as client:
        # Encode the prompt(s) as a zero-padded UINT8 batch, the layout the Triton model expects.
        queries = [text]
        input_arr = [np.frombuffer(bytes(q, "utf8"), dtype=np.uint8) for q in queries]
        max_size = max([a.size for a in input_arr])
        input_arr = [np.pad(a, (0, max_size - a.size)) for a in input_arr]
        input_arr = np.stack(input_arr)

        inputs = [httpclient.InferInput("INPUT0", input_arr.shape, "UINT8")]
        inputs[0].set_data_from_numpy(input_arr, binary_data=True)

        outputs = [
            httpclient.InferRequestedOutput("OUTPUT0"),
        ]

        ## TODO acwrenn
        ## Parameterize for other ImageGen models?
        model_name = "stability"
        response = client.infer(
            model_name,
            inputs,
            request_id=str(1),
            outputs=outputs,
            timeout=network_timeout,
        )

        result = response.get_response()

        output0_data = response.as_numpy("OUTPUT0")
        if len(output0_data) == 0:
            raise Exception("error fetching images from triton server")
        print(f"generated image in {time.time() - start} seconds")
        # Return the raw bytes of the first generated image.
        return bytes(output0_data[0])


@register_microservice(
    name="opea_service@imagegen",
    service_type=ServiceType.IMAGEGEN,
    endpoint="/v1/images/generate",
    host="0.0.0.0",
    port=9765,
    input_datatype=TextDoc,
    output_datatype=Base64ByteStrDoc,
)
@opea_telemetry
async def handle_generate_image(input: TextDoc):
    # Named differently from the Triton helper above so the call below resolves to the
    # helper instead of recursing into this handler.
    triton_endpoint = os.getenv("IMAGE_GEN_TRITON_ENDPOINT", "http://localhost:8080")
    text = input.text
    image = generate_image(text=text, triton_endpoint=triton_endpoint)
    buffered = BytesIO()
    buffered.write(image)
    return Base64ByteStrDoc(byte_str=base64.b64encode(buffered.getvalue()))


if __name__ == "__main__":
    print("[imagegen - router] ImageGen initialized.")
    opea_microservices["opea_service@imagegen"].start()
@@ -0,0 +1,25 @@
#!/bin/bash

# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

set -eu

default_port=8080
default_card_num=0
default_model_cache_directory="${HOME}/.cache/huggingface/hub"
HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN}

# Capture the command in a variable (via command substitution) so it can be evaluated below.
docker_cmd=$(cat <<EOF
docker run -d \
    --name=TritonStabilityServer -p ${default_port}:8000 \
    -e HABANA_VISIBLE_DEVICES=${default_card_num} \
    -e HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN} \
    --cap-add=sys_nic \
    --ipc=host \
    --runtime=habana \
    -v ${default_model_cache_directory}:/root/.cache/huggingface/hub \
    ohio-stability-triton
EOF
)

eval "$docker_cmd"
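Because this script starts the Triton model server detached, it can be useful to poll the server before launching the solution container. A small sketch using the `tritonclient` package already listed in requirements.txt; the `localhost:8080` address and the `stability` model name mirror the defaults used elsewhere in this PR and are assumptions about the local setup.

```python
# Readiness probe sketch for the Triton server started by the launch script.
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient("localhost:8080")
if client.is_server_live() and client.is_server_ready():
    print("Triton server is up")
    # is_model_ready stays False until the model has finished loading (and downloading weights).
    print("stability model ready:", client.is_model_ready("stability"))
else:
    print("Triton server is not ready yet")
```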
@@ -0,0 +1,11 @@
docarray[full]
fastapi
numpy
opentelemetry-api
opentelemetry-exporter-otlp
opentelemetry-sdk
pillow
sentencepiece
shortuuid
torch
tritonclient[http]
@@ -0,0 +1,45 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

ARG TRITON_VERSION=24.04

FROM nvcr.io/nvidia/tritonserver:${TRITON_VERSION}-py3 AS triton

Review comment: I need to check on this container.

FROM base

ARG MODEL_NAME=stability

RUN wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb && \
    dpkg -i cuda-keyring_1.1-1_all.deb && \
    apt-get update && \
    apt-get install -y --fix-missing --no-install-recommends \
    datacenter-gpu-manager

Review comment: I need to check this one too.

Reply: I can take a stab at building the triton containers from source - but I am not sure where that code would live. Probably not in this repo. And who would host it? The Nvidia-distributed triton server container DOES contain a bunch of extra stuff we don't need.

RUN apt-get update && \
    apt-get install -y --no-install-recommends --fix-missing \
    build-essential \
    libaio-dev \
    libaio1 \
    libb64-0d \
    libcupti-dev \
    libjpeg-dev \
    libpng-dev \
    libsndfile-dev \
    libwebp-dev

ARG TRITON_VERSION=24.04

COPY --from=triton /opt/tritonserver /opt/tritonserver
COPY --from=triton /usr/local/cuda-* /usr/local/cuda

ENV PATH=$PATH:/opt/tritonserver/bin
ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/tritonserver/lib
ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/targets/x86_64-linux/lib

RUN git clone --single-branch -b r${TRITON_VERSION} https://github.com/triton-inference-server/python_backend /opt/tritonserver/backends/opea_backends && \
    mkdir -p /opt/tritonserver/backends/opea_backends/models/${MODEL_NAME}/1

COPY ./model.py /opt/tritonserver/backends/opea_backends/models/${MODEL_NAME}/1/model.py
COPY ./config.pbtxt /opt/tritonserver/backends/opea_backends/models/${MODEL_NAME}/config.pbtxt

CMD ["tritonserver", "--model-repository", "/opt/tritonserver/backends/opea_backends/models"]
@@ -0,0 +1,114 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

FROM ubuntu:jammy
ARG ARTIFACTORY_URL=vault.habana.ai
ARG VERSION=1.15.1
ARG REVISION=15

ENV DEBIAN_FRONTEND=noninteractive
ENV GC_KERNEL_PATH=/usr/lib/habanalabs/libtpc_kernels.so
ENV HABANA_LOGS=/var/log/habana_logs/
ENV OS_NUMBER=2204
ENV HABANA_SCAL_BIN_PATH=/opt/habanalabs/engines_fw
ENV HABANA_PLUGINS_LIB_PATH=/opt/habanalabs/habana_plugins

RUN apt-get update && \
    apt-get install -y --no-install-recommends \
    apt-transport-https \
    apt-utils \
    bc \
    build-essential \
    ca-certificates \
    dkms \
    ethtool \
    gcc \
    git \
    gnupg \
    gpg-agent \
    graphviz \
    libgl1 \
    libgoogle-glog0v5 \
    libjemalloc2 \
    libpq-dev \
    locales \
    lsof \
    make \
    openssh-client \
    openssh-server \
    protobuf-compiler \
    python3 \
    python3-dev \
    python3-pip \
    python3-tk \
    python3-venv \
    unzip \
    vim \
    libkrb5-3 \
    libgnutls30 \
    wget && \
    apt-get autoremove && apt-get clean

Review comment: We need

Review comment (on lines +18 to +49): I'd make sure every single one of these packages is absolutely required before adding them to the container.

Reply: Yeah - I think your other comment is valid. I will look and see if I can replace the Habana base image with something managed by a service/owner outside of this repo. Our OHIO team should probably provide us a "habana-base" image for building model servers on top of.

Reply: So I am going to ignore your comments on the inner triton server dockerfiles for now, and look into outsourcing this code and building on that instead.

RUN locale-gen en_US.UTF-8

ENV LANG=en_US.UTF-8
ENV LANGUAGE=en_US.UTF-8
ENV LC_ALL=en_US.UTF-8
ENV LC_CTYPE=en_US.UTF-8

# There is no need to store pip installation files inside docker image
ENV PIP_NO_CACHE_DIR=on
ENV PIP_DISABLE_PIP_VERSION_CHECK=1

RUN python3 -m pip install pip==23.3.1 setuptools==67.3.3 wheel==0.38.4

COPY install_efa.sh .
RUN ./install_efa.sh && rm install_efa.sh && rm -rf /etc/ld.so.conf.d/efa.conf /etc/profile.d/efa.sh

ENV LIBFABRIC_VERSION="1.20.0"
ENV LIBFABRIC_ROOT="/opt/habanalabs/libfabric-${LIBFABRIC_VERSION}"
ENV MPI_ROOT=/opt/amazon/openmpi
ENV LD_LIBRARY_PATH=$LIBFABRIC_ROOT/lib:${MPI_ROOT}/lib:/usr/lib/habanalabs:$LD_LIBRARY_PATH
ENV PATH=${LIBFABRIC_ROOT}/bin:${MPI_ROOT}/bin:$PATH
ENV OPAL_PREFIX=${MPI_ROOT}
ENV MPICC=${MPI_ROOT}/bin/mpicc
ENV RDMAV_FORK_SAFE=1
ENV FI_EFA_USE_DEVICE_RDMA=1
ENV RDMA_CORE_ROOT=/opt/habanalabs/rdma-core/src
ENV RDMA_CORE_LIB=${RDMA_CORE_ROOT}/build/lib

RUN wget -O- https://${ARTIFACTORY_URL}/artifactory/api/gpg/key/public | gpg --dearmor -o /usr/share/keyrings/habana-artifactory.gpg && \
    chown root:root /usr/share/keyrings/habana-artifactory.gpg && \
    chmod 644 /usr/share/keyrings/habana-artifactory.gpg && \
    echo "deb [signed-by=/usr/share/keyrings/habana-artifactory.gpg] https://${ARTIFACTORY_URL}/artifactory/debian jammy main" | tee -a /etc/apt/sources.list && \
    apt-get update && \
    apt-get install -y habanalabs-rdma-core="$VERSION"-"$REVISION" \
    habanalabs-thunk="$VERSION"-"$REVISION" \
    habanalabs-firmware-tools="$VERSION"-"$REVISION" \
    habanalabs-graph="$VERSION"-"$REVISION" && \
    apt-get autoremove --yes && apt-get clean && rm -rf /var/lib/apt/lists/* && \
    sed --in-place "/$ARTIFACTORY_URL/d" /etc/apt/sources.list

RUN wget -nv -O /tmp/libfabric-${LIBFABRIC_VERSION}.tar.bz2 https://github.com/ofiwg/libfabric/releases/download/v${LIBFABRIC_VERSION}/libfabric-${LIBFABRIC_VERSION}.tar.bz2 && \
    cd /tmp/ && tar xf /tmp/libfabric-${LIBFABRIC_VERSION}.tar.bz2 && \
    cd /tmp/libfabric-${LIBFABRIC_VERSION} && \
    ./configure --prefix=$LIBFABRIC_ROOT --enable-psm3-verbs --enable-verbs=yes --with-synapseai=/usr && \
    make && make install

RUN wget -nv -O /tmp/main.zip https://github.com/HabanaAI/hccl_ofi_wrapper/archive/refs/heads/main.zip && \
    unzip /tmp/main.zip -d /tmp && \
    cd /tmp/hccl_ofi_wrapper-main && \
    make && cp -f libhccl_ofi_wrapper.so /usr/lib/habanalabs/libhccl_ofi_wrapper.so && \
    cd / && \
    rm -rf /tmp/main.zip /tmp/hccl_ofi_wrapper-main

RUN python3 -m pip install habana_media_loader=="${VERSION}"."${REVISION}"

# SSH configuration necessary to support mpi-operator v2
RUN mkdir -p /var/run/sshd && \
    sed -i 's/[ #]\(.*StrictHostKeyChecking \).*/ \1no/g' /etc/ssh/ssh_config && \
    sed -i 's/#\(ForwardAgent \).*/\1yes/g' /etc/ssh/ssh_config && \
    echo "    UserKnownHostsFile /dev/null" >> /etc/ssh/ssh_config && \
    sed -i 's/#\(StrictModes \).*/\1no/g' /etc/ssh/sshd_config && \
    echo "/etc/init.d/ssh start \"-p 3022\"" >> ~/.bashrc && \
    sed -i '/[ -z "$PS1" ] && return/s/^/#/g' ~/.bashrc

Review comment: Any packages that were installed because they were needed during compilation but not at run time need to be removed now, or we need to make this a multi-stage container.
@@ -0,0 +1,53 @@
# Copyright (c) 2023 HabanaLabs, Ltd.
#
# SPDX-License-Identifier: Apache-2.0

Review comment: Is this Dockerfile borrowed from

Reply: Let me check with the engineer that helped me with this base layer - I think I remember something about them only shipping the dockerfile, and not distributing the built image...

ARG BASE_NAME=base-installer-ubuntu22.04
ARG VERSION=1.15.1
ARG REVISION=15

FROM ${BASE_NAME}:${VERSION}-${REVISION}

ARG BASE_NAME=base-installer-ubuntu22.04
ARG VERSION=1.15.1
ARG REVISION=15
ARG ARTIFACTORY_URL=vault.habana.ai
ARG PT_VERSION=2.2.0

ENV LANG=en_US.UTF-8
ENV PYTHONPATH=/root:/usr/lib/habanalabs/

RUN apt-get update && apt-get install -y \
    curl \
    libcurl4 \
    moreutils \
    iproute2 \
    libcairo2-dev \
    libglib2.0-dev \
    libhdf5-dev \
    libselinux1-dev \
    libnuma-dev \
    libpcre2-dev \
    libjpeg-dev \
    liblapack-dev \
    libopenblas-dev \
    numactl \
    pdsh \
    libmkl-dev \
    libgoogle-perftools-dev && \
    apt-get clean && rm -rf /var/lib/apt/lists/*

RUN update-alternatives --install /usr/bin/python python /usr/bin/python3.10 1

RUN echo $BASE_NAME
COPY install_packages.sh .

RUN ./install_packages.sh && rm -f install_packages.sh && \
    /sbin/ldconfig && echo "source /etc/profile.d/habanalabs.sh" >> ~/.bashrc

ENV LD_PRELOAD=/lib/x86_64-linux-gnu/libtcmalloc.so.4
ENV TCMALLOC_LARGE_ALLOC_REPORT_THRESHOLD=7516192768

RUN rm -rf /tmp/*

RUN apt-get update && pip3 install optimum[habana] opencv-python
Review comment (on the `host="0.0.0.0"` binding): Usually, I'd set this to localhost or 127.0.0.1 and then provide a mechanism for the user to decide if they want to run locally or listen to the world.

Reply: I lifted this from the TTS comp - I guess it's a common question of "do it the same way as the codebase, or do what makes sense locally". Should I stick to "do it locally correct" in this case?