Commit 2cd96ed

githubnemo and nemo authored

Fix docker GPU build for gptqmodel (#3018)

* Fix docker GPU build for gptqmodel

  gptqmodel requires information about the compute capability of the system. By default it looks at the output of `nvidia-smi`, but since there is no compute hardware on the docker image builder instance, we have to hard-code the compute capability. Since our CI runners use NVIDIA L4 GPUs, which have a compute capability of 8.9 (according to https://developer.nvidia.com/cuda/gpus), we use that. In the future it might be worth extending this so that people using this docker image can use a gptqmodel version that supports higher compute capabilities as well.

* Fix legacy format warnings in Dockerfile

Co-authored-by: nemo <[email protected]>
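The commit message mentions possibly extending `CUDA_ARCH_LIST` to a `;`-separated list of compute capabilities. A minimal shell sketch of what such a list could look like and how it expands; the variable name and separator come from the commit, but the expansion loop and the second capability (`9.0`) are illustrative assumptions, not gptqmodel's actual parsing:

```shell
# CUDA_ARCH_LIST as described in the commit: compute capabilities
# separated by ';' (8.9 = NVIDIA L4, per https://developer.nvidia.com/cuda/gpus).
# "9.0" is a hypothetical second entry for illustration.
CUDA_ARCH_LIST="8.9;9.0"

# Illustrative expansion of the list into individual capabilities;
# gptqmodel's own handling of the variable may differ.
IFS=';' read -ra archs <<< "$CUDA_ARCH_LIST"
for arch in "${archs[@]}"; do
  echo "would compile kernels for sm_${arch/./}"
done
```

On a builder without GPUs, the variable would be set inline for the install step, as the diff below does with a single capability.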
1 parent c2b2867 commit 2cd96ed

File tree

1 file changed (+9, −3 lines)


docker/peft-gpu/Dockerfile

Lines changed: 9 additions & 3 deletions
@@ -20,15 +20,15 @@ RUN conda create --name peft python=${PYTHON_VERSION} ipython jupyter pip
 # Below is copied from https://github.com/huggingface/accelerate/blob/main/docker/accelerate-gpu/Dockerfile
 # We don't install pytorch here yet since CUDA isn't available
 # instead we use the direct torch wheel
-ENV PATH /opt/conda/envs/peft/bin:$PATH
+ENV PATH=/opt/conda/envs/peft/bin:$PATH
 # Activate our bash shell
 RUN chsh -s /bin/bash
 SHELL ["/bin/bash", "-c"]
 
 # Stage 2
 FROM nvidia/cuda:12.8.1-devel-ubuntu22.04 AS build-image
 COPY --from=compile-image /opt/conda /opt/conda
-ENV PATH /opt/conda/bin:$PATH
+ENV PATH=/opt/conda/bin:$PATH
 
 # Install apt libs
 RUN apt-get update && \
@@ -42,7 +42,13 @@ SHELL ["/bin/bash", "-c"]
 RUN conda run -n peft pip install --no-cache-dir bitsandbytes optimum
 
 # GPTQmodel doesn't find torch without build isolation
-RUN conda run -n peft pip install --no-build-isolation gptqmodel
+#
+# Note: we are hard-coding CUDA_ARCH_LIST here since `gptqmodel` requires either nvidia-smi
+# or CUDA_ARCH_LIST for compute capability information. Since the docker build is unlikely
+# to have compute hardware available we use the information from the CI runner (which hosts
+# a NVIDIA L4). So we fix the compute capability to 8.9. In the future we might extend this
+# to a list of compute capabilities (separated by ;).
+RUN CUDA_ARCH_LIST=8.9 conda run -n peft pip install --no-build-isolation gptqmodel
 
 RUN \
 # Add eetq for quantization testing; needs to run without build isolation since the setup
