Commit 2cd96ed

githubnemo and nemo authored

Fix docker GPU build for gptqmodel (#3018)

* Fix docker GPU build for gptqmodel

  gptqmodel requires information about the compute capability of the system. By default it looks at the output of `nvidia-smi`, but since there is no compute hardware on the docker image builder instance, we have to hard-code the compute capability. Since our CI runners use NVIDIA L4 GPUs, which have a compute capability of 8.9 (according to https://developer.nvidia.com/cuda/gpus), we use that. In the future it might be worth extending this so that people using this docker image can use a gptqmodel version that supports higher compute capabilities as well.

* Fix legacy format warnings in Dockerfile

Co-authored-by: nemo <[email protected]>
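The commit message mentions possibly extending `CUDA_ARCH_LIST` to a `;`-separated list of compute capabilities. A minimal shell sketch of what such a list could look like and how it expands; the variable name and separator come from the commit, but the expansion loop and the second capability (`9.0`) are illustrative assumptions, not gptqmodel's actual parsing:

```shell
# CUDA_ARCH_LIST as described in the commit: compute capabilities
# separated by ';' (8.9 = NVIDIA L4, per https://developer.nvidia.com/cuda/gpus).
# "9.0" is a hypothetical second entry for illustration.
CUDA_ARCH_LIST="8.9;9.0"

# Illustrative expansion of the list into individual capabilities;
# gptqmodel's own handling of the variable may differ.
IFS=';' read -ra archs <<< "$CUDA_ARCH_LIST"
for arch in "${archs[@]}"; do
  echo "would compile kernels for sm_${arch/./}"
done
```

On a builder without GPUs, the variable would be set inline for the install step, as the diff below does with a single capability.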
1 parent c2b2867 commit 2cd96ed

File tree

1 file changed (+9, −3 lines)


docker/peft-gpu/Dockerfile

Lines changed: 9 additions & 3 deletions
@@ -20,15 +20,15 @@ RUN conda create --name peft python=${PYTHON_VERSION} ipython jupyter pip
 # Below is copied from https://github.com/huggingface/accelerate/blob/main/docker/accelerate-gpu/Dockerfile
 # We don't install pytorch here yet since CUDA isn't available
 # instead we use the direct torch wheel
-ENV PATH /opt/conda/envs/peft/bin:$PATH
+ENV PATH=/opt/conda/envs/peft/bin:$PATH
 # Activate our bash shell
 RUN chsh -s /bin/bash
 SHELL ["/bin/bash", "-c"]
 
 # Stage 2
 FROM nvidia/cuda:12.8.1-devel-ubuntu22.04 AS build-image
 COPY --from=compile-image /opt/conda /opt/conda
-ENV PATH /opt/conda/bin:$PATH
+ENV PATH=/opt/conda/bin:$PATH
 
 # Install apt libs
 RUN apt-get update && \
@@ -42,7 +42,13 @@ SHELL ["/bin/bash", "-c"]
 RUN conda run -n peft pip install --no-cache-dir bitsandbytes optimum
 
 # GPTQmodel doesn't find torch without build isolation
-RUN conda run -n peft pip install --no-build-isolation gptqmodel
+#
+# Note: we are hard-coding CUDA_ARCH_LIST here since `gptqmodel` requires either nvidia-smi
+# or CUDA_ARCH_LIST for compute capability information. Since the docker build is unlikely
+# to have compute hardware available we use the information from the CI runner (which hosts
+# a NVIDIA L4). So we fix the compute capability to 8.9. In the future we might extend this
+# to a list of compute capabilities (separated by ;).
+RUN CUDA_ARCH_LIST=8.9 conda run -n peft pip install --no-build-isolation gptqmodel
 
 RUN \
 # Add eetq for quantization testing; needs to run without build isolation since the setup
