Commit c918561

Feat/trusted (#106)
* Create a `basic` Docker file for pure `conda` installs with official Docker images only.
* Add catalog image.
* Rename NGC to catalog.
* Add IMAGE_FLAVOR variable and change order of training interactive images for consistency.
* Fix indentation and add new variables. Remove old NGC service and replace it with the new catalog-based services.
* Add placeholders for new service requirements files.
* Fix Pyre issue where the project directory was unnecessarily included.
* Add `hist` alias in Dockerfile.
* Restructure the Docker Compose file to create new services for the new images. Also create a `base` stage containing common configurations.
* Update PyInk version to get bugfix.
* Add shellcheck to pre-commit.
* Format project according to pre-commit.
* Move common build arguments in the docker-compose.yaml file.
* Change PyTorch Official Docker image service name to `hub` since the images come from Docker Hub. Change the name of the new service from `basic` to `simple`, which better captures the intention and is less confusing.
* Simplify the `simple` service to provide any base image.
* Fix incorrect BASE_IMAGE variable.
* Update the requirements to get things to work.
* Update the Dockerfile to fix build failures.
* Fix major bug in the Cresset Dockerfile where conda configurations were being put in the wrong place.
* Finish work on the NGC service.
* Delete the `deploy` service. If necessary, use the `simple` service instead. The `deploy` stage was becoming too much of a burden and made it hard to read the Cresset Dockerfile.
* Rename `catalog` to `ngc` as the NGC and Hub services were separated even in the Dockerfiles.
* Add support for Python 3.8+. Update Ruff version.
* Finish configuring the Hub image.
* Add new requirements for new stages.
* Get Python 3.8 compatibility.
* Refactor tests for Python 3.8 compatibility.
* Update services for new functionality. Also delete `deploy` service.
* Add nodefaults to conda channel configurations.
* Add necessary arguments for the `simple` service.
* Clean up apt requirements.
* Add new dependencies. The service still does not work.
* Fixed `simple` to allow lockfiles and use conda environment files.
* Fix hub image bug with conda channels.
* Add cuda-toolkit as a dependency for `simple`. It still does not work.
* Move `NVIDIA_VISIBLE_DEVICES` into the image for reproducibility outside of Compose. Remove unnecessary requirements in the simple-environment.yaml file.
1 parent 65bf29b commit c918561

18 files changed, +617 -308 lines changed

.pre-commit-config.yaml

Lines changed: 8 additions & 2 deletions
@@ -32,7 +32,7 @@ repos:

   # Ruff should be executed before other formatters.
   - repo: https://github.com/charliermarsh/ruff-pre-commit
-    rev: "v0.0.260"
+    rev: "v0.0.261"
     hooks:
       - id: ruff
         args: [--exit-non-zero-on-fix]
@@ -43,8 +43,14 @@ repos:
       - id: prettier
         files: \.(json|jsx|md|mdx|yaml)$

+  - repo: https://github.com/koalaman/shellcheck-precommit
+    rev: v0.9.0
+    hooks:
+      - id: shellcheck
+        # args: ["--severity=warning"] # Optionally only show errors and warnings
+
   - repo: https://github.com/google/pyink
-    rev: 23.3.0
+    rev: 23.3.1
     hooks: # Using PyInk, the Google fork of Black, for Python code formatting.
       - id: pyink

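
ShellCheck, which the hunk above adds to pre-commit, statically analyzes shell scripts. As a rough illustration of the kind of problem it reports, the sketch below reproduces the word splitting behind check SC2086 (unquoted variable expansion). The `count_args` helper and the sample path are hypothetical and not part of this repository.

```shell
#!/bin/sh
# Hypothetical helper: print how many arguments were received.
count_args() {
    echo "$#"
}

path="my file.txt"

# Unquoted expansion: a POSIX shell splits the value on whitespace,
# so the helper receives two arguments. ShellCheck reports this as SC2086.
# shellcheck disable=SC2086
unquoted_count=$(count_args $path)

# Quoted expansion: the value is passed through as a single argument.
quoted_count=$(count_args "$path")

echo "unquoted=${unquoted_count} quoted=${quoted_count}"
```

Running `pre-commit run shellcheck --all-files` should surface findings of this kind in the project's own scripts, assuming the hook is installed as configured above.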

Dockerfile

Lines changed: 23 additions & 110 deletions
@@ -36,6 +36,7 @@ ARG INTERACTIVE_MODE
 ARG USE_CUDA=1
 ARG CUDA_VERSION
 ARG CUDNN_VERSION
+ARG IMAGE_FLAVOR
 ARG LINUX_DISTRO
 ARG DISTRO_VERSION
 ARG TORCH_CUDA_ARCH_LIST
@@ -46,9 +47,9 @@ ARG GIT_IMAGE=alpine/git:edge-2.38.1
 ARG CURL_IMAGE=curlimages/curl:latest

 # Build-related packages are pre-installed on CUDA `devel` images.
+# The `TRAIN_IMAGE` will use the `devel` flavor by default for convenience.
 ARG BUILD_IMAGE=nvidia/cuda:${CUDA_VERSION}-cudnn${CUDNN_VERSION}-devel-${LINUX_DISTRO}${DISTRO_VERSION}
-ARG TRAIN_IMAGE=nvidia/cuda:${CUDA_VERSION}-cudnn${CUDNN_VERSION}-devel-${LINUX_DISTRO}${DISTRO_VERSION}
-ARG DEPLOY_IMAGE=nvidia/cuda:${CUDA_VERSION}-cudnn${CUDNN_VERSION}-runtime-${LINUX_DISTRO}${DISTRO_VERSION}
+ARG TRAIN_IMAGE=nvidia/cuda:${CUDA_VERSION}-cudnn${CUDNN_VERSION}-${IMAGE_FLAVOR}-${LINUX_DISTRO}${DISTRO_VERSION}

 ########################################################################
 FROM ${CURL_IMAGE} AS curl-conda
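
With `IMAGE_FLAVOR` in the template, one `TRAIN_IMAGE` definition can point at either the `devel` or the `runtime` CUDA image, which is what lets the separate `DEPLOY_IMAGE` argument go away. The sketch below mirrors that string substitution in plain shell. The version values and the `train_image` helper are hypothetical examples, not values taken from this repository.

```shell
#!/bin/sh
# Hypothetical example values; in the project they arrive as build arguments.
CUDA_VERSION="11.8.0"
CUDNN_VERSION="8"
LINUX_DISTRO="ubuntu"
DISTRO_VERSION="22.04"

# Mirror the TRAIN_IMAGE template from the Dockerfile for a given flavor.
train_image() {
    IMAGE_FLAVOR="$1"
    echo "nvidia/cuda:${CUDA_VERSION}-cudnn${CUDNN_VERSION}-${IMAGE_FLAVOR}-${LINUX_DISTRO}${DISTRO_VERSION}"
}

devel_image=$(train_image devel)
runtime_image=$(train_image runtime)

echo "${devel_image}"
echo "${runtime_image}"
```

Only the flavor segment of the tag differs between the two results, so switching image types becomes a one-variable change.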
@@ -95,7 +96,7 @@ ARG PYTHON_VERSION
 # Clean out package directories and `__pycache__` files to save space.
 RUN --mount=type=bind,from=curl-conda,source=/tmp/conda,target=/tmp/conda \
     /bin/bash /tmp/conda/miniconda.sh -b -p /opt/conda && \
-    printf "channels:\n - conda-forge\n" > /opt/conda.condarc && \
+    printf "channels:\n - conda-forge\n - nodefaults\n" > /opt/conda/.condarc && \
     $conda install -y python=${PYTHON_VERSION} && \
     conda clean -ya --force-pkgs-dirs && \
     find /opt/conda -name '__pycache__' | xargs rm -rf
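
The old path, `/opt/conda.condarc`, names a file next to the installation directory rather than inside it, which appears to be the misplaced configuration the commit message describes. A minimal sketch of the corrected `printf` call against a throwaway directory shows the file it produces. Here `demo_root` is a hypothetical stand-in for `/opt`, and nothing below touches a real conda installation.

```shell
#!/bin/sh
# Hypothetical scratch directory standing in for `/opt`.
demo_root=$(mktemp -d)
mkdir -p "${demo_root}/conda"

# Same format string as the Dockerfile, written inside the conda directory.
condarc="${demo_root}/conda/.condarc"
printf "channels:\n - conda-forge\n - nodefaults\n" > "${condarc}"

# Show the resulting three-line YAML file.
cat "${condarc}"
```

`printf` is used rather than `echo` because, as the Dockerfile comments note elsewhere, `echo` handles escape characters inconsistently across shells.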
@@ -417,7 +418,7 @@ FROM ${TRAIN_IMAGE} AS train-base
 # Edit this section if necessary but use `docker-compose.yaml` if possible.
 # Common configurations performed before creating a user should be placed here.

-
+LABEL maintainer="[email protected]"
 ENV LANG=C.UTF-8
 ENV LC_ALL=C.UTF-8
 ENV PYTHONIOENCODING=UTF-8
@@ -437,9 +438,8 @@ RUN rm -f /etc/apt/apt.conf.d/docker-clean; \

 # Using `sed` and `xargs` to imitate the behavior of a requirements file.
 # The `--mount=type=bind` temporarily mounts a directory from another stage.
-# See the `deploy` stage below to see how to add other apt repositories.
 # `apt` requirements are copied from the outside instead of from
-# `train-builds` to allow parallel installation with pip.
+# `train-builds` to allow parallel installation with `conda`.
 RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
     --mount=type=cache,target=/var/lib/apt,sharing=locked \
     --mount=type=bind,from=train-stash,source=/tmp/apt,target=/tmp/apt \
@@ -448,20 +448,6 @@ RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
         xargs apt-get install -y --no-install-recommends && \
     rm -rf /var/lib/apt/lists/*

-########################################################################
-FROM train-base AS train-interactive-exclude
-# This stage exists to create images for use in Kubernetes clusters or for
-# uploading image to a container registry, where interactive configurations
-# are unnecessary and having the user set to `root` is most convenient.
-# Singularity users may also find this stage convenient.
-# It is designed to be as close to the interactive development environment as
-# possible, with the same `apt`, `conda`, and `pip` packages installed.
-# Most users may safely ignore this stage except when publishing an image
-# to a container repository for reproducibility.
-
-COPY --link --from=train-builds /opt/conda /opt/conda
-RUN echo /opt/conda/lib >> /etc/ld.so.conf.d/conda.conf && ldconfig
-
 ########################################################################
 FROM train-base AS train-interactive-include
 # This stage exists to create an interactive development environment with ease
@@ -485,6 +471,7 @@ RUN groupadd -f -g ${GID} ${GRP} && \

 # Get conda with the directory ownership given to the user.
 COPY --link --from=train-builds --chown=${UID}:${GID} /opt/conda /opt/conda
+# The `ldconfig` command is necessary for PyTorch to find MKL and other libraries.
 RUN echo /opt/conda/lib >> /etc/ld.so.conf.d/conda.conf && ldconfig

 USER ${USR}
@@ -516,10 +503,26 @@ RUN echo "source ${ZSHS_PATH}/zsh-syntax-highlighting.zsh" >> ${HOME}/.zshrc
 # Add `ll` alias for convenience. The Mac version of `ll` is used
 # instead of the Ubuntu version due to better configurability.
 # Add `wns` as an alias for `watch nvidia-smi`, which is used often.
+# Add `hist` as a shortcut to see the full history in `zsh`.
 RUN {   echo "alias ll='ls -lh'"; \
         echo "alias wns='watch nvidia-smi'"; \
+        echo "alias hist='history 1'"; \
     } >> ${HOME}/.zshrc

+########################################################################
+FROM train-base AS train-interactive-exclude
+# This stage exists to create images for use in Kubernetes clusters or for
+# uploading images to a container registry, where interactive configurations
+# are unnecessary and having the user set to `root` is most convenient.
+# Singularity users may also find this stage useful.
+# It is designed to be as close to the interactive development environment as
+# possible, with the same `apt`, `conda`, and `pip` packages installed.
+# Most users may safely ignore this stage except when publishing an image
+# to a container repository for reproducibility.
+
+COPY --link --from=train-builds /opt/conda /opt/conda
+RUN echo /opt/conda/lib >> /etc/ld.so.conf.d/conda.conf && ldconfig
+
 ########################################################################
 FROM train-interactive-${INTERACTIVE_MODE} AS train
 # Common configurations performed after `/opt/conda` installation
@@ -556,93 +559,3 @@ ENV PYTHONPATH=${PROJECT_ROOT}
 WORKDIR ${PROJECT_ROOT}

 CMD ["/bin/zsh"]
-
-########################################################################
-FROM ${BUILD_IMAGE} AS deploy-builds-include
-
-COPY --link --from=build-pillow /tmp/dist /tmp/dist
-COPY --link --from=build-vision /tmp/dist /tmp/dist
-
-########################################################################
-FROM ${BUILD_IMAGE} AS deploy-builds-exclude
-
-COPY --link --from=build-pillow /tmp/dist /tmp/dist
-COPY --link --from=fetch-torch /tmp/dist /tmp/dist
-COPY --link --from=fetch-vision /tmp/dist /tmp/dist
-
-########################################################################
-FROM deploy-builds-${BUILD_MODE} AS deploy-builds
-
-# Minimalist deployment preparation layer.
-
-# If any `pip` packages must be compiled on installation, create a wheel in the
-# `build` stages and move it to `/tmp/dist`. Otherwise, the installation may fail.
-# See `Pillow-SIMD` in the TorchVision build process for an example.
-# The `deploy` image is a CUDA `runtime` image without compiler tools.
-
-# The Anaconda defaults channel and Intel MKL are not fully open-source.
-# Enterprise users may therefore wish to remove them from their final product.
-# The deployment therefore uses system Python.
-# Intel packages such as MKL can be removed by using MKL_MODE=exclude during the build.
-# This may also be useful for non-Intel CPUs.
-
-COPY --link reqs/apt-deploy.requirements.txt /tmp/apt/requirements.txt
-COPY --link reqs/pip-deploy.requirements.txt /tmp/pip/requirements.txt
-
-# Use the Python interpreter from `conda` to create wheel files while
-# the compilers and other build tools are still available.
-RUN --mount=type=bind,from=install-conda,source=/opt/conda,target=/opt/conda \
-    /opt/conda/bin/python -m pip wheel --wheel-dir /tmp/dist --find-links /tmp/dist \
-        -r /tmp/pip/requirements.txt \
-        /tmp/dist/*.whl
-
-########################################################################
-# Minimalist deployment Ubuntu image.
-# Currently failing for PyTorch 2.x due to minor dependency issues.
-# If downloading the wheel, create a symbolic link to `libnvrtc.so`, then run `ldconfig`.
-# If building from source, include `libcupti-dev` in the `apt` requirements file.
-FROM ${DEPLOY_IMAGE} AS deploy
-
-
-ENV LANG=C.UTF-8
-ENV LC_ALL=C.UTF-8
-ENV PYTHONIOENCODING=UTF-8
-ENV PYTHONDONTWRITEBYTECODE=1
-ENV PYTHONUNBUFFERED=1
-
-# Use mirror links optimized for user location and security level.
-ARG DEB_OLD
-ARG DEB_NEW
-
-# Replace the `--mount=...` instructions with `COPY` if BuildKit is unavailable.
-# The `readwrite` option is necessary as `apt` needs write permissions on `/tmp`.
-# Both `python` and `python3` are set to point to the installed version of Python.
-# The pre-installed system Python3 may be overridden if the installed and pre-installed
-# versions of Python3 are the same (e.g., Python 3.8 on Ubuntu 20.04 LTS).
-# `printf` is preferred over `echo` when escape characters are used due to
-# the inconsistent behavior of `echo` across different shells.
-# `software-properties-common` is required for the `add-apt-repository` command.
-# Using `sed` and `xargs` to imitate the behavior of a requirements file.
-ARG PYTHON_VERSION
-ARG DEBIAN_FRONTEND=noninteractive
-RUN --mount=type=bind,from=deploy-builds,readwrite,source=/tmp/apt,target=/tmp/apt \
-    if [ ${DEB_NEW} ]; then sed -i "s%${DEB_OLD}%${DEB_NEW}%g" /etc/apt/sources.list; fi && \
-    apt-get update && apt-get install -y --no-install-recommends software-properties-common && \
-    add-apt-repository ppa:deadsnakes/ppa && apt-get update && \
-    printf "\n python${PYTHON_VERSION} \n" >> /tmp/apt/requirements.txt && \
-    sed -e 's/#.*//g' -e 's/\r//g' /tmp/apt/requirements.txt | \
-        xargs apt-get install -y --no-install-recommends && \
-    rm -rf /var/lib/apt/lists/* && \
-    update-alternatives --install /usr/bin/python3 python3 /usr/bin/python${PYTHON_VERSION} 1 && \
-    update-alternatives --install /usr/bin/python python /usr/bin/python${PYTHON_VERSION} 1
-
-# The `mkl` package must be installed for PyTorch to use MKL outside `conda`.
-# The MKL major version used at runtime must match the version used to build PyTorch.
-# The `ldconfig` command is necessary for PyTorch to find MKL and other libraries.
-# Installing all packages in one command allows `pip` to resolve dependencies correctly.
-# Using multiple `pip` installs may break the dependencies of all but the last installation.
-# No dependencies are included as they should all have been installed during the build.
-RUN --mount=type=bind,from=deploy-builds,source=/tmp/dist,target=/tmp/dist \
-    python -m pip install --no-cache-dir --no-deps --find-links /tmp/dist \
-        /tmp/dist/*.whl && \
-    ldconfig
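
The `sed` and `xargs` pipeline seen in the removed `deploy` stage is the same idiom the `train-base` comments describe for treating a plain text file as an `apt` requirements file. The sketch below runs it on a small sample. The file contents are hypothetical, `echo` stands in for `apt-get install -y --no-install-recommends` so nothing is installed, and the `\r` escape assumes GNU `sed`, as found on the Ubuntu base images.

```shell
#!/bin/sh
# Hypothetical requirements file with comments and a Windows line ending.
req_file=$(mktemp)
printf 'git  # version control\ncurl\r\n# a full-line comment\nzsh\n' > "${req_file}"

# Strip comments and carriage returns, then let `xargs` join the remaining
# package names onto a single command line. `echo` stands in for
# `apt-get install -y --no-install-recommends`.
packages=$(sed -e 's/#.*//g' -e 's/\r//g' "${req_file}" | xargs echo)

echo "${packages}"
rm -f "${req_file}"
```

Because `xargs` splits its input on whitespace and skips blank lines, comment-only lines and trailing comments drop out without any extra filtering.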

Makefile

Lines changed: 1 addition & 2 deletions
@@ -131,12 +131,11 @@ install-compose: ${COMPOSE_FILE}
 pre-commit:
 	pre-commit run --all-files

-PYRE_CONFIGURATION = ${PROJECT_ROOT}/.pyre_configuration
+PYRE_CONFIGURATION = .pyre_configuration
 ${PYRE_CONFIGURATION}:
 	pyre init

 # Perform static analysis on the codebase and
 # apply the annotations to the code in-place.
-# Run this command from inside the container, not from the host.
 pyre-apply: ${PYRE_CONFIGURATION}
 	pyre infer -i
