Commit febf0e9

Fixes many miscellaneous bugs. (#122)
* Remove the Intel channel from Python, Numpy etc. because issues only arise with `pip` installs, not `conda` installs. The Intel channel was causing unnecessary overhead and issues.
* Make the rules for lockfiles more flexible.
* Fix accidental CUDA version change to 11.7 back to 11.8.
* Remove unnecessary `apt` requirements.
* Remove HOST_NAME variable.
* Cleanup of the docker-compose.yaml file to make services easier to see. Most settings have been moved to the `base` service. Also got rid of the $HOST_NAME variable.
* Set the `SHELL` environment variable to an empty string as it was previously fixed to `/bin/bash`. This fixes the color problem in new `tmux` shells and prevents possible incompatibilities in Docker `RUN` instructions.
1 parent bf7b3c1 commit febf0e9

8 files changed, with 71 additions and 119 deletions.

.dockerignore

Lines changed: 2 additions & 2 deletions
@@ -7,5 +7,5 @@
  !**/*requirements*.txt
  !*environment*.yaml
  !**/*environment*.yaml
- !*conda-lock.yaml
- !**/*conda-lock.yaml
+ !*conda-lock*.yaml
+ !**/*conda-lock*.yaml
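
The effect of the widened exception pattern can be checked with a small shell sketch. It uses POSIX `case` globbing as a stand-in for Docker's `.dockerignore` matcher, which is an assumption that should hold for simple file names like these. The lockfile names are hypothetical and chosen only for illustration.

```shell
#!/bin/sh
# `keep` means the `!` exception would re-include the file in the build context.
old_rule() { case "$1" in *conda-lock.yaml) echo keep ;; *) echo ignore ;; esac; }
new_rule() { case "$1" in *conda-lock*.yaml) echo keep ;; *) echo ignore ;; esac; }

# Hypothetical lockfile names.
for f in conda-lock.yaml simple-conda-lock.yaml conda-lock-cu118.yaml; do
  printf '%s old=%s new=%s\n' "$f" "$(old_rule "$f")" "$(new_rule "$f")"
done
```

Only names with a suffix between `conda-lock` and `.yaml` change status, which matches the commit's stated goal of making the lockfile rules more flexible.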

Makefile

Lines changed: 4 additions & 8 deletions
@@ -20,7 +20,7 @@ _PROJECT = "${SERVICE}-${USR}"
  PROJECT = $(shell echo ${_PROJECT} | tr "[:upper:]" "[:lower:]")
  PROJECT_ROOT = /opt/project

- # Creates a `.env` file in PWD if it does not exist.
+ # Creates a `.env` file in ${PWD} if it does not exist.
  # This will help prevent UID/GID bugs in `docker-compose.yaml`,
  # which unfortunately cannot use shell outputs in the file.
  # Image names have the usernames appended to them to prevent
@@ -39,7 +39,6 @@ IMAGE_NAME = $(shell echo ${_IMAGE_NAME} | tr "[:upper:]" "[:lower:]")

  # Makefiles require `$\` at the end of a line for multi-line string values.
  # https://www.gnu.org/software/make/manual/html_node/Splitting-Lines.html
- # `HOST_NAME` avoids conflict with the `HOSTNAME` shell builtin variable.
  ENV_TEXT = "$\
  GID=${GID}\n$\
  UID=${UID}\n$\
@@ -48,16 +47,13 @@ USR=${USR}\n$\
  PROJECT=${PROJECT}\n$\
  SERVICE=${SERVICE}\n$\
  COMMAND=${COMMAND}\n$\
- HOST_NAME=${SERVICE}\n$\
  IMAGE_NAME=${IMAGE_NAME}\n$\
  PROJECT_ROOT=${PROJECT_ROOT}\n$\
  "

- # Creates the `.env` file if it does not exist.
- # The `.env` file must be checked via the shell
- # as is cannot be made into a Makefile target.
- # This would make it impossible to reference it in the `include` command.
- env:
+ # The `.env` file must be checked via shell as is cannot be a Makefile target.
+ # Doing so would make it impossible to reference `.env` in the `-include` command.
+ env: # Creates the `.env` file if it does not exist.
  @test -f ${ENV_FILE} || printf ${ENV_TEXT} >> ${ENV_FILE}

  check: # Checks if the `.env` file exists.
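
The `env` recipe relies on `test -f ... || printf ... >> ...` so that an existing `.env` file is never overwritten. The sketch below reproduces that guard in plain shell. The temporary path, the `create_env` helper name, and the `CCA=8.6` value are assumptions made for the demonstration and do not come from the Makefile.

```shell
#!/bin/sh
# ENV_FILE points into a throwaway directory so no real `.env` is touched.
ENV_FILE="$(mktemp -d)/.env"

# Hypothetical helper mirroring the recipe: write only if the file is missing.
create_env() {
  test -f "$ENV_FILE" || printf 'GID=%s\nUID=%s\n' "$(id -g)" "$(id -u)" >> "$ENV_FILE"
}

create_env                      # First call creates the file with two lines.
echo "CCA=8.6" >> "$ENV_FILE"   # Simulate a manual edit (illustrative value).
create_env                      # Second call is a no-op, so the edit survives.

line_count="$(wc -l < "$ENV_FILE" | tr -d ' ')"
echo "lines in .env: $line_count"
```

Because the guard short-circuits when the file exists, running `make env` repeatedly should leave user edits in place.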

README.md

Lines changed: 0 additions & 1 deletion
@@ -182,7 +182,6 @@ USR=USERNAME
  PROJECT=train-username # `PROJECT` must be in lowercase.
  SERVICE=train
  COMMAND=/bin/zsh # Command to execute on starting the container.
- HOST_NAME=train
  IMAGE_NAME=cresset:train-USERNAME
  PROJECT_ROOT=/opt/project

docker-compose.yaml

Lines changed: 57 additions & 101 deletions
@@ -1,28 +1,24 @@
- # Requires the Docker Compose V2.
+ # Requires Docker Compose V2.
  # See https://docs.docker.com/compose/compose-file/compose-file-v3
  # and https://github.com/compose-spec/compose-spec/blob/master/spec.md
  # for details concerning the `docker-compose.yaml` file syntax.

- # Variables are in ${VARIABLE:-DEFAULT_VALUE} format
- # to ensure that default values are given to the Dockerfile.
- # Using a `.env` file to set variables is strongly recommended.
- # However, note that variables in the host shell have
- # higher priority than the `.env` file for Docker Compose.
-
- # Run `make env` to create a basic `.env` file with the UID and GID variables.
- # Compute Capability must be specified via the `CCA` variable.
-
- # Using a `docker-compose.yaml` file has many advantages
- # over creating custom shell scripts for each project.
+ # Using `docker-compose.yaml` has many advantages over writing custom shell scripts.
  # The settings are much easier to see and maintain than scattered shell scripts.
  # Also, Compose is a native Docker component, simplifying project maintenance.

- # Set the host environment variable `BUILDKIT_PROGRESS=plain` to see the full build log.
- # https://github.com/docker/cli/blob/master/docs/reference/commandline/cli.md#environment-variables
+ # Run `make env` to create a basic `.env` file with the UID and GID variables.
+ # Using a `.env` file to set variables is strongly recommended. However,
+ # variables in the host shell have higher priority than `.env` for Docker Compose.
+ # Variables are in ${VARIABLE:-DEFAULT_VALUE} format to specify default values.

  # See https://pytorch.org/docs/stable/cpp_extension.html for an
  # explanation of how to specify the `TORCH_CUDA_ARCH_LIST` variable.
  # The variable `CCA` is used to specify `TORCH_CUDA_ARCH_LIST`.
+ # Compute Capability must be specified via the `CCA` variable.
+
+ # Set the host environment variable `BUILDKIT_PROGRESS=plain` to see the full build log.
+ # https://github.com/docker/cli/blob/master/docs/reference/commandline/cli.md#environment-variables

  networks: # Use the host network instead of creating a separate network.
  default: # This reduces load and conflicts with the host network.
@@ -35,22 +31,50 @@ services:
  init: true # Equivalent to `--init` flag in `docker run`.
  stdin_open: true # equivalent to `-i` flag in `docker run`.
  working_dir: ${PROJECT_ROOT:-/opt/project}
+ user: ${UID:-1000}:${GID:-1000} # Specify USR/GRP at runtime.
  # Use different image names for different users and projects.
  # Otherwise, images will be repeatedly removed and recreated.
  # The removed images will remain cached, however.
  image: ${IMAGE_NAME}
  network_mode: host # Use the same network as the host, may cause security issues.
  # `ipc: host` removes the shared memory cap but is a known security vulnerability.
- # ipc: host # Equivalent to `--ipc=host` in `docker run`. Disable this on WSL.
+ # ipc: host # Equivalent to `--ipc=host` in `docker run`. **Disable this on WSL.**
  # shm_size: 1GB # Explicit shared memory limit. No security issues this way.
- environment: # Common runtime environment variables.
+ hostname: ${SERVICE} # Makes `pure` terminals easier to tell apart.
+ extra_hosts: # Prevents "unknown host" issue when using `sudo`.
+ - "${SERVICE}:127.0.0.1"
+
+ # Common environment variables for the container runtime. No effect on build.
+ environment: # Equivalent to `--env`
  CUDA_DEVICE_ORDER: PCI_BUS_ID
+ HISTSIZE: 50000 # Hard-coded large command history size.
  TZ: ${TZ:-Asia/Seoul} # Timezone settings used during runtime.
+ # tmpfs: # Create directory in RAM for fast data IO.
+ # - /opt/data
+ # Default volume pairings of ${HOST_PATH}:${CONTAINER_PATH}.
+ # Allows the container to access `HOST_PATH` as `CONTAINER_PATH`.
+ # See https://docs.docker.com/storage/volumes for details.
+ # Always use the ${HOME} variable to specify the host home directory.
+ # See https://github.com/docker/compose/issues/6506 for details.
+ volumes: # Equivalent to `-v` flag in `docker run`.
+ # Current working directory `.` is connected to `PROJECT_ROOT`.
+ # Mount `.` if the docker-compose.yaml file is at the project root.
+ # Mount `..` if Cresset is a subdirectory in a different project, etc.
+ - .:${PROJECT_ROOT:-/opt/project}
+ # Preserve VSCode extensions between containers.
+ # Assumes default VSCode server directory.
+ # May cause VSCode issues if multiple Cresset-based projects are on the
+ # same machine writing to the `${HOME}/.vscode-server` directory.
+ # If so, specify a different host directory for each project.
+ - ${HOME}/.vscode-server:/home/${USR:-user}/.vscode-server
+
  build:
  context: . # Nearly all files are ignored due to `.dockerignore` settings.
+ target: ${TARGET_STAGE:-train} # Specify Dockerfile target build stage.
  args: # Common build-time environment variables.
  # Even if these variables are unnecessary during the build,
  # they can be ignored simply by not defining them in that stage.
+ INTERACTIVE_MODE: ${INTERACTIVE_MODE:-include}
  PROJECT_ROOT: ${PROJECT_ROOT:-/opt/project}
  GID: ${GID:-1000}
  UID: ${UID:-1000}
@@ -66,37 +90,13 @@ services:
  capabilities: [gpu]
  # device_ids: [ "0" ] # Use only GPU 0.

- train: # Default service name. Change the name for each project.
+ train:
  extends:
  service: base
- # Set to the service name. Makes terminals easier to tell apart.
- # `HOST_NAME` avoids conflict with the `HOSTNAME` shell builtin variable.
- hostname: ${HOST_NAME:-train}
- extra_hosts:
- - "${HOST_NAME:-train}:127.0.0.1" # Prevents "unknown host" issue when using `sudo`.
- user: ${UID:-1000}:${GID:-1000}
- environment: # Environment variables for the container, not the build. Equivalent to `--env`
- HISTSIZE: 50000 # Hard-coded large command history size.
- # Setting `HOST_PATH:CONTAINER_PATH`
- # allows the container to access `HOST_PATH` as `CONTAINER_PATH`.
- # See https://docs.docker.com/storage/volumes for details.
- # Current working directory `.` is connected to `PROJECT_ROOT`.
- # Always use the ${HOME} variable to specify the host home directory.
- # See https://github.com/docker/compose/issues/6506 for details.
- volumes: # Equivalent to `-v` flag in `docker run`.
- # Use this if the docker-compose.yaml file is at the project root.
- - .:${PROJECT_ROOT:-/opt/project}
- # Preserve VSCode extensions between containers.
- # Assumes default VSCode server directory.
- - ${HOME}/.vscode-server:/home/${USR:-user}/.vscode-server
- # tmpfs: # Create directory in RAM for fast data IO.
- # - /opt/data
  build: # Options for building. Used when `--build` is called in `docker compose`.
  # Set `TARGET_STAGE` to `train-builds` to get just the wheels in `/tmp/dist`.
- target: ${TARGET_STAGE:-train} # Specify build target.
  dockerfile: Dockerfile
  args: # Equivalent to `--build-arg`.
- INTERACTIVE_MODE: ${INTERACTIVE_MODE:-include}
  BUILD_MODE: ${BUILD_MODE:-exclude}
  BUILD_TEST: 1 # Enable tests to have identical configurations with deployment.
  USE_NNPACK: 0
@@ -110,111 +110,67 @@ services:
  MKL_MODE: ${MKL_MODE:-include} # MKL_MODE can be `include` or `exclude`.
  # Change the `CONDA_URL` for different hardware architectures.
  # URLs from https://github.com/conda-forge/miniforge are recommended over
- # Miniconda URLs from https://docs.conda.io/en/latest/miniconda.html
- # `CONDA_MANAGER` may be either `mamba` (the default) or `conda`.
- # Mamba is a faster reimplementation of conda in C++
- # However, there are occasions where mamba is unable to
- # resolve conflicts that conda can resolve.
+ # Miniconda URLs from https://docs.conda.io/en/latest/miniconda.html.
+ # The `CONDA_MANAGER` may be either `mamba` (the default) or `conda`.
+ # However, `mamba` may be unable to resolve conflicts that `conda` can.
  # In such cases, set `CONDA_MANAGER=conda` for conda-based installation.
- # Note that installing Mamba via Mambaforge is strongly recommended.
+ # Installing `mamba` via Mambaforge is strongly recommended.
  CONDA_URL: ${CONDA_URL:-https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-Linux-x86_64.sh}
  CONDA_MANAGER: ${CONDA_MANAGER:-mamba}
  # Fails if `BUILD_MODE=include` but `CCA` is not set explicitly.
- TORCH_CUDA_ARCH_LIST: ${CCA}
- # Variables for building PyTorch. Must be valid git tags.
+ TORCH_CUDA_ARCH_LIST: ${CCA} # Ignore the missing CCA warning otherwise.
+ # Variables for building PyTorch. Must be valid git tags or commits.
  PYTORCH_VERSION_TAG: ${PYTORCH_VERSION_TAG:-v2.0.0}
  TORCHVISION_VERSION_TAG: ${TORCHVISION_VERSION_TAG:-v0.15.1}
  # Variables for downloading PyTorch instead of building.
  PYTORCH_INDEX_URL: ${PYTORCH_INDEX_URL:-https://download.pytorch.org/whl/cu118}
  PYTORCH_VERSION: ${PYTORCH_VERSION:-2.0.0}
  TORCHVISION_VERSION: ${TORCHVISION_VERSION:-0.15.1}
- # URL for faster `apt` and `pip` installs. Optimized for Korean users.
- # Use URLs optimized for user location and security requirements.
+ # URLs for faster `apt` and `pip` installs. Comment out to use the defaults.
+ # Use URLs optimized for location and security requirements.
  # DEB_OLD: ${DEB_OLD:-http://archive.ubuntu.com}
  # DEB_NEW: ${DEB_NEW:-http://mirror.kakao.com}
- # Comment out the PyPI mirrors to use the default PyPI repository.
  # INDEX_URL: ${INDEX_URL:-http://mirror.kakao.com/pypi/simple}
  # TRUSTED_HOST: ${TRUSTED_HOST:-mirror.kakao.com}

- # This layer may be useful for PyTorch contributors.
  devel: # Skeleton service for development and debugging.
- extends:
+ extends: # This service may be useful for PyTorch CUDA/C++ contributors.
  service: base
- hostname: ${HOST_NAME:-devel}
- extra_hosts:
- - "${HOST_NAME:-devel}:127.0.0.1"
- volumes:
- - .:${PROJECT_ROOT:-/opt/project}
  build:
  target: ${TARGET_STAGE:-build-base} # All builds begin at `build-base`.
  dockerfile: Dockerfile

- # Service based on images from the NGC PyTorch image catalog. Visit
- # https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/running.html
- # for an up-to-date list of all NVIDIA NGC PyTorch images.
- # Note that the NGC images are very unstable, with many differences between versions.
- # This service may break for different `NGC_YEAR` and `NGC_MONTH` configurations.
- ngc:
+ ngc: # Service based on images from the NGC PyTorch image catalog.
+ # https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/running.html
+ # NGC images are very unstable, with many differences between versions.
+ # This service may break for different `NGC_YEAR` and `NGC_MONTH` values.
  extends:
  service: base
- user: ${UID:-1000}:${GID:-1000}
- environment:
- HISTSIZE: 50000 # Hard-coded large command history size.
- volumes:
- - .:${PROJECT_ROOT:-/opt/project}
- hostname: ${HOST_NAME:-ngc}
- extra_hosts:
- - "${HOST_NAME:-ngc}:127.0.0.1"
  build:
- target: ${TARGET_STAGE:-train}
  dockerfile: dockerfiles/ngc.Dockerfile
  args:
  NGC_YEAR: ${NGC_YEAR:-23}
  NGC_MONTH: ${NGC_MONTH:-03}
- INTERACTIVE_MODE: ${INTERACTIVE_MODE:-include}

- # Service based on the official PyTorch Docker images from Docker Hub. Visit
- # https://hub.docker.com/r/pytorch/pytorch/tags to find available images.
- hub:
- extends:
+ hub: # Service based on the official PyTorch Docker images from Docker Hub.
+ extends: # Available images: https://hub.docker.com/r/pytorch/pytorch/tags
  service: base
- user: ${UID:-1000}:${GID:-1000}
- environment:
- HISTSIZE: 50000 # Hard-coded large command history size.
- volumes:
- - .:${PROJECT_ROOT:-/opt/project}
- hostname: ${HOST_NAME:-hub}
- extra_hosts:
- - "${HOST_NAME:-hub}:127.0.0.1"
  build:
- target: ${TARGET_STAGE:-train}
  dockerfile: dockerfiles/hub.Dockerfile
  args:
  PYTORCH_VERSION: ${PYTORCH_VERSION:-2.0.0}
  # Note that `CUDA_SHORT_VERSION` excludes the patch version numbers.
  CUDA_SHORT_VERSION: ${CUDA_SHORT_VERSION:-11.7}
  CUDNN_VERSION: ${CUDNN_VERSION:-8}
  IMAGE_FLAVOR: ${IMAGE_FLAVOR:-devel}
- INTERACTIVE_MODE: ${INTERACTIVE_MODE:-include}

- # Service installed purely from official/verified Docker images and `conda`.
- simple:
+ simple: # Service installed purely from official/verified Docker images and `conda`.
  extends:
  service: base
- user: ${UID:-1000}:${GID:-1000}
- environment:
- HISTSIZE: 50000 # Hard-coded large command history size.
- volumes:
- - .:${PROJECT_ROOT:-/opt/project}
- hostname: ${HOST_NAME:-simple}
- extra_hosts:
- - "${HOST_NAME:-simple}:127.0.0.1"
  build:
- target: ${TARGET_STAGE:-train}
  dockerfile: dockerfiles/simple.Dockerfile
  args:
  BASE_IMAGE: ${LINUX_DISTRO:-ubuntu}:${DISTRO_VERSION:-22.04}
- INTERACTIVE_MODE: ${INTERACTIVE_MODE:-include}
  LOCK_MODE: ${LOCK_MODE:-exclude}
  CONDA_URL: ${CONDA_URL:-https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-Linux-x86_64.sh}
  CONDA_MANAGER: ${CONDA_MANAGER:-mamba}
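
The Compose file leans heavily on the `${VARIABLE:-DEFAULT_VALUE}` form. POSIX shells use the same `:-` syntax, and it falls back to the default when the variable is unset or empty, so a shell sketch is a reasonable way to see how `TARGET_STAGE` would resolve. The `resolve_stage` helper is a hypothetical name used only here. The sketch does not invoke Docker Compose itself.

```shell
#!/bin/sh
# Hypothetical helper: resolves the stage the way `${TARGET_STAGE:-train}` would.
resolve_stage() { echo "${TARGET_STAGE:-train}"; }

unset TARGET_STAGE
unset_result="$(resolve_stage)"    # Unset: falls back to the default.

TARGET_STAGE=""
empty_result="$(resolve_stage)"    # Empty: `:-` also falls back to the default.

TARGET_STAGE=train-builds
set_result="$(resolve_stage)"      # Set: the explicit value wins.

printf 'unset=%s empty=%s set=%s\n' "$unset_result" "$empty_result" "$set_result"
```

Since the `base` service now carries `target: ${TARGET_STAGE:-train}`, services that extend it without overriding `target` would build the `train` stage unless `TARGET_STAGE` is set in `.env` or the host shell.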

dockerfiles/ngc.Dockerfile

Lines changed: 4 additions & 0 deletions
@@ -31,6 +31,10 @@ ENV PYTHONIOENCODING=UTF-8
  ARG PYTHONDONTWRITEBYTECODE=1
  ARG PYTHONUNBUFFERED=1

+ # The base NGC image sets `SHELL=bash`. Docker cannot unset an `ENV` variable,
+ # ergo, `SHELL=''` is used for best compatibility with the other services.
+ ENV SHELL=''
+
  # Install `apt` requirements.
  # `tzdata` requires noninteractive mode.
  ARG DEBIAN_FRONTEND=noninteractive
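
An empty variable is not the same as an unset one, which is worth keeping in mind when reading `ENV SHELL=''`. The sketch below shows the difference using POSIX parameter expansion. `DEMO_SHELL` is a hypothetical stand-in so that the real `SHELL` is left untouched, and how any particular program reacts to an empty `SHELL` is not covered by the commit and is not claimed here.

```shell
#!/bin/sh
# DEMO_SHELL is a hypothetical stand-in for SHELL.
unset DEMO_SHELL
unset_colon="${DEMO_SHELL:-/bin/sh}"   # Unset: `:-` uses the fallback.
unset_plain="${DEMO_SHELL-/bin/sh}"    # Unset: `-` uses the fallback.

DEMO_SHELL=''
empty_colon="${DEMO_SHELL:-/bin/sh}"   # Empty: `:-` still uses the fallback.
empty_plain="${DEMO_SHELL-/bin/sh}"    # Empty but set: `-` keeps the empty value.

printf 'unset: [%s] [%s]\n' "$unset_colon" "$unset_plain"
printf 'empty: [%s] [%s]\n' "$empty_colon" "$empty_plain"
```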

reqs/hub-apt.requirements.txt

Lines changed: 0 additions & 1 deletion
@@ -1,4 +1,3 @@
- libjemalloc-dev
  sudo
  tmux
  tzdata

reqs/simple-environment.yaml

Lines changed: 3 additions & 3 deletions
@@ -9,13 +9,13 @@ channels:
  - conda-forge # Always use conda-forge instead.
  - nvidia # CUDA-related packages are available in the NVIDIA channel.
  dependencies: # Use conda packages if possible.
- - intel::python==3.10
+ - python==3.10
  - pytorch::pytorch # Only install PyTorch-related packages from the PyTorch channel.
  - pytorch::torchvision
  - pytorch::pytorch-cuda==11.8
  - jemalloc
- - intel::mkl
- - intel::numpy # Use Numpy built with the Intel compiler for best performance with MKL.
+ - mkl
+ - numpy
  - pytest
  - tmux==3.2a
  - tqdm

reqs/train-apt.requirements.txt

Lines changed: 1 addition & 3 deletions
@@ -1,7 +1,5 @@
- # Example `apt` requirements file.
- # `sudo` and `zsh` are required packages.
+ # Example `apt` requirements file. `sudo` and `zsh` are required packages.
  at
  numactl
  sudo
- watchman # For pyre-check only.
  zsh
