diff --git a/containers/README.md b/containers/README.md
new file mode 100644
index 0000000..d771004
--- /dev/null
+++ b/containers/README.md
@@ -0,0 +1,80 @@
+# Using GPUs on CHTC via Linux containers
+
+## Docker
+
+Docker is software that bundles programs, libraries, and dependencies into a
+package called a **container**.
+Once built into container images, these containers can be run on different
+machines that have the Docker Engine.
+Programs with complex dependencies are often packaged with Docker and made
+available for download on [DockerHub](https://hub.docker.com).
+
+The Docker Engine needs special configuration to give the software inside a
+container access to a GPU. CHTC does this behind the scenes with
+`nvidia-docker`. Any Docker container image that runs under `nvidia-docker`
+must include the Nvidia CUDA toolkit. Below are working examples, along with
+some pointers on how to find Docker container images, or build your own, that
+can access the GPU.
+
+### Examples
+
+1. **Hello\_GPU**
+   This is a simple example that checks whether we can access the GPU from
+   inside a Docker container on CHTC. It uses the
+   [nvidia/cuda](https://hub.docker.com/r/nvidia/cuda) Docker container image,
+   a tiny image that contains only the Nvidia CUDA toolkit.
+   [Click here to access this example](./hello_gpu/).
+
+2. **Matrix Multiplication with TensorFlow (Python)**
+   This example uses a [TensorFlow](https://www.tensorflow.org)
+   [Docker container](https://hub.docker.com/r/tensorflow/tensorflow/) to
+   benchmark matrix multiplication on a GPU against the same matrix
+   multiplication on a CPU.
+   [Click here to access this example](./tensorflow_python/).
+
+3. **Convolutional Neural Network with PyTorch (Python)**
+   This example shows how to send training and test data to the compute node
+   along with the script. After processing, the trained network is returned to
+   the submit node.
+   [Click here to access this example](./pytorch_python/).
+
+### Finding container images
+
+1. Pick a Docker container image that is built on a more recent version of the
+   CUDA Toolkit. Although the toolkits are backwards compatible, the more
+   recent the toolkit, the less likely you are to run into problems.
+2. The [Nvidia Catalog](https://ngc.nvidia.com/catalog/landing) has a good
+   selection of container images that use the GPU for machine learning,
+   inference, visualization, etc. These images need to be uploaded to your own
+   DockerHub account before they can be used; this can be done with the Docker
+   application or with the Docker Automated Builder (see below). A sketch of
+   the pull/retag/push workflow appears right after this list.
+3. [Rocker](https://hub.docker.com/u/rocker) is a great place to find
+   GPU-enabled machine learning software for the
+   [R Project for Statistical Computing](https://www.r-project.org).
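+
+As a rough sketch of that workflow using the Docker application (the image tag
+below matches the example `Dockerfile` further down, and `yourusername` is a
+placeholder for your own DockerHub account):
+
+```bash
+# Pull the image from Nvidia's registry
+# (some images may require `docker login nvcr.io` with an NGC account first)
+docker pull nvcr.io/nvidia/pytorch:19.07-py3
+
+# Retag the image under your own DockerHub account (replace yourusername)
+docker tag nvcr.io/nvidia/pytorch:19.07-py3 yourusername/pytorch:19.07-py3
+
+# Push the retagged image to DockerHub so CHTC can download it
+docker push yourusername/pytorch:19.07-py3
+```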
+
+### Building container images
+Building your own container images to access a GPU requires a bit of work and
+will not be described fully here.
+It is best to start with a basic Docker container image that can access the GPU
+and then build upon that image.
+The PyTorch Docker container image is built on top of Nvidia CUDA and is a
+[good example to follow](https://github.com/pytorch/pytorch/blob/main/.devcontainer/Dockerfile).
+
+```Dockerfile
+FROM nvidia/cuda:10.1-base-ubuntu18.04
+#....
+```
+or
+```Dockerfile
+# Pull from Nvidia's catalog
+FROM nvcr.io/nvidia/pytorch:19.07-py3
+
+# conda is already installed, so just install the packages you need
+RUN conda install package_1 package_2 package_etc
+```
+
+Once you have a working `Dockerfile`, you need to build a Docker container image
+with the Docker application and then upload it to DockerHub so that CHTC can
+access your container image.
+Alternatively, you can have the DockerHub Cloud service build it for you
+directly on DockerHub.
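+
+If you build and upload the image yourself with the Docker application, the
+build-and-upload step looks roughly like this (a sketch only: it assumes your
+`Dockerfile` is in the current directory, and `yourusername/my-gpu-image:v1` is
+a placeholder image name):
+
+```bash
+# Build the container image from the Dockerfile in the current directory
+docker build -t yourusername/my-gpu-image:v1 .
+
+# Upload the image to DockerHub so CHTC can pull it when your job starts
+docker push yourusername/my-gpu-image:v1
+```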
diff --git a/docker/hello_gpu/README.md b/containers/hello_gpu/README.md
similarity index 100%
rename from docker/hello_gpu/README.md
rename to containers/hello_gpu/README.md
diff --git a/docker/hello_gpu/expected_output/docker_stderror b/containers/hello_gpu/expected_output/docker_stderror
similarity index 100%
rename from docker/hello_gpu/expected_output/docker_stderror
rename to containers/hello_gpu/expected_output/docker_stderror
diff --git a/docker/hello_gpu/expected_output/hello_gpu.err.txt b/containers/hello_gpu/expected_output/hello_gpu.err.txt
similarity index 100%
rename from docker/hello_gpu/expected_output/hello_gpu.err.txt
rename to containers/hello_gpu/expected_output/hello_gpu.err.txt
diff --git a/docker/hello_gpu/expected_output/hello_gpu.log.txt b/containers/hello_gpu/expected_output/hello_gpu.log.txt
similarity index 100%
rename from docker/hello_gpu/expected_output/hello_gpu.log.txt
rename to containers/hello_gpu/expected_output/hello_gpu.log.txt
diff --git a/docker/hello_gpu/expected_output/hello_gpu.out.txt b/containers/hello_gpu/expected_output/hello_gpu.out.txt
similarity index 100%
rename from docker/hello_gpu/expected_output/hello_gpu.out.txt
rename to containers/hello_gpu/expected_output/hello_gpu.out.txt
diff --git a/docker/hello_gpu/hello_gpu.sh b/containers/hello_gpu/hello_gpu.sh
similarity index 100%
rename from docker/hello_gpu/hello_gpu.sh
rename to containers/hello_gpu/hello_gpu.sh
diff --git a/docker/hello_gpu/hello_gpu.sub b/containers/hello_gpu/hello_gpu.sub
similarity index 100%
rename from docker/hello_gpu/hello_gpu.sub
rename to containers/hello_gpu/hello_gpu.sub
diff --git a/docker/pytorch_ngc/README.md b/containers/pytorch_ngc/README.md
similarity index 100%
rename from docker/pytorch_ngc/README.md
rename to containers/pytorch_ngc/README.md
diff --git a/docker/pytorch_ngc/pytorch_cnn.sh b/containers/pytorch_ngc/pytorch_cnn.sh
similarity index 100%
rename from docker/pytorch_ngc/pytorch_cnn.sh
rename to containers/pytorch_ngc/pytorch_cnn.sh
diff --git a/docker/pytorch_ngc/pytorch_cnn.sub b/containers/pytorch_ngc/pytorch_cnn.sub
similarity index 100%
rename from docker/pytorch_ngc/pytorch_cnn.sub
rename to containers/pytorch_ngc/pytorch_cnn.sub
diff --git a/docker/pytorch_python/README.md b/containers/pytorch_python/README.md
similarity index 100%
rename from docker/pytorch_python/README.md
rename to containers/pytorch_python/README.md
diff --git a/docker/pytorch_python/expected_output/docker_stderror b/containers/pytorch_python/expected_output/docker_stderror
similarity index 100%
rename from docker/pytorch_python/expected_output/docker_stderror
rename to containers/pytorch_python/expected_output/docker_stderror
diff --git a/docker/pytorch_python/expected_output/mnist_cnn.pt b/containers/pytorch_python/expected_output/mnist_cnn.pt
similarity index 100%
rename from docker/pytorch_python/expected_output/mnist_cnn.pt
rename to containers/pytorch_python/expected_output/mnist_cnn.pt
diff --git a/docker/pytorch_python/expected_output/pytorch_cnn.err.txt b/containers/pytorch_python/expected_output/pytorch_cnn.err.txt
similarity index 100%
rename from docker/pytorch_python/expected_output/pytorch_cnn.err.txt
rename to containers/pytorch_python/expected_output/pytorch_cnn.err.txt
diff --git a/docker/pytorch_python/expected_output/pytorch_cnn.log.txt b/containers/pytorch_python/expected_output/pytorch_cnn.log.txt
similarity index 100%
rename from docker/pytorch_python/expected_output/pytorch_cnn.log.txt
rename to containers/pytorch_python/expected_output/pytorch_cnn.log.txt
diff --git a/docker/pytorch_python/expected_output/pytorch_cnn.out.txt b/containers/pytorch_python/expected_output/pytorch_cnn.out.txt
similarity index 100%
rename from docker/pytorch_python/expected_output/pytorch_cnn.out.txt
rename to containers/pytorch_python/expected_output/pytorch_cnn.out.txt
diff --git a/docker/pytorch_python/pytorch_cnn.sh b/containers/pytorch_python/pytorch_cnn.sh
similarity index 100%
rename from docker/pytorch_python/pytorch_cnn.sh
rename to containers/pytorch_python/pytorch_cnn.sh
diff --git a/docker/pytorch_python/pytorch_cnn.sub b/containers/pytorch_python/pytorch_cnn.sub
similarity index 100%
rename from docker/pytorch_python/pytorch_cnn.sub
rename to containers/pytorch_python/pytorch_cnn.sub
diff --git a/docker/tensorflow_python/README.md b/containers/tensorflow_python/README.md
similarity index 100%
rename from docker/tensorflow_python/README.md
rename to containers/tensorflow_python/README.md
diff --git a/docker/tensorflow_python/expected_output/docker_stderror b/containers/tensorflow_python/expected_output/docker_stderror
similarity index 100%
rename from docker/tensorflow_python/expected_output/docker_stderror
rename to containers/tensorflow_python/expected_output/docker_stderror
diff --git a/docker/tensorflow_python/expected_output/tensorflow_gpu.err.txt b/containers/tensorflow_python/expected_output/tensorflow_gpu.err.txt
similarity index 100%
rename from docker/tensorflow_python/expected_output/tensorflow_gpu.err.txt
rename to containers/tensorflow_python/expected_output/tensorflow_gpu.err.txt
diff --git a/docker/tensorflow_python/expected_output/tensorflow_gpu.log.txt b/containers/tensorflow_python/expected_output/tensorflow_gpu.log.txt
similarity index 100%
rename from docker/tensorflow_python/expected_output/tensorflow_gpu.log.txt
rename to containers/tensorflow_python/expected_output/tensorflow_gpu.log.txt
diff --git a/docker/tensorflow_python/expected_output/tensorflow_gpu.out.txt b/containers/tensorflow_python/expected_output/tensorflow_gpu.out.txt
similarity index 100%
rename from docker/tensorflow_python/expected_output/tensorflow_gpu.out.txt
rename to containers/tensorflow_python/expected_output/tensorflow_gpu.out.txt
diff --git a/docker/tensorflow_python/test_tensorflow.py b/containers/tensorflow_python/test_tensorflow.py
similarity index 100%
rename from docker/tensorflow_python/test_tensorflow.py
rename to containers/tensorflow_python/test_tensorflow.py
diff --git a/docker/tensorflow_python/test_tensorflow.sh b/containers/tensorflow_python/test_tensorflow.sh
similarity index 100%
rename from docker/tensorflow_python/test_tensorflow.sh
rename to containers/tensorflow_python/test_tensorflow.sh
diff --git a/docker/tensorflow_python/test_tensorflow.sub b/containers/tensorflow_python/test_tensorflow.sub
similarity index 100%
rename from docker/tensorflow_python/test_tensorflow.sub
rename to containers/tensorflow_python/test_tensorflow.sub
diff --git a/docker/README.md b/docker/README.md
deleted file mode 100644
index b014c8d..0000000
--- a/docker/README.md
+++ /dev/null
@@ -1,72 +0,0 @@
-### Using GPUs on CHTC via Docker
-
-Docker is software that helps bundle software programs, libraries and
-dependencies in a package called a **container**. Once built, these containers
-can be run on different machines that have the Docker Engine. Programs with
-complex dependencies are often packaged with Docker and made available for
-download on [DockerHub](https://hub.docker.com).
-
-The Docker Engine needs special configuration to give the software inside a
-container access to a GPU. CHTC does this behind the scenes with
-`nvidia-docker`. Any Docker container that wants to use `nvidia-docker` must
-contain the Nvidia CUDA toolkit inside it. Here we have working examples and
-also some pointers on how to find containers or build your own containers that
-can access the GPU.
-
-
-### Examples
-
-1. **Hello\_GPU**
- This is a simple example to see if we can access the GPU from inside a Docker
-container on CHTC. It uses the
-[nvidia/cuda](https://hub.docker.com/r/nvidia/cuda) Docker image which is a
-tiny container that only contains the Nvidia CUDA toolkit.
- [Click here to access this example](./hello_gpu/).
-
-2. **Matrix Multiplication with TensorFlow (Python)**
- This example uses a [TensorFlow](https://www.tensorflow.org) [Docker
-container](https://hub.docker.com/r/tensorflow/tensorflow/) to benchmark matrix
-multiplication on a GPU vs the same matrix multiplication on a CPU.
- [Click here to access this example](./tensorflow_python/).
-
-3. **Convolutional Neural Network with PyTorch (Python)**
- This example shows how to send training and test data to the compute node
-along with the script. After processing the trained network is returned to the
-submit node.
- [Click here to access this example](./pytorch_python/).
-
-### Finding containers
-1. Pick a container that is built on a more modern version of CUDA Toolkit. Although the toolkits are backwards compatible, the more modern the toolkit, the less likely you are to run into problems.
-2. [Nvidia Catalog](https://ngc.nvidia.com/catalog/landing) has a good
- selection of containers that use the GPU for machine learning, inference,
-visualization etc. They need to be uploaded to your own account on Dockerhub
-before being used. This can be done with the Docker application or with the
-Docker Automated Builder (see below).
-3. [Rocker](https://hub.docker.com/u/rocker) is a great place to find GPU
- enabled machine learning software for the [R Project for Statistical
-Computing](https://www.r-project.org)
-
-
-### Building containers
-Building your own containers to access a GPU requires a bit of work and will
-not be described fully here. It is best to start with a basic container that
-can access the GPU and then build upon that container. The PyTorch Docker
-container is built on top of Nvidia Cuda and is a [good example to follow](https://github.com/pytorch/pytorch/blob/main/.devcontainer/Dockerfile).
-
-```Dockerfile
-FROM nvidia/cuda:10.1-base-ubuntu18.04
-#....
-```
-or
-```Dockerfile
-# Pull from Nvidia's catalog
-FROM nvcr.io/nvidia/pytorch:19.07-py3
-
-# conda is already installed so just install packages
-RUN conda install package_1 package_2 package_etc
-```
-
-Once you have a working `Dockerfile`, you need to build a Docker container with
-the Docker app and then upload it to Dockerhub so that CHTC can access your
-container. Alternatively, you can have the DockerHub Cloud service directly
-build it for you on DockerHub.
diff --git a/shared/README.md b/shared/README.md
index 3124ab8..bc70229 100644
--- a/shared/README.md
+++ b/shared/README.md
@@ -3,5 +3,5 @@
 Nothing here is intended to be run on its own.
 To find runnable examples, navigate to one of the following subdirectories:
 [`conda`](../conda),
-[`docker`](../docker),
+[`containers`](../containers),
 or [`test`](../test)