80 changes: 80 additions & 0 deletions containers/README.md
@@ -0,0 +1,80 @@
# Using GPUs on CHTC via Linux containers

## Docker

Docker is software that bundles programs, libraries, and other dependencies
into a package called a **container image**.
Once built, a container image can be run as a **container** on any machine
that has the Docker Engine.
Programs with complex dependencies are often packaged with Docker and made
available for download on [Docker Hub](https://hub.docker.com).

The Docker Engine needs special configuration to give the software inside a
container access to a GPU. CHTC does this behind the scenes with
`nvidia-docker`. Any Docker container that wants to use `nvidia-docker` must
contain the Nvidia CUDA toolkit. Below are working examples, along with
pointers on how to find existing Docker container images, or build your own,
that can access the GPU.
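One quick way to confirm that a container can see the GPU is to run a small check from inside it. The sketch below is illustrative and not part of the CHTC examples: it assumes that Python is available in the container and that the `nvidia-smi` utility is on the `PATH`, which may not hold for every image. The function name `gpu_visible` is hypothetical.

```python
import shutil
import subprocess


def gpu_visible() -> bool:
    """Return True if `nvidia-smi` is on PATH and exits successfully."""
    # Assumption: nvidia-smi is present inside the container.
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False
    try:
        # `nvidia-smi -L` lists the GPUs visible to this process.
        result = subprocess.run(
            [exe, "-L"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    if result.returncode != 0:
        return False
    print(result.stdout.strip())
    return True


if __name__ == "__main__":
    print("GPU visible:", gpu_visible())
```

If this reports `False` on a GPU node, the image or the job's GPU request is the first thing to inspect.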

### Examples

1. **Hello\_GPU**
This is a simple example that checks whether the GPU is accessible from inside
a Docker container on CHTC. It uses the
[nvidia/cuda](https://hub.docker.com/r/nvidia/cuda) Docker container image,
a minimal image that contains only the Nvidia CUDA toolkit.
[Click here to access this example](./hello_gpu/).

2. **Matrix Multiplication with TensorFlow (Python)**
This example uses a [TensorFlow](https://www.tensorflow.org) [Docker
container](https://hub.docker.com/r/tensorflow/tensorflow/) to benchmark matrix
multiplication on a GPU vs the same matrix multiplication on a CPU.
[Click here to access this example](./tensorflow_python/).

3. **Convolutional Neural Network with PyTorch (Python)**
This example shows how to send training and test data to the compute node
along with the script. After training, the trained network is returned to the
submit node.
[Click here to access this example](./pytorch_python/).

### Finding container images
1. Pick a Docker container image that is built on a recent version of the
CUDA Toolkit.
Although the toolkits are backwards compatible, the newer the toolkit,
the less likely you are to run into problems.
2. The [Nvidia Catalog](https://ngc.nvidia.com/catalog/landing) has a good
selection of container images that use the GPU for machine learning, inference,
visualization, etc. These images need to be uploaded to your own account on
Docker Hub before they can be used. This can be done with the Docker
application or with the Docker Automated Builder (see below).
3. [Rocker](https://hub.docker.com/u/rocker) is a great place to find
GPU-enabled machine learning software for the [R Project for Statistical
Computing](https://www.r-project.org).


### Building container images
Building your own container images to access a GPU requires a bit of work and
will not be described fully here.
It is best to start with a basic Docker container image that can access the GPU
and then build upon that image.
The PyTorch Docker container image is built on top of Nvidia CUDA and is a
[good example to follow](https://github.com/pytorch/pytorch/blob/main/.devcontainer/Dockerfile).

```Dockerfile
# Start from Nvidia's base CUDA image
FROM nvidia/cuda:10.1-base-ubuntu18.04
# ... add your own build steps here ...
```
or
```Dockerfile
# Pull from Nvidia's catalog
FROM nvcr.io/nvidia/pytorch:19.07-py3

# conda is already installed, so just install packages
# (-y answers the confirmation prompt, which cannot be answered during a build)
RUN conda install -y package_1 package_2 package_etc
```

Once you have a working `Dockerfile`, you need to build a Docker container image
with the Docker app and then upload it to Docker Hub so that CHTC can access
your container image.
Alternatively, you can have the Docker Hub automated build service build it for
you directly on Docker Hub.
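The local build-and-upload route can be sketched as a small helper that runs `docker build` followed by `docker push`. This is a minimal sketch, not a CHTC-provided tool: the function names and the image name `your_dockerhub_user/my-gpu-image:latest` are placeholders, and it assumes the Docker CLI is installed and that you have already run `docker login`. By default it only prints the commands (dry run).

```python
import subprocess
from typing import List


def docker_commands(image: str, context: str = ".") -> List[List[str]]:
    """Return the docker CLI invocations that build and then push `image`."""
    return [
        # Build the image from the Dockerfile in `context` and tag it.
        ["docker", "build", "-t", image, context],
        # Upload the tagged image to the registry named in the tag.
        ["docker", "push", image],
    ]


def build_and_push(image: str, context: str = ".", dry_run: bool = True) -> None:
    """Print each command; run it as well when dry_run is False."""
    for cmd in docker_commands(image, context):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder image name; replace with your own Docker Hub repository.
    build_and_push("your_dockerhub_user/my-gpu-image:latest")
```

Calling `build_and_push(..., dry_run=False)` executes the two commands and raises an error if either one fails.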
Five files renamed without changes.
72 changes: 0 additions & 72 deletions docker/README.md

This file was deleted.

2 changes: 1 addition & 1 deletion shared/README.md
@@ -3,5 +3,5 @@ Nothing here is intended to be run on its own.
To find runnable examples, navigate to one of the following subdirectories:

[`conda`](../conda),
-[`docker`](../docker),
+[`containers`](../containers),
or [`test`](../test)