This is the artifact for the SC22 submission "AD for an Array Language with Nested Parallelism". The repository contains all source code and data required to reproduce Figures 6, 7, 8, 9, 11, 12, and 13 of the paper.
The artifact is distributed as two Docker containers: one for running on NVIDIA GPUs (CUDA) and one for running on AMD GPUs (ROCm). The artifact can also be run without Docker, although there are many dependencies; a section below describes how to do this.
- An x86_64 CPU.
- A modern NVIDIA GPU or AMD GPU; the benchmarks in the paper were performed with an A100 and a 2080 Ti (on a few select benchmarks) on the NVIDIA side and an MI100 on the AMD side. Most of the benchmarks require large amounts of video memory (up to 30 GiB). These benchmarks will fail on GPUs with insufficient memory; in those cases the table corresponding to the benchmark will not be reproduced.
- ~20 GiB of free disk space.
Running the container requires an x86_64 system running a Linux distribution with support for Docker. For the NVIDIA container, the NVIDIA Container Toolkit is also necessary; please see NVIDIA's installation guide for exact requirements and installation instructions.
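Before pulling anything, it may be worth confirming that the host can see the GPU at all. The commands below are not part of the artifact, just the usual driver-level sanity checks; use whichever matches your vendor:

```
# NVIDIA: should list the GPU and driver version
nvidia-smi

# AMD/ROCm: should list the GPU
rocm-smi
```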
The suggested way to obtain the container image is to pull from the GitHub Container Registry. For the CUDA container, run
docker pull ghcr.io/diku-dk/futhark-ad-sc22:cuda
and for the ROCm container, run
docker pull ghcr.io/diku-dk/futhark-ad-sc22:rocm
Note: you may need to run docker as root with sudo.
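As an optional check (not required by the artifact), you can confirm that a pull succeeded by listing the local images for the repository:

```
docker images ghcr.io/diku-dk/futhark-ad-sc22
```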
Alternatively, the container images may be built by running
docker build -t ghcr.io/diku-dk/futhark-ad-sc22:[cuda|rocm] -f Dockerfile.[cuda|rocm] .
within the root directory of this repository. Note that this will likely take a long time and that the Dockerfile is not deterministic; it is strongly recommended to pull from the GitHub Container Registry.
The CUDA container may be run interactively with
docker run --rm -it --gpus all ghcr.io/diku-dk/futhark-ad-sc22:cuda
and the ROCm container may be run interactively with
docker run --rm -it --device=/dev/kfd --device=/dev/dri --group-add video ghcr.io/diku-dk/futhark-ad-sc22:rocm
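It should also be possible to run a single make target non-interactively by passing the command to docker run, as in the sketch below. This assumes the image does not override the default entrypoint; the interactive invocations above are the documented route.

```
docker run --rm --gpus all ghcr.io/diku-dk/futhark-ad-sc22:cuda make figure_6
```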
To reproduce the figures, run the corresponding `make` command in the `/home/bench/futhark-ad-sc22` directory of the Docker container (this is the default working directory):
- Figure 6: `make figure_6`.
- Figure 7: `make figure_7`. Note that the Enzyme overheads are not computed by this artifact, but simply copied from the Enzyme paper for reference.
- Figure 8: `make figure_8`. Note that Figure 8 requires ~30 GiB of video memory to reproduce.
- Figure 9: `make figure_9`.
- Figure 11: `make figure_11`.
- Figures 12 and 13: `make figure_12_13`.
Note: Some of the figures in the paper contain results from multiple machines. The commands above only produce results for a single machine (the one you are running on). A comparison between machines is not a contribution of the paper, so the artifact doesn't deal with it.
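If you want to reproduce everything in one go, a small shell loop over the targets works. This is just a convenience sketch, not something provided by the Makefile; you may want to drop `figure_8` on GPUs with less than ~30 GiB of video memory.

```
for fig in figure_6 figure_7 figure_8 figure_9 figure_11 figure_12_13; do
    make "$fig" 2>&1 | tee "$fig.log"
done
```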
It is possible to run the artifact without using the Docker container, although it is somewhat intricate. If any of the below seems wrong or confusing, you can always peruse Dockerfile.cuda or Dockerfile.rocm to see how the containers themselves are constructed.
You need the following components:
- Git Large File Storage (available in many package managers).
- A working CUDA or OpenCL installation on your system. The environment variables `LIBRARY_PATH`, `LD_LIBRARY_PATH`, and `CPATH` must be set such that including and linking against the OpenCL/CUDA libraries works. On most systems with CUDA this means:

  $ export LIBRARY_PATH=/usr/local/cuda/lib64:$LIBRARY_PATH
  $ export LD_LIBRARY_PATH=/usr/local/cuda/lib64/:$LD_LIBRARY_PATH
  $ export CPATH=/usr/local/cuda/include:$CPATH

  (Note that some systems install CUDA in unusual locations.) On ROCm, this means:

  $ export CPATH=/opt/rocm/include:/opt/rocm/opencl/include:$CPATH
  $ export C_INCLUDE_PATH=/opt/rocm/include:/opt/rocm/opencl/include:$C_INCLUDE_PATH
  $ export LIBRARY_PATH=/opt/rocm/lib:/opt/rocm/opencl/lib:$LIBRARY_PATH
  $ export LD_LIBRARY_PATH=/opt/rocm/lib:/opt/rocm/opencl/lib:$LD_LIBRARY_PATH
  $ export CPLUS_INCLUDE_PATH=/opt/rocm/include:/opt/rocm/opencl/lib:$CPLUS_INCLUDE_PATH
- Python 3.8 and `pip` (available in basically all package managers).
- Several Python packages. On CUDA, these are installable with:

  $ pip3 install --upgrade torch==1.11.0+cu113 --extra-index-url https://download.pytorch.org/whl/cu113
  $ pip3 install --upgrade "jax[cuda]==0.3.4" -f https://storage.googleapis.com/jax-releases/jax_releases.html
  $ pip3 install futhark-data prettytable
- On ROCm, with:

  $ pip3 install numpy==1.22.3 scipy==1.8.0 futhark-data prettytable
  $ pip3 install torch==1.11.0 -f https://download.pytorch.org/whl/rocm4.2/torch_stable.html
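After installing the Python packages, a quick way to check that the GPU stacks actually see a device is to query them from the command line. This is only a sanity check, not something the artifact requires, and the JAX check applies to the CUDA setup only:

```
$ python3.8 -c 'import torch; print(torch.cuda.is_available())'
$ python3.8 -c 'import jax; print(jax.devices())'   # CUDA setup only
```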
Clone this repository and then initialise the submodules containing ADBench and the Futhark compiler itself:
$ git lfs install # If you have not used git-lfs before.
$ git clone https://github.com/diku-dk/futhark-ad-sc22
$ cd futhark-ad-sc22
$ git submodule update --init ADBench
$ git submodule update --init futhark
Optional: recompile the Futhark compiler binary (but the included one works as well): `make -B bin/futhark`.
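To verify that the (re)built compiler binary runs at all, you can ask it for its version; this is just a quick check and not part of the artifact's instructions:

```
$ bin/futhark --version
```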
Before running, you must set the environment variable `GPU` to either `A100` if you have an NVIDIA GPU, or `MI100` if you have an AMD GPU:

export GPU=[A100|MI100]
You must also set the `PYTHON` environment variable to point to your Python binary:

export PYTHON=python3.8
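Putting the pieces together, a minimal non-Docker session on an NVIDIA machine might look like the following (values taken from this README; adjust `GPU` and `PYTHON` for your system):

```
$ export GPU=A100
$ export PYTHON=python3.8
$ make figure_6
```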
After this, you should be able to use the Makefile targets listed above to reproduce the individual figures.
This section describes every top-level directory and its purpose.
- `ADBench/`: a Git submodule containing a fork of the main ADBench repository, with Futhark implementations added. We use only a small amount of ADBench, but it is simpler to include all of it than to try to exfiltrate the pertinent parts. The Futhark implementations reside in `ADBench/src/cpp/modules/futhark`.
- `benchmarks/`: contains the source code and data for all benchmarks except for the ADBench benchmark.
- `bin/`: precompiled binaries and scripts used in the artifact.
- `futhark/`: a Git submodule containing the Futhark compiler extended with support for AD. This is the compiler used for the artifact, and it can be used to (re)produce the `bin/futhark` executable with `make bin/futhark -B`.
- `originals/`: original sources for the `lbm`, `rsbench`, and `xsbench` benchmarks to compare against.
- `tmp/`: used for storing raw results from running some of the benchmarks. (ADBench contains its own temporary directory; select other benchmarks store results in other folders.)