Implement a simple ping-pong test for GPU-to-GPU communication using: a) indirect communication via the host, and b) direct communication with HIP-aware MPI.
The ping-pong test consists of the following steps:
- Send a vector from one GPU to another
- The receiving GPU should increment all elements of the vector by one
- Send the vector back to the original GPU
For reference, the skeleton code (ping-pong.cpp) also contains a CPU-to-CPU implementation. Timing of all cases is included so the execution times can be compared.
On LUMI, the MPI example can be compiled simply with the Cray compiler:
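For orientation, below is a minimal sketch of variant a), where the data is staged through host buffers around each MPI call. It is illustrative only and not the skeleton code: the vector length N, the message tags, and the add_one kernel are made-up names, and it assumes two ranks with one GPU each.

// Sketch of variant a): GPU-to-GPU ping-pong staged through host memory.
// Illustrative only; N, add_one, and the tags are not from the skeleton code.
#include <vector>
#include <mpi.h>
#include <hip/hip_runtime.h>

// Increment every element of the vector by one on the GPU
__global__ void add_one(double *v, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) v[i] += 1.0;
}

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int N = 1 << 20;                 // illustrative vector length
    hipSetDevice(rank);                    // assumes both GPUs are visible to both ranks
    std::vector<double> host(N, 1.0);      // staging buffer on the host
    double *dev;
    hipMalloc(&dev, N * sizeof(double));
    hipMemcpy(dev, host.data(), N * sizeof(double), hipMemcpyHostToDevice);

    if (rank == 0) {
        // Copy device data to the host, send it, and receive the incremented vector back
        hipMemcpy(host.data(), dev, N * sizeof(double), hipMemcpyDeviceToHost);
        MPI_Send(host.data(), N, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
        MPI_Recv(host.data(), N, MPI_DOUBLE, 1, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        hipMemcpy(dev, host.data(), N * sizeof(double), hipMemcpyHostToDevice);
    } else if (rank == 1) {
        // Receive into the host buffer, increment on the GPU, send the result back
        MPI_Recv(host.data(), N, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        hipMemcpy(dev, host.data(), N * sizeof(double), hipMemcpyHostToDevice);
        add_one<<<(N + 255) / 256, 256>>>(dev, N);
        hipMemcpy(host.data(), dev, N * sizeof(double), hipMemcpyDeviceToHost);
        MPI_Send(host.data(), N, MPI_DOUBLE, 0, 1, MPI_COMM_WORLD);
    }

    hipFree(dev);
    MPI_Finalize();
    return 0;
}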
CC -xhip ping-pong.cpp
On LUMI, enable GPU-aware MPI at runtime (and when compiling) by setting the environment variable:
export MPICH_GPU_SUPPORT_ENABLED=1
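With GPU-aware MPI enabled, the communication step of variant b) can pass device pointers directly to MPI_Send/MPI_Recv, with no host staging. A minimal sketch follows, under the same illustrative assumptions as above (N, add_one, and the tags are made up, and both GPUs are assumed visible to both ranks):

// Sketch of variant b): GPU-to-GPU ping-pong with GPU-aware MPI; device
// pointers are passed directly to MPI. Illustrative only, not the skeleton code.
#include <vector>
#include <mpi.h>
#include <hip/hip_runtime.h>

__global__ void add_one(double *v, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) v[i] += 1.0;
}

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int N = 1 << 20;                 // illustrative vector length
    hipSetDevice(rank);                    // assumes both GPUs are visible to both ranks
    std::vector<double> host(N, 1.0);
    double *dev;
    hipMalloc(&dev, N * sizeof(double));
    hipMemcpy(dev, host.data(), N * sizeof(double), hipMemcpyHostToDevice);

    if (rank == 0) {
        // Device pointers go straight into the MPI calls; no host staging
        MPI_Send(dev, N, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
        MPI_Recv(dev, N, MPI_DOUBLE, 1, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    } else if (rank == 1) {
        MPI_Recv(dev, N, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        add_one<<<(N + 255) / 256, 256>>>(dev, N);
        hipDeviceSynchronize();            // make sure the kernel has finished before sending
        MPI_Send(dev, N, MPI_DOUBLE, 0, 1, MPI_COMM_WORLD);
    }

    hipFree(dev);
    MPI_Finalize();
    return 0;
}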
For running, one should use two GPUs and two MPI processes:
srun --account=XXXXXX --partition=small-g -N1 --ntasks-per-node=2 --cpus-per-task=1 --gpus-per-node=2 --time=00:15:00 ./a.out # the reservation is for the small-g partition
On Mahti, to compile, just load the required modules and type make. A GPU-aware MPI is available with:
ml openmpi/4.1.4-cuda
For running, one should use two GPUs and two MPI processes:
srun --account=XXXXXX --partition=gputest -N1 -n2 --cpus-per-task=1 --gres=gpu:v100:2 --time=00:15:00 ./a.out