Aurora Instructions
Argonne's Aurora supercomputer is the main target platform for ACCLAiM. If you run into any issues while following these instructions, please let us know!
If you have not already, please first consult the repository's README to familiarize yourself with this software.
ACCLAiM has been tested on Aurora with recent Python3 versions.
We recommend loading Python using the module system, such as:
module load python
To install ACCLAiM's Python dependencies on Aurora, we recommend setting up a local virtual environment.
For example, you can go to ACCLAiM's root directory and run python3 -m venv .venv to set up a virtual environment that uses pip.
To use the virtual environment, activate it (source .venv/bin/activate) and install the required packages listed in the README (pip install -r requirements.txt).
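The environment setup above can be sketched end-to-end as a short shell session. The temporary directory is only so the sketch is self-contained; on Aurora you would create .venv in ACCLAiM's root directory and install requirements.txt as described above:

```shell
# Sketch of the virtual-environment setup. In practice, run
# `module load python` first and create the venv in ACCLAiM's root.
VENV_DIR="$(mktemp -d)/.venv"     # stand-in for path/to/acclaim/.venv
python3 -m venv "$VENV_DIR"       # create the environment
. "$VENV_DIR/bin/activate"        # activate it
pip --version > /dev/null && echo "venv ready"
# then: pip install -r requirements.txt
deactivate
```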
MPICH is the default MPI implementation on Aurora, so you do not have to build/install it separately.
If you wish to do so anyway, please follow the instructions found here.
In the default environment on Aurora, the path to the default MPICH installation is stored in the MPICH_ROOT environment variable.
You can also find the path of the default installation by querying the system, e.g., which mpicc.
Aurora uses the mpiexec provided by Cray's PALS package. Its path can be found by querying the system, e.g., which mpiexec.
ACCLAiM's setup script works out-of-the-box for CPU and GPU tuning on Aurora.
Ensure that the dependencies are loaded in your environment, then run setup.py:
python3 setup.py $MPICH_ROOT aurora_xpu --launcher_path $(which mpiexec)
python3 setup.py $MPICH_ROOT aurora_cpu --launcher_path $(which mpiexec)
Note that the path to mpiexec must be provided separately because Aurora does not use the mpiexec provided by MPICH.
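Since setup.py takes both the MPICH root and the launcher path, it can be worth confirming that both resolve before running it. A minimal sketch, assuming only standard shell utilities (check_path is an illustrative helper, not part of ACCLAiM):

```shell
# Fail loudly if a path that setup.py needs is missing.
check_path() {
  if [ ! -e "$1" ]; then
    echo "missing: $1" >&2
    return 1
  fi
  echo "found: $1"
}

check_path "${MPICH_ROOT:-/usr}"          # MPICH install root (set by Aurora's env)
launcher="$(command -v mpiexec || true)"  # PALS launcher on Aurora
if [ -n "$launcher" ]; then
  check_path "$launcher"
else
  echo "mpiexec not on PATH (expected on Aurora via PALS)"
fi
# then: python3 setup.py $MPICH_ROOT aurora_xpu --launcher_path $(which mpiexec)
```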
Once setup is complete, examples for how to use ACCLAiM on Aurora are provided in the main README and repeated here:
Tuning MPI_Allreduce on ANL Aurora:
NNODES=`wc -l < $PBS_NODEFILE`
PPN=12
PROCS=$(($NNODES * $PPN))
APP_DIR=$(pwd)
ACCLAIM_PATH=path/to/acclaim
cd $ACCLAIM_PATH
mkdir -p tuning_jsons
save_file_path=${ACCLAIM_PATH}/tuning_jsons/${PBS_JOBID}.json
make gen_config_single N=$NNODES PPN=$PPN MSG_SIZE=1048576 COLLECTIVE="allreduce" SAVE_FILE="${save_file_path}"
cd $APP_DIR
mpiexec \
-n $PROCS \
-ppn $PPN \
-genv MPIR_CVAR_COLL_SELECTION_TUNING_JSON_FILE=${save_file_path} \
...
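Before launching the application with the generated file, a quick sanity check that the .json exists and parses can save a failed run. A hedged sketch (validate_tuning_json is an illustrative helper, not part of ACCLAiM; the demo file stands in for "$save_file_path"):

```shell
# Verify the generated tuning file exists and parses as JSON before
# handing it to MPICH via MPIR_CVAR_COLL_SELECTION_TUNING_JSON_FILE.
validate_tuning_json() {
  local f="$1"
  [ -s "$f" ] || { echo "empty or missing: $f" >&2; return 1; }
  python3 -m json.tool "$f" > /dev/null || { echo "not valid JSON: $f" >&2; return 1; }
  echo "ok: $f"
}

# Demo with a stand-in file (in practice: validate_tuning_json "$save_file_path"):
demo=$(mktemp)
echo '{"collective": "allreduce"}' > "$demo"
validate_tuning_json "$demo"
```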
Tuning MPI_Allreduce and MPI_Bcast on ANL Aurora:
NNODES=`wc -l < $PBS_NODEFILE`
PPN=12
PROCS=$(($NNODES * $PPN))
APP_DIR=$(pwd)
ACCLAIM_PATH=path/to/acclaim
cd $ACCLAIM_PATH
mkdir -p tuning_jsons
save_file_path=${ACCLAIM_PATH}/tuning_jsons/${PBS_JOBID}.json
make gen_config_multiple N=$NNODES PPN=$PPN MSG_SIZE=1048576 COLLECTIVE_LIST="allreduce,bcast" SAVE_FILE="${save_file_path}"
cd $APP_DIR
mpiexec \
-n $PROCS \
-ppn $PPN \
-genv MPIR_CVAR_COLL_SELECTION_TUNING_JSON_FILE=${save_file_path} \
...
Tuning all collectives on ANL Aurora:
NNODES=`wc -l < $PBS_NODEFILE`
PPN=12
PROCS=$(($NNODES * $PPN))
APP_DIR=$(pwd)
ACCLAIM_PATH=path/to/acclaim
cd $ACCLAIM_PATH
mkdir -p tuning_jsons
save_file_path=${ACCLAIM_PATH}/tuning_jsons/${PBS_JOBID}.json
make gen_config_all N=$NNODES PPN=$PPN MSG_SIZE=1048576 SAVE_FILE="${save_file_path}"
cd $APP_DIR
mpiexec \
-n $PROCS \
-ppn $PPN \
-genv MPIR_CVAR_COLL_SELECTION_TUNING_JSON_FILE=${save_file_path} \
...
Aurora uses the CH4 device in MPICH for its transport layer, which includes an additional layer of collective algorithms and tuning opportunities!
For some situations, e.g., bandwidth-bound message sizes for MPIR_Allreduce, we have observed more significant performance improvements from CH4 tuning than MPIR tuning.
Tuning CH4 MPI_Allreduce on ANL Aurora:
NNODES=`wc -l < $PBS_NODEFILE`
PPN=12
PROCS=$(($NNODES * $PPN))
APP_DIR=$(pwd)
ACCLAIM_PATH=path/to/acclaim
cd $ACCLAIM_PATH
mkdir -p tuning_jsons
save_file_path=${ACCLAIM_PATH}/tuning_jsons/${PBS_JOBID}_ch4.json
make gen_config_single_ch4 N=$NNODES PPN=$PPN MSG_SIZE=1048576 COLLECTIVE="allreduce" SAVE_FILE="${save_file_path}"
cd $APP_DIR
mpiexec \
-n $PROCS \
-ppn $PPN \
-genv MPIR_CVAR_CH4_COLL_SELECTION_TUNING_JSON_FILE=${save_file_path} \
...
The only modifications from the standard tuning protocol are to use gen_config_single_ch4 instead of gen_config_single in the make command, and MPIR_CVAR_CH4_COLL_SELECTION_TUNING_JSON_FILE instead of MPIR_CVAR_COLL_SELECTION_TUNING_JSON_FILE to pass the .json to MPICH.
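Because MPIR-level and CH4-level tuning use different CVARs, a job script can pass whichever tuning files were generated. A sketch under the assumption that MPICH accepts both CVARs in a single launch (build_genv_args is an illustrative helper, not part of ACCLAiM):

```shell
# Build -genv arguments for mpiexec from whichever tuning files exist.
build_genv_args() {
  local mpir_json="$1" ch4_json="$2" args=""
  [ -f "$mpir_json" ] && args="$args -genv MPIR_CVAR_COLL_SELECTION_TUNING_JSON_FILE=$mpir_json"
  [ -f "$ch4_json" ]  && args="$args -genv MPIR_CVAR_CH4_COLL_SELECTION_TUNING_JSON_FILE=$ch4_json"
  echo "$args"
}

# In a job script (paths as in the examples above):
# mpiexec -n $PROCS -ppn $PPN $(build_genv_args "$save_file_path" "$ch4_save_file_path") ...
```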
CPU and GPU-based tuning on Aurora are both supported out-of-the-box!
The only change is using aurora_cpu or aurora_xpu for the system name when invoking setup.py.
See the examples in the previous section.
Happy tuning!