Aurora Instructions
Argonne's Aurora supercomputer is the main target platform for ACCLAiM. If you run into any issues while following these instructions, please let us know!
If you have not already, please first consult the repository's README to familiarize yourself with this software.
ACCLAiM has been tested on Aurora with recent Python3 versions.
We recommend loading Python using the module system, such as:
module load python
To install ACCLAiM's Python dependencies on Aurora, we recommend setting up a local virtual environment.
For example, you can go to ACCLAiM's root directory and run python3 -m venv .venv to set up a virtual environment that uses pip.
To use the virtual environment, activate it (source .venv/bin/activate) and install the required packages listed in the README (pip install -r requirements.txt).
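The environment setup above can be sketched end-to-end as a short shell session. The temporary directory is only so the sketch is self-contained; on Aurora you would create .venv in ACCLAiM's root directory and install requirements.txt as described above:

```shell
# Sketch of the virtual-environment setup. In practice, run
# `module load python` first and create the venv in ACCLAiM's root.
VENV_DIR="$(mktemp -d)/.venv"     # stand-in for path/to/acclaim/.venv
python3 -m venv "$VENV_DIR"       # create the environment
. "$VENV_DIR/bin/activate"        # activate it
pip --version > /dev/null && echo "venv ready"
# then: pip install -r requirements.txt
deactivate
```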
MPICH is the default MPI implementation on Aurora, so you do not have to build/install it separately.
If you wish to do so anyway, please follow the instructions found here.
In the default environment on Aurora, the path to the default MPICH installation is stored in the MPICH_ROOT environment variable.
You can also find the path of the default installation by querying the system, e.g., which mpicc.
Aurora uses the mpiexec provided by Cray's PALS package. Its path can be found by querying the system, e.g., which mpiexec.
ACCLAiM's setup script works out-of-the-box for CPU and GPU tuning on Aurora.
Ensure that the dependencies are loaded in your environment, then run setup.py:
python3 setup.py $MPICH_ROOT aurora_xpu --launcher_path $(which mpiexec)
python3 setup.py $MPICH_ROOT aurora_cpu --launcher_path $(which mpiexec)
Note that the path to mpiexec must be provided separately because Aurora does not use the mpiexec provided by MPICH.
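Since setup.py takes both the MPICH root and the launcher path, it can be worth confirming that both resolve before running it. A minimal sketch, assuming only standard shell utilities (check_path is an illustrative helper, not part of ACCLAiM):

```shell
# Fail loudly if a path that setup.py needs is missing.
check_path() {
  if [ ! -e "$1" ]; then
    echo "missing: $1" >&2
    return 1
  fi
  echo "found: $1"
}

check_path "${MPICH_ROOT:-/usr}"          # MPICH install root (set by Aurora's env)
launcher="$(command -v mpiexec || true)"  # PALS launcher on Aurora
if [ -n "$launcher" ]; then
  check_path "$launcher"
else
  echo "mpiexec not on PATH (expected on Aurora via PALS)"
fi
# then: python3 setup.py $MPICH_ROOT aurora_xpu --launcher_path $(which mpiexec)
```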
Once setup is complete, examples for how to use ACCLAiM on Aurora are provided in the main README and repeated here:
Tuning MPI_Allreduce on ANL Aurora:
NNODES=`wc -l < $PBS_NODEFILE`
PPN=12
PROCS=$(($NNODES * $PPN))
APP_DIR=$(pwd)
ACCLAIM_PATH=path/to/acclaim
cd $ACCLAIM_PATH
mkdir -p tuning_jsons
save_file_path=${ACCLAIM_PATH}/tuning_jsons/${PBS_JOBID}.json
make gen_config_single N=$NNODES PPN=$PPN MSG_SIZE=1048576 COLLECTIVE="allreduce" SAVE_FILE="${save_file_path}"
cd $APP_DIR
mpiexec \
-n $PROCS \
-ppn $PPN \
-genv MPIR_CVAR_COLL_SELECTION_TUNING_JSON_FILE=${save_file_path} \
...
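Before launching the application with the generated file, a quick sanity check that the .json exists and parses can save a failed run. A hedged sketch (validate_tuning_json is an illustrative helper, not part of ACCLAiM; the demo file stands in for "$save_file_path"):

```shell
# Verify the generated tuning file exists and parses as JSON before
# handing it to MPICH via MPIR_CVAR_COLL_SELECTION_TUNING_JSON_FILE.
validate_tuning_json() {
  local f="$1"
  [ -s "$f" ] || { echo "empty or missing: $f" >&2; return 1; }
  python3 -m json.tool "$f" > /dev/null || { echo "not valid JSON: $f" >&2; return 1; }
  echo "ok: $f"
}

# Demo with a stand-in file (in practice: validate_tuning_json "$save_file_path"):
demo=$(mktemp)
echo '{"collective": "allreduce"}' > "$demo"
validate_tuning_json "$demo"
```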
Tuning MPI_Allreduce and MPI_Bcast on ANL Aurora:
NNODES=`wc -l < $PBS_NODEFILE`
PPN=12
PROCS=$(($NNODES * $PPN))
APP_DIR=$(pwd)
ACCLAIM_PATH=path/to/acclaim
cd $ACCLAIM_PATH
mkdir -p tuning_jsons
save_file_path=${ACCLAIM_PATH}/tuning_jsons/${PBS_JOBID}.json
make gen_config_multiple N=$NNODES PPN=$PPN MSG_SIZE=1048576 COLLECTIVE_LIST="allreduce,bcast" SAVE_FILE="${save_file_path}"
cd $APP_DIR
mpiexec \
-n $PROCS \
-ppn $PPN \
-genv MPIR_CVAR_COLL_SELECTION_TUNING_JSON_FILE=${save_file_path} \
...
Tuning all collectives on ANL Aurora:
NNODES=`wc -l < $PBS_NODEFILE`
PPN=12
PROCS=$(($NNODES * $PPN))
APP_DIR=$(pwd)
ACCLAIM_PATH=path/to/acclaim
cd $ACCLAIM_PATH
mkdir -p tuning_jsons
save_file_path=${ACCLAIM_PATH}/tuning_jsons/${PBS_JOBID}.json
make gen_config_all N=$NNODES PPN=$PPN MSG_SIZE=1048576 SAVE_FILE="${save_file_path}"
cd $APP_DIR
mpiexec \
-n $PROCS \
-ppn $PPN \
-genv MPIR_CVAR_COLL_SELECTION_TUNING_JSON_FILE=${save_file_path} \
...
Aurora uses the CH4 device in MPICH for its transport layer, which includes an additional layer of collective algorithms and tuning opportunities!
For some situations, e.g., bandwidth-bound message sizes for MPIR_Allreduce, we have observed more significant performance improvements from CH4 tuning than MPIR tuning.
Tuning CH4 MPI_Allreduce on ANL Aurora:
NNODES=`wc -l < $PBS_NODEFILE`
PPN=12
PROCS=$(($NNODES * $PPN))
APP_DIR=$(pwd)
ACCLAIM_PATH=path/to/acclaim
cd $ACCLAIM_PATH
mkdir -p tuning_jsons
save_file_path=${ACCLAIM_PATH}/tuning_jsons/${PBS_JOBID}_ch4.json
make gen_config_single_ch4 N=$NNODES PPN=$PPN MSG_SIZE=1048576 COLLECTIVE="allreduce" SAVE_FILE="${save_file_path}"
cd $APP_DIR
mpiexec \
-n $PROCS \
-ppn $PPN \
-genv MPIR_CVAR_CH4_COLL_SELECTION_TUNING_JSON_FILE=${save_file_path} \
...
The only modifications from the standard tuning protocol are to use gen_config_single_ch4 instead of gen_config_single in the make command, and MPIR_CVAR_CH4_COLL_SELECTION_TUNING_JSON_FILE instead of MPIR_CVAR_COLL_SELECTION_TUNING_JSON_FILE to pass the .json to MPICH.
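Because MPIR-level and CH4-level tuning use different CVARs, a job script can pass whichever tuning files were generated. A sketch under the assumption that MPICH accepts both CVARs in a single launch (build_genv_args is an illustrative helper, not part of ACCLAiM):

```shell
# Build -genv arguments for mpiexec from whichever tuning files exist.
build_genv_args() {
  local mpir_json="$1" ch4_json="$2" args=""
  [ -f "$mpir_json" ] && args="$args -genv MPIR_CVAR_COLL_SELECTION_TUNING_JSON_FILE=$mpir_json"
  [ -f "$ch4_json" ]  && args="$args -genv MPIR_CVAR_CH4_COLL_SELECTION_TUNING_JSON_FILE=$ch4_json"
  echo "$args"
}

# In a job script (paths as in the examples above):
# mpiexec -n $PROCS -ppn $PPN $(build_genv_args "$save_file_path" "$ch4_save_file_path") ...
```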
CPU and GPU-based tuning on Aurora are both supported out-of-the-box!
The only change is using aurora_cpu or aurora_xpu for the system name when invoking setup.py.
See the examples in the previous section.
Happy tuning!