
How to compile

My Linh Würzburger edited this page Dec 17, 2020 · 2 revisions

Repository

This repository is organised into two folders, src and tests. All source code lives in src; tests holds the test cases, grouped into subfolders by the type of solver they exercise.

Requirements

The serial CPU version of ARTSS can be compiled on Linux or macOS with very few tools, whereas the multicore and GPU versions need an OpenACC-capable compiler. Detailed requirements are listed in the table below (general requirements for the serial version, specific ones for the multicore and GPU versions).

| | Purpose | Tool | Version |
|---|---|---|---|
| General | Version control system (optional) | git | >= 2.0 |
| | Build processor using a compiler-independent method | CMake | >= 2.8 |
| | Compiler fully supporting C++17 (gcc or clang) | gcc | >= 7.0 |
| | Visualisation of output | vtk | >= 5.8 |
| | Testing for consistency of output while developing | Python | >= 3.6 |
| Specific | Compiler fully supporting C++17 and OpenACC | PGI | >= 19.10 |
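
The minimum versions in the table can be checked before configuring the build. The following is a minimal sketch, not part of ARTSS: the helper names version_ge, detect_version and report_tool are hypothetical, and the comparison assumes a sort that supports version sorting (-V), as GNU coreutils and recent macOS releases do. vtk and PGI are left out because their version queries depend on the installation.

```shell
#!/bin/sh
# Succeed if version $1 is greater than or equal to version $2.
# Assumes `sort -V` (version sort) is available.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Extract the first dotted version number from a command's output.
# Prints nothing if the command is missing.
detect_version() {
    "$@" 2>/dev/null | grep -Eo '[0-9]+(\.[0-9]+)+' | head -n 1
}

# $1 = tool label, $2 = minimum version, $3 = detected version (may be empty).
report_tool() {
    if [ -z "$3" ]; then
        echo "$1: MISSING (need >= $2)"
    elif version_ge "$3" "$2"; then
        echo "$1: OK ($3 >= $2)"
    else
        echo "$1: TOO OLD ($3 < $2)"
    fi
}

# Minimum versions taken from the requirements table.
report_tool git    2.0 "$(detect_version git --version)"
report_tool cmake  2.8 "$(detect_version cmake --version)"
report_tool gcc    7.0 "$(detect_version gcc --version)"
report_tool python 3.6 "$(detect_version python3 --version)"
```

Note that on macOS the gcc command is often an alias for Apple clang, so the reported number is then a clang version rather than a gcc version.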

Compiling the Code

Once the code has been checked out and all required software has been installed, ARTSS can be built from the terminal by first running cmake to configure the build, then running make. The steps are summarised below.

# 1. Clone
git clone https://github.com/FireDynamics/ARTSS.git
cd ARTSS

# If you already have a local copy of ARTSS and are missing spdlog, initialise the submodules recursively.
git submodule update --init --recursive

# 2. Make and enter a folder for compiling the code
mkdir build
cd build

# 3. Prepare environment (for use of CUDA tools)
export CUDA_LIB=$CUDA_ROOT/lib64
export CUDA_INC=$CUDA_ROOT/include

# 4. Use CMake to configure the build
# By default ARTSS builds in release mode with optimisations and without warnings.
cmake ..

# 5. Build ARTSS (parallelise with the option -j <#cores>)
make
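
For the parallel build mentioned in step 5, the core count can be detected rather than typed by hand. This is a small sketch under the assumption that either nproc (GNU coreutils) or getconf is available; core_count is a hypothetical helper name, not something shipped with ARTSS.

```shell
#!/bin/sh
# Print the number of online cores.
# nproc exists on most Linux systems; getconf _NPROCESSORS_ONLN also works on macOS.
core_count() {
    if command -v nproc >/dev/null 2>&1; then
        nproc
    else
        getconf _NPROCESSORS_ONLN
    fi
}

# Show the command to run from inside the build folder after `cmake ..`:
echo "make -j $(core_count)"
```

Inside the build folder, the printed command can be run directly, or written as make -j "$(core_count)".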

CMake options

By default, ARTSS is built in release mode, which should be used for installing, benchmarking and production runs. To compile in debug mode with the flags -g -O0 and with warnings enabled, set the CMake parameter CMAKE_BUILD_TYPE. CMake uses the compilers named by the environment variables CC and CXX; check which ones are active with cc --version or c++ --version. To select different compilers, use the CMake parameters CMAKE_C_COMPILER and CMAKE_CXX_COMPILER. These options are summarised below.

# In step 4: use CMake parameters to configure the build
cmake \
         -DCMAKE_BUILD_TYPE={Release,Debug} \
         -DCMAKE_C_COMPILER={gcc,clang,pgcc} \
         -DCMAKE_CXX_COMPILER={g++,clang++,pgc++} \
         -DGPU_MODEL={K40,K80,P100} \
         -DCUDA_VERSION={8,...} \
..

The GPU target depends on the GPU's compute capability and is selected with -DGPU_MODEL={K40,K80,P100}. For NVIDIA's P100, for instance, this results in the target flag -ta=tesla:cc60. P100 is the default. The CUDA version can be set in the same way, e.g. with -DCUDA_VERSION=10.1; the default is 8.0.
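
The relation between GPU model and target flag can be sketched as a lookup. Only the P100 mapping (cc60) is stated in the text; the K40 and K80 entries use NVIDIA's published compute capabilities (3.5 and 3.7) and are an assumption about what the ARTSS CMakeLists.txt generates. gpu_target_flag is a hypothetical helper, not part of the build system.

```shell
#!/bin/sh
# Map a GPU model to the PGI target flag for its compute capability.
# P100 -> cc60 is taken from the text; K40 -> cc35 and K80 -> cc37 are
# assumptions based on NVIDIA's compute capability table.
# With no argument, P100 is used, matching the documented default.
gpu_target_flag() {
    case "${1:-P100}" in
        K40)  echo "-ta=tesla:cc35" ;;
        K80)  echo "-ta=tesla:cc37" ;;
        P100) echo "-ta=tesla:cc60" ;;
        *)    echo "unknown GPU model: $1" >&2; return 1 ;;
    esac
}

gpu_target_flag P100   # prints -ta=tesla:cc60

# Example configuration for a debug build with gcc (run inside the build folder):
#   cmake -DCMAKE_BUILD_TYPE=Debug -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ ..
# Example configuration for a P100 GPU build with PGI and CUDA 10.1:
#   cmake -DCMAKE_C_COMPILER=pgcc -DCMAKE_CXX_COMPILER=pgc++ -DGPU_MODEL=P100 -DCUDA_VERSION=10.1 ..
```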

Executables

Since ARTSS is performance-portable and runs on various architectures, the build provides several targets (selected with make <target>). Each executable serves a different purpose, as described in the table below.

| Purpose and properties | Architecture | Executable/Target |
|---|---|---|
| Production (with terminal/data output, visualisation and analysis) | CPU, serial | artss_serial |
| | CPU, multicore | artss_multicore_cpu |
| | GPU | artss_gpu |
| Benchmarking (without output, visualisation or analysis) | CPU, serial | artss_serial_benchmarking |
| | CPU, multicore | artss_multicore_cpu_benchmarking |
| | GPU | artss_gpu_benchmarking |
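
The target names in the table follow a regular pattern, so a single target can be composed and built on its own. The helper below is a hypothetical sketch (artss_target is not part of ARTSS); the target names themselves are the ones listed above.

```shell
#!/bin/sh
# Compose an ARTSS make target name.
# $1 = serial | multicore_cpu | gpu
# $2 = production (default) | benchmarking
artss_target() {
    case "$1" in
        serial|multicore_cpu|gpu) ;;
        *) echo "unknown architecture: $1" >&2; return 1 ;;
    esac
    case "${2:-production}" in
        production)   echo "artss_${1}" ;;
        benchmarking) echo "artss_${1}_benchmarking" ;;
        *) echo "unknown purpose: $2" >&2; return 1 ;;
    esac
}

# Show the build command for the GPU benchmarking executable.
echo "make $(artss_target gpu benchmarking)"   # prints: make artss_gpu_benchmarking
```

Building only the needed target, for example make artss_serial, avoids compiling the OpenACC variants on machines without a PGI compiler.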

Using a Script to Compile

The repository's root folder also contains a compile.sh script for compiling ARTSS. With it, only the repository needs to be cloned; all other steps (creating the build folder, loading modules for a specified workstation, and setting the compute capability or CUDA version) are executed automatically. See README.md for details. For more options, run ./compile.sh --help.

Checking OpenACC compiler output

During compilation of the GPU targets, the flags -Minfo=accel and -ta=<target>,lineinfo (set in CMakeLists.txt) make the compiler report all acceleration information, such as data regions and generated kernels with their loop schedules, together with the corresponding source line numbers, as in the excerpt below.

    338, Generating present(d_out[:bsize],d_in[:bsize],d_iList[:bsize_i],d_b[:bsize])
         Accelerator kernel generated
         Generating Tesla code
        341, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */

This information is important to check whenever new parallelisations or optimisations with OpenACC are introduced. With the PGI OpenACC compiler, messages such as "Complex loop carried dependence of ... -> prevents parallelisation" or "Loop carried backward dependence of ... -> prevents vectorisation" indicate incorrect usage of the kernels or parallel loop pragmas, whereas "upper bound for dimension 0 of array '...' is unknown" points to missing pointer size information in a data pragma.
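
When the build output is long, the diagnostics named above are easy to miss. As a rough sketch, the output can be saved to a log and filtered for them. scan_acc_log and the file name build.log are hypothetical; the pattern accepts both the -isation and -ization spellings because the exact wording may differ between compiler versions.

```shell
#!/bin/sh
# Print (with line numbers) the lines of a build log that contain the
# OpenACC diagnostics discussed above.
# grep exits with status 1 when nothing matches, which here means
# "no suspicious messages found".
scan_acc_log() {
    grep -nE 'prevents (parallel|vector)i[sz]ation|upper bound for dimension [0-9]+ of array .* is unknown' "$1"
}

# Typical use from inside the build folder:
#   make artss_gpu 2>&1 | tee build.log
#   scan_acc_log build.log
```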

Even after successful compilation, errors can still occur while running a simulation, such as "FATAL ERROR: variable in data clause is partially present on device". This indicates that a pointer used by the GPU is not present on the device because it was not sent to the GPU via enter data. To gain more detailed insight into data movements or accelerator kernel launches, use profiling tools or request additional verbose output from the PGI runtime by setting the environment variable PGI_ACC_NOTIFY=3 before executing the program. PGI_ACC_NOTIFY=1 prints only kernel launches, and PGI_ACC_NOTIFY=2 prints only upload and download lines.
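
Setting the variable for a single run, rather than exporting it for the whole session, keeps the verbose output limited to the run under investigation. The wrapper below is a minimal sketch: run_with_notify is a hypothetical helper, and it accepts only the three levels described above.

```shell
#!/bin/sh
# Run a command with PGI_ACC_NOTIFY set for that command only.
# $1 = level: 1 (kernel launches), 2 (uploads/downloads), 3 (both)
# remaining arguments = the command to run
run_with_notify() {
    level="$1"
    shift
    case "$level" in
        1|2|3) ;;
        *) echo "level must be 1, 2 or 3" >&2; return 1 ;;
    esac
    PGI_ACC_NOTIFY="$level" "$@"
}

# Demonstration with a stand-in command that just echoes the variable.
run_with_notify 3 sh -c 'echo "PGI_ACC_NOTIFY=$PGI_ACC_NOTIFY"'   # prints PGI_ACC_NOTIFY=3
```

For a real run, the stand-in command would be replaced by the GPU executable and its usual arguments, for example run_with_notify 3 ./artss_gpu followed by the input the executable expects.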

