How to compile

Repository

This repository is structured into two folders, src and tests. The src folder holds all source code; the tests folder holds all test cases, grouped into subfolders by the type of solver they exercise.

Requirements

The serial CPU version of ARTSS can be compiled on Linux or macOS systems with very few tools, whereas the multicore and GPU versions need an OpenACC-capable compiler. Detailed requirements are listed in the table below (general requirements apply to the serial version, specific requirements to the multicore and GPU versions).

Scope	Purpose	Tool	Version
General	Version control system (optional)	git	>= 2.0
General	Build configuration using a compiler-independent method	CMake	>= 2.8
General	Compiler fully supporting C++17 (gcc or clang)	gcc	>= 7.0
General	Visualisation of output	vtk	>= 5.8
General	Testing for consistency of output while developing	Python	>= 3.6
Specific	Compiler fully supporting C++17 and OpenACC	PGI	>= 19.10
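The minimum versions in the table can be checked from the terminal. The following is a minimal sketch, not part of ARTSS: the helper names version_ge and check_tool are hypothetical, and the comparison assumes a sort that supports the -V (version sort) option, as GNU coreutils does.

```shell
# Return success if version $1 is greater than or equal to version $2.
# Assumes `sort -V` (version sort) is available.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Report whether a tool is on PATH. The installed version still has to be
# read from the tool itself and passed to version_ge.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
    fi
}

for tool in git cmake gcc python3; do
    check_tool "$tool"
done

# Example comparison against the CMake minimum from the table.
if version_ge "3.16.3" "2.8"; then
    echo "CMake 3.16.3 satisfies >= 2.8"
fi
```

The version string 3.16.3 is only an illustrative value; in practice it would come from the output of cmake --version.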

Compiling the Code

Once the code has been checked out and all required software has been installed, ARTSS can be built from the terminal by first running cmake to configure the build, then running make. The steps are summarised below.

# 1. Clone
git clone https://github.com/FireDynamics/ARTSS.git
cd ARTSS

# If you already have a local copy of ARTSS and are missing spdlog, initialise the submodules recursively.
git submodule update --init --recursive

# 2. Make and enter a folder for compiling the code
mkdir build
cd build

# 3. Prepare environment (for use of CUDA tools)
export CUDA_LIB=$CUDA_ROOT/lib64
export CUDA_INC=$CUDA_ROOT/include

# 4. Use CMake to configure the build
# By default ARTSS builds in release mode with optimisations and without warnings.
cmake ..

# 5. Build ARTSS (parallelised with option -j <#cores>)
make
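For the parallel build mentioned in step 5, the core count can be detected instead of typed by hand. This is a small sketch under the assumption that either nproc (Linux) or sysctl (macOS) is available; the variable name cores is illustrative, and the command is only printed here rather than executed.

```shell
# Detect the number of available cores: nproc on Linux, sysctl on macOS,
# falling back to 1 if neither command is available.
cores=$(nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 1)

# Print the build command instead of running it, so the sketch is safe
# to try outside a build folder.
echo "make -j $cores"
```

Inside the build folder, the printed command can then be run directly.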

CMake options

By default, ARTSS is built in release mode, which should be used for installation, benchmarking and production runs. To compile in debug mode with the flags -g -O0 and with warnings enabled, set the CMake parameter CMAKE_BUILD_TYPE. CMake uses the compilers given by the environment variables CC and CXX; which compilers these resolve to can be checked with cc --version or c++ --version. To select different compilers, use the CMake parameters CMAKE_C_COMPILER and CMAKE_CXX_COMPILER. These options are summarised below.

# In step 4, use CMake parameters to configure the build (choose one value per brace list)
cmake \
         -DCMAKE_BUILD_TYPE={Release,Debug} \
         -DCMAKE_C_COMPILER={gcc,clang,pgcc} \
         -DCMAKE_CXX_COMPILER={g++,clang++,pgc++} \
         -DGPU_MODEL={K40,K80,P100} \
         -DCUDA_VERSION={8,...} \
..
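Each brace list above stands for a single value to be chosen. As a sketch of what a concrete call looks like, the hypothetical helper artss_cmake_cmd below (not part of ARTSS) assembles the command line from a build type and a compiler pair and prints it rather than running it.

```shell
# Print a concrete cmake call for the given build type and compilers.
# Defaults (Release, gcc, g++) are illustrative choices for this sketch.
artss_cmake_cmd() {
    build_type="${1:-Release}"
    c_compiler="${2:-gcc}"
    cxx_compiler="${3:-g++}"
    echo "cmake -DCMAKE_BUILD_TYPE=$build_type -DCMAKE_C_COMPILER=$c_compiler -DCMAKE_CXX_COMPILER=$cxx_compiler .."
}

# A debug build with clang:
artss_cmake_cmd Debug clang clang++
```

The printed line can be copied into the terminal from inside the build folder.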

The GPU target must be set according to the GPU's compute capability. This is done with -DGPU_MODEL={K40,K80,P100}; for NVIDIA's P100 GPU, for instance, this results in the target flag -ta=tesla:cc60. P100 is the default. The CUDA version can be set in the same way, e.g., with -DCUDA_VERSION=10.1; the default is 8.0.
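The relation between GPU model and target flag can be illustrated with a small lookup. This is a sketch only and does not reproduce the logic in CMakeLists.txt: the function name gpu_target_flag is hypothetical, the P100 entry is taken from the text above, and the K40 and K80 entries assume NVIDIA's published compute capabilities 3.5 and 3.7.

```shell
# Map a GPU model name to a PGI target flag.
# Only the P100 mapping is stated in this page; K40 (cc35) and K80 (cc37)
# are assumptions based on NVIDIA's published compute capabilities.
gpu_target_flag() {
    case "$1" in
        K40)  echo "-ta=tesla:cc35" ;;
        K80)  echo "-ta=tesla:cc37" ;;
        P100) echo "-ta=tesla:cc60" ;;
        *)    echo "unknown GPU model: $1" >&2; return 1 ;;
    esac
}

# Fall back to P100, the documented default, if GPU_MODEL is unset.
gpu_target_flag "${GPU_MODEL:-P100}"
```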

Executables

Since ARTSS is performance portable and applicable to various architectures, several targets exist when building ARTSS. A target is selected by passing its name to make, and each resulting executable has a different purpose, as described in the table below.

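Purpose and properties	Architecture	Executable/Target
Production (with terminal/data output, visualisation and analysis)	CPU - serial	`artss_serial`
Production (with terminal/data output, visualisation and analysis)	CPU - multicore	`artss_multicore_cpu`
Production (with terminal/data output, visualisation and analysis)	GPU	`artss_gpu`
Benchmarking (without output, visualisation or analysis)	CPU - serial	`artss_serial_benchmarking`
Benchmarking (without output, visualisation or analysis)	CPU - multicore	`artss_multicore_cpu_benchmarking`
Benchmarking (without output, visualisation or analysis)	GPU	`artss_gpu_benchmarking`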
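The target names in the table follow a regular pattern, which the sketch below makes explicit. The helper artss_target and its argument names (serial, multicore, gpu, production, benchmarking) are hypothetical and exist only for illustration; the target names it prints are the ones listed in the table.

```shell
# Print the make target for an architecture and a mode.
# Architecture: serial | multicore | gpu
# Mode:         production (default) | benchmarking
artss_target() {
    arch="$1"
    mode="${2:-production}"
    case "$arch" in
        serial)    name="artss_serial" ;;
        multicore) name="artss_multicore_cpu" ;;
        gpu)       name="artss_gpu" ;;
        *) echo "unknown architecture: $arch" >&2; return 1 ;;
    esac
    case "$mode" in
        production)   ;;
        benchmarking) name="${name}_benchmarking" ;;
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
    echo "$name"
}

# Print the build command for the GPU benchmarking executable.
echo "make $(artss_target gpu benchmarking)"
```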

Using a Script to Compile

ARTSS can also be compiled with the compile.sh script located in the repository's top-level folder. With this script, only the repository needs to be cloned; all other steps (creating the build folder, loading modules for a specified workstation, and setting the compute capability or CUDA version) are executed automatically. See README.md for details. For more options, type ./compile.sh --help.

Checking OpenACC compiler output

During the compilation of GPU targets, the flags -Minfo=accel and -ta=<target>,lineinfo (set in CMakeLists.txt) make the compiler print all acceleration information, such as data regions or kernel generation with loop schedules, together with the corresponding lines of the source files, as shown below.

    338, Generating present(d_out[:bsize],d_in[:bsize],
         d_iList[:bsize_i],d_b[:bsize])
         Accelerator kernel generated
         Generating Tesla code
        341, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */

It is important to check this information whenever new parallelisations or optimisations are introduced with OpenACC. With the PGI OpenACC compiler, messages such as Complex loop carried dependence of ... -> prevents parallelisation or Loop carried backward dependence of ... -> prevents vectorisation indicate incorrect usage of the kernel or parallel loop pragmas, whereas upper bound for dimension 0 of array ’...’ is unknown indicates that pointer size information is missing in a data pragma.
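When the compiler output is long, the messages named above can be filtered out of a saved build log. The following is a rough sketch, assuming the log was captured with something like make 2>&1 | tee build.log; the function name scan_acc_log is hypothetical, and the pattern accepts both the -isation and -ization spellings because the exact wording may differ between compiler versions.

```shell
# Print (with line numbers) the lines of a build log that hint at OpenACC
# problems. Reads the file given as argument, or stdin if none is given.
scan_acc_log() {
    grep -n -E 'prevents (paralleli[sz]ation|vectori[sz]ation)|upper bound for dimension [0-9]+ of array .* is unknown' "$@"
}

# Demonstration on a few made-up log lines.
printf '%s\n' \
    'Generating Tesla code' \
    'Loop carried backward dependence of x prevents vectorization' \
    'Accelerator kernel generated' | scan_acc_log
```

With a real log, the call would be scan_acc_log build.log, where build.log is an assumed file name.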

Even after successful compilation, errors can still occur while running a simulation, such as FATAL ERROR: variable in data clause is partially present on device. This indicates that a pointer used by the GPU is not present on the device and was not sent to the GPU via enter data. To gain more detailed insight into data movements or accelerator kernel launches, profiling tools can be used, or additional verbose output can be requested from programs built with the PGI compiler by setting the environment variable PGI_ACC_NOTIFY=3 before executing the program. PGI_ACC_NOTIFY=1 prints only kernel launches, and PGI_ACC_NOTIFY=2 prints only upload and download lines.
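Setting the variable for a single run, rather than exporting it for the whole session, keeps the verbose output limited to the run being investigated. The wrapper below is a minimal sketch: the function name run_with_acc_notify is hypothetical, and it accepts only the levels 1, 2 and 3 because those are the ones described above.

```shell
# Run a command with PGI_ACC_NOTIFY set for that command only.
# Usage: run_with_acc_notify <level> <command> [args...]
run_with_acc_notify() {
    level="$1"
    shift
    case "$level" in
        1|2|3) ;;
        *) echo "unsupported PGI_ACC_NOTIFY level: $level" >&2; return 1 ;;
    esac
    env PGI_ACC_NOTIFY="$level" "$@"
}

# Demonstration with a stand-in command that just prints the variable;
# in practice the command would be one of the ARTSS GPU executables.
run_with_acc_notify 3 sh -c 'echo "PGI_ACC_NOTIFY=$PGI_ACC_NOTIFY"'
```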
