How to compile
The repository is structured into two folders, `src` and `tests`. `src` contains all source code, whereas `tests` contains all test cases, organised into subfolders according to the type of solver they exercise.
The serial CPU version of ARTSS can be compiled on Linux or macOS systems with very few tools, whereas the multicore and GPU versions need an OpenACC-capable compiler. Detailed requirements are listed in the table below (general requirements apply to the serial version; the specific requirements apply additionally to the multicore and GPU versions).
| Category | Purpose | Tool | Version |
|---|---|---|---|
| General | Version control system (optional) | git | >= 2.0 |
| | Build process using a compiler-independent method | CMake | >= 2.8 |
| | Compiler fully supporting C++17 (gcc or clang) | gcc | >= 7.0 |
| | Visualisation of output | VTK | >= 5.8 |
| | Testing output for consistency during development | Python | >= 3.6 |
| Specific | Compiler fully supporting C++17 and OpenACC | PGI | >= 19.10 |
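Before building, it can help to confirm that the installed tools meet the minimum versions in the table. The following is a minimal sketch, not part of ARTSS; the helper names `version_ge`, `extract_version` and `check_tool` are hypothetical, and it assumes a `sort` that supports `-V` and tools that print their version as the first dotted number of `--version`.

```bash
# Hypothetical helpers (not part of ARTSS): compare installed tool versions
# against the minimums in the requirements table.
# Assumptions: `sort` supports -V (version sort), and each tool prints its
# version as the first dotted number in its `--version` output.

# version_ge A B: succeed if version A >= version B.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# extract_version TEXT: print the first dotted version number found in TEXT.
extract_version() {
  printf '%s\n' "$1" | grep -Eo '[0-9]+(\.[0-9]+)+' | head -n 1
}

# check_tool COMMAND MINIMUM: report whether COMMAND exists and is new enough.
check_tool() {
  local name=$1 minimum=$2 version
  if ! command -v "$name" >/dev/null 2>&1; then
    echo "$name: not found (need >= $minimum)"
    return 1
  fi
  version=$(extract_version "$("$name" --version 2>&1)")
  if version_ge "$version" "$minimum"; then
    echo "$name $version: ok (need >= $minimum)"
  else
    echo "$name $version: too old (need >= $minimum)"
    return 1
  fi
}
```

For example, `check_tool cmake 2.8`, `check_tool gcc 7.0` or `check_tool python3 3.6` (assuming the Python interpreter is installed as `python3`). Version sort is used instead of plain string comparison so that, e.g., 3.10 is correctly treated as newer than 3.6.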
Once the code has been checked out and all required software has been installed, ARTSS can be built from the terminal by first running cmake to configure the build, then running make. The steps are summarised below.
```bash
# 1. Clone the repository
git clone https://github.com/FireDynamics/ARTSS.git
cd ARTSS
# Initialise the submodules (e.g. spdlog). Also run this in an existing local copy of ARTSS that is missing spdlog.
git submodule update --init --recursive
# 2. Create and enter a folder for compiling the code
mkdir build
cd build
# 3. Prepare the environment (for use of the CUDA tools)
export CUDA_LIB="$CUDA_ROOT/lib64"
export CUDA_INC="$CUDA_ROOT/include"
# 4. Use CMake to configure the build
# By default, ARTSS builds in release mode with optimisations and without warnings.
cmake ..
# 5. Build ARTSS (parallelise with the option -j <#cores>)
make
```
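Step 5 leaves the number of parallel jobs to the user. As a minimal sketch (the function name `num_cores` is hypothetical), the core count can be determined on both supported platforms, assuming `nproc` is available on Linux and `sysctl -n hw.ncpu` on macOS, with a fallback of 1:

```bash
# Hypothetical helper (not part of ARTSS): print the number of CPU cores,
# for use with `make -j`. Tries `nproc` (Linux, GNU coreutils) first, then
# `sysctl -n hw.ncpu` (macOS), and falls back to 1 if neither works.
num_cores() {
  local n
  n=$(nproc 2>/dev/null) || n=$(sysctl -n hw.ncpu 2>/dev/null) || n=1
  # Guard against empty or non-numeric output.
  case $n in
    ''|*[!0-9]*|0) n=1 ;;
  esac
  echo "$n"
}
```

Inside the `build` folder, step 5 then becomes `make -j "$(num_cores)"`.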
By default, ARTSS is built in release mode, which should be used for installation,
benchmarking and production runs. To compile in debug mode (with the flags `-g -O0`
and warnings enabled), set the CMake parameter `CMAKE_BUILD_TYPE`. Furthermore, CMake
uses the compilers set by the environment variables `CC` and `CXX`, and otherwise the
system's default compilers (check these with `cc --version` and `c++ --version`). To
select different compilers, use the CMake parameters `CMAKE_C_COMPILER` and
`CMAKE_CXX_COMPILER`. These options are summarised below.
```bash
# 4. Use CMake parameters to configure the build (choose one value from each {...} list)
cmake \
  -DCMAKE_BUILD_TYPE={Release,Debug} \
  -DCMAKE_C_COMPILER={gcc,clang,pgcc} \
  -DCMAKE_CXX_COMPILER={g++,clang++,pgc++} \
  -DGPU_MODEL={K40,K80,P100} \
  -DCUDA_VERSION={8,...} \
  ..
```
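Since the C and C++ compiler options must belong to the same family, a small wrapper can keep them consistent. This is a hypothetical sketch (`artss_cmake_args` is not part of ARTSS) that only prints the options; it pairs each C compiler with its C++ counterpart exactly as listed in the option summary above.

```bash
# Hypothetical helper (not part of ARTSS): print CMake options for a build
# type and C compiler, pairing gcc/g++, clang/clang++ and pgcc/pgc++.
artss_cmake_args() {
  local build_type=$1 c_compiler=$2 cxx_compiler
  case $build_type in
    Release|Debug) ;;
    *) echo "unknown build type: $build_type" >&2; return 1 ;;
  esac
  case $c_compiler in
    gcc)   cxx_compiler=g++ ;;
    clang) cxx_compiler=clang++ ;;
    pgcc)  cxx_compiler=pgc++ ;;
    *) echo "unknown C compiler: $c_compiler" >&2; return 1 ;;
  esac
  echo "-DCMAKE_BUILD_TYPE=$build_type -DCMAKE_C_COMPILER=$c_compiler -DCMAKE_CXX_COMPILER=$cxx_compiler"
}
```

A debug build with clang could then be configured from the `build` folder with `cmake $(artss_cmake_args Debug clang) ..`; the expansion is deliberately unquoted so that the three options are passed as separate arguments.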
The GPU target depends on the GPU's compute capability and has to be passed as a special
flag via `-DGPU_MODEL={K40,K80,P100}` (default: P100). For NVIDIA's P100, for instance, this results in the target flag `-ta=tesla:cc60`. The CUDA version can be set likewise, e.g. with `-DCUDA_VERSION=10.1` (default: 8.0).
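To illustrate how a GPU model translates into a target flag, here is a hypothetical sketch (`gpu_target_flag` is not an ARTSS function). Only P100 to `cc60` is stated on this page; K40 to `cc35` and K80 to `cc37` are the NVIDIA compute capabilities of those GPUs (3.5 and 3.7), and whether ARTSS's CMakeLists.txt uses exactly this mapping is an assumption.

```bash
# Hypothetical helper (not part of ARTSS): map a GPU model to a PGI
# OpenACC target flag. Assumed mapping: K40 -> cc35, K80 -> cc37
# (NVIDIA compute capabilities); P100 -> cc60 as stated in the text.
gpu_target_flag() {
  local model=${1:-P100}   # P100 is the default GPU model
  local cc
  case $model in
    K40)  cc=cc35 ;;
    K80)  cc=cc37 ;;
    P100) cc=cc60 ;;
    *) echo "unsupported GPU model: $model" >&2; return 1 ;;
  esac
  echo "-ta=tesla:$cc"
}
```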
Since ARTSS is performance-portable across various architectures, the build provides several targets (selected via `make <target>`), each producing an executable with a different purpose, as described in the table below.
| Purpose and properties | Architecture | Executable/Target |
|---|---|---|
| Production (with terminal/data output, visualisation and analysis) | CPU (serial) | `artss_serial` |
| | CPU (multicore) | `artss_multicore_cpu` |
| | GPU | `artss_gpu` |
| Benchmarking (without output, visualisation or analysis) | CPU (serial) | `artss_serial_benchmarking` |
| | CPU (multicore) | `artss_multicore_cpu_benchmarking` |
| | GPU | `artss_gpu_benchmarking` |
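The target names in the table follow the pattern `artss_<architecture>` with an optional `_benchmarking` suffix. A hypothetical sketch (`artss_target` is not part of ARTSS) that derives the name from architecture and purpose:

```bash
# Hypothetical helper (not part of ARTSS): derive a make target name from
# the targets table, i.e. artss_<architecture>[_benchmarking].
artss_target() {
  local arch=$1 purpose=${2:-production} base
  case $arch in
    serial)    base=artss_serial ;;
    multicore) base=artss_multicore_cpu ;;
    gpu)       base=artss_gpu ;;
    *) echo "unknown architecture: $arch" >&2; return 1 ;;
  esac
  case $purpose in
    production)   echo "$base" ;;
    benchmarking) echo "${base}_benchmarking" ;;
    *) echo "unknown purpose: $purpose" >&2; return 1 ;;
  esac
}
```

For example, `make "$(artss_target gpu benchmarking)"` in the `build` folder builds only `artss_gpu_benchmarking` instead of all targets.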
Alternatively, ARTSS can be compiled with the `compile.sh` script in the repository's root folder. With it, only the repository needs to be cloned; all other steps (including creating the
build folder, loading modules for a specified workstation and setting the compute capability or CUDA version) are performed automatically. See README.md for details, or run `./compile.sh --help` for more options.
When compiling GPU targets, the flags `-Minfo=accel` and `-ta=<target>,lineinfo`
set in CMakeLists.txt make the compiler report all acceleration information,
such as data regions and generated kernels with their loop schedules, together with
the corresponding lines of the source files, for example:
```
338, Generating present(d_out[:bsize],d_in[:bsize],
         d_iList[:bsize_i],d_b[:bsize])
     Accelerator kernel generated
     Generating Tesla code
341, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */
```
It is important to check this information whenever new parallelisations or
optimisations with OpenACC are introduced. With the PGI OpenACC compiler, messages such as
`Complex loop carried dependence of ... -> prevents parallelisation` or
`Loop carried backward dependence of ... -> prevents vectorisation`
indicate incorrect use of the `kernels` or `parallel loop` pragmas, whereas
`upper bound for dimension 0 of array ’...’ is unknown`
indicates missing pointer size information in a data pragma.
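When the compiler output is long, these messages are easy to miss. A minimal sketch (the function `scan_acc_log` is hypothetical, not part of ARTSS) that searches a saved build log for the message fragments quoted above, assuming the output was captured first, e.g. with `make artss_gpu 2>&1 | tee build.log`:

```bash
# Hypothetical helper (not part of ARTSS): scan a saved compiler log for the
# PGI messages discussed above. Prints matching lines plus a hint and
# returns 1 if any were found, 0 if the log is clean.
scan_acc_log() {
  local log=$1 status=0
  # Loop dependences that prevent parallelisation or vectorisation.
  if grep -E 'Complex loop carried dependence|Loop carried backward dependence' "$log"; then
    echo "hint: check the kernels/parallel loop pragmas of the lines above"
    status=1
  fi
  # Unknown array bounds: a data pragma lacks pointer size information.
  if grep -E 'upper bound for dimension [0-9]+ of array .* is unknown' "$log"; then
    echo "hint: add array sizes (e.g. d_out[:bsize]) to the data pragma"
    status=1
  fi
  return "$status"
}
```

Usage: `scan_acc_log build.log`. The non-zero return status makes the check usable in a script, e.g. to stop before running a simulation.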
Even after successful compilation, errors such as
`FATAL ERROR: variable in data clause is partially present on device`
can still occur when running a simulation. This indicates that a pointer used on the GPU is not
(fully) present on the device because it was not sent to the GPU via `enter data`. For more
detailed insight into data movements or accelerator kernel launches, profiling tools can be
used. Alternatively, for executables built with the PGI compiler, verbose output can be
requested at runtime by setting the environment variable `PGI_ACC_NOTIFY=3` before executing
the program, which prints both kernel launches and data transfers.
`PGI_ACC_NOTIFY=1` prints only kernel launches, and `PGI_ACC_NOTIFY=2` prints only data uploads and downloads.
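To avoid leaving `PGI_ACC_NOTIFY` set in the shell for later runs, it can be set for a single command only. A hypothetical sketch (`run_with_acc_notify` is not part of ARTSS or the PGI tools) that accepts only the three levels described above:

```bash
# Hypothetical helper (not part of ARTSS or PGI): run a command with
# PGI_ACC_NOTIFY set for that command only.
# Levels as described above: 1 = kernel launches, 2 = data uploads and
# downloads, 3 = both.
run_with_acc_notify() {
  local level=$1
  shift
  case $level in
    1|2|3) ;;
    *) echo "expected level 1, 2 or 3, got: $level" >&2; return 1 ;;
  esac
  # A prefix assignment exports the variable to this command only.
  PGI_ACC_NOTIFY=$level "$@"
}
```

For example, `run_with_acc_notify 2 ./artss_gpu ...`, with the executable's usual arguments in place of `...`, would print only the data transfers of that run.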