CaPI is a selective code instrumentation tool, designed for streamlining the performance analysis workflow of large-scale parallel applications.
CaPI selects functions for instrumentation according to the needs of the analyst, based on a static call graph of the target application.
This creates instrumentation configurations (ICs) that capture relevant parts of the code, while keeping the runtime overhead low.

It consists of two major components:
- A selection tool that creates ICs tailored to the target code and measurement objective.
- A runtime library, enabling runtime-adaptable binary instrumentation based on LLVM XRay.
CaPI currently supports the following measurement APIs:
- GNU interface: compatible with GCC's `-finstrument-functions`
- TALP (part of the DLB library): Parallel performance metrics of MPI regions
- Score-P: Instrumentation-based profiling and tracing
- Extrae: MPI-based tracing
This project is currently in a pre-release state. Expect frequent changes to the code and build configuration.
- CMake >=3.15
- LLVM >=10 (>=20 for XRay shared library instrumentation)
- MetaCG
- Score-P 7.x (optional)
- DLB 3.5 (optional, other versions may work)
- Extrae 3.8.3 (optional, other versions may work)
- LLVM-Lit (for testing only)
CaPI is built as follows (Ninja is not required and can be substituted with make).
mkdir build && cd build
cmake -G Ninja -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=$(which clang) -DCMAKE_CXX_COMPILER=$(which clang++) -DDLB_DIR=$(which dlb)/../.. -DSCOREP_DIR=$(which scorep)/../.. ..
ninja
CMake Options
- `ENABLE_TALP=ON/OFF`: Enable/Disable support for TALP. Default is `ON`.
- `DLB_DIR`: Path to DLB installation.
- `ENABLE_SCOREP=ON/OFF`: Enable/Disable support for Score-P. Default is `ON`.
- `SCOREP_DIR`: Path to Score-P installation.
- `ENABLE_EXTRAE=ON/OFF`: Enable/Disable support for Extrae. Default is `ON`.
- `ENABLE_XRAY=ON/OFF`: Enable/Disable support for runtime-adaptable instrumentation with LLVM XRay. Default is `ON`.
- `ENABLE_TESTING=ON/OFF`: Enable/Disable testing. Requires MetaCG and LLVM-Lit.
- Set `metacg_DIR` to MetaCG installation directory.
- Set
To verify your build, you may test out the instrumentation of proxy applications LULESH and AMG in the example folder (located in your current build directory).
For details, refer to CAPI_README in the lulesh folder.
CaPI relies on MetaCG for its whole-program call graph analysis.
In order to apply CaPI, you first need to install MetaCG and run it on your target application. See the MetaCG README for instructions.
You can then run CaPI to generate the instrumentation configuration (IC).
The IC is determined by passing the functions in the CG through a composable pipeline of individual selectors. Each of these selectors produces an output set that is in turn consumed by other selectors.
This is an overview of the current command line interface.
- `-h`: Print a list of options.
- `-i <query>`: Parse the selection query from the given string.
- `-f <file>`: Use a selection query file.
- `-o <file>`: The output IC file.
- `-v <verbosity>`: Set verbosity level (0-3, default is 2). Passing `-v` without argument sets it to 3.
- `--write-dot <file>`: Write a dotfile of the selected call-graph subset.
- `--replace-inlined <binary>`: Replaces inlined functions with parents. Requires passing the executable.
- `--output-format <output_format>`: Set the file format. Options are `scorep`, `json` (default) and `simple`.
- `--debug`: Enable debugging mode.
- `--print-scc-stats`: Prints information about the strongly connected components (SCCs) of this call graph.
- `--traverse-virtual-dtors`: Enable traversal of virtual destructors (may lead to over-approximation of destructor inheritance).
CaPI defines a DSL for the user-defined instrumentation selection.
The selection query is passed in as a string with the -i flag:
capi -i '<selection_query>' callgraph.ipcg
Alternatively, -f <file> instructs CaPI to load the query from the given file.
The query consists of a pipeline of selector instances, which can be named or anonymous. Selector types are pre-defined but can be customized via parameters. Valid parameter types are strings (enclosed in double quotes), booleans (true/false), integers and floating point numbers.
Selectors can be combined using the pipe operator |>.
Most of the available selector types take at least one pipeline definition as input.
These can be either in-place definitions or references to other named pipeline definitions, prefixed with %.
For example, the following selector pipeline, named mpi, uses the by_name selector to find all functions starting with MPI_.
mpi = %% |> by_name("MPI_.*")
The pipeline %% is pre-defined and refers to an instance of the EverythingSelector, which selects every function in the call graph.
If no input is explicitly given, %% is added implicitly.
The previous example can, thus, be simplified as follows:
mpi = by_name("MPI_.*")
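The effect of by_name on the implicit %% input can be pictured with a small Python sketch. It is not CaPI's implementation: the function names are made up, and it assumes the regex must match the whole function name, which is consistent with "MPI_.*" selecting names that start with MPI_.

```python
import re


def by_name(functions, pattern):
    """Return the subset of function names that fully match the regex."""
    regex = re.compile(pattern)
    return {name for name in functions if regex.fullmatch(name)}


# Hypothetical function names standing in for the call graph nodes (%%).
everything = {"main", "solve", "MPI_Send", "MPI_Recv", "exchange_halo"}

mpi = by_name(everything, "MPI_.*")
print(sorted(mpi))  # ['MPI_Recv', 'MPI_Send']
```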
To extend this example, we can look at functions that are on a call path to MPI communication:
mpi = by_name("MPI_.*")
mpi_callpath = %mpi |> on_call_path_to
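Conceptually, on_call_path_to is a reverse reachability search on the call graph. The following Python sketch illustrates the idea on a made-up call graph. Whether CaPI includes the target functions themselves in the result is an assumption here.

```python
from collections import deque


def on_call_path_to(callees, targets):
    """Return the targets plus every function that can reach one of them."""
    # Invert the caller -> callees map into callee -> callers.
    callers = {}
    for caller, called in callees.items():
        for callee in called:
            callers.setdefault(callee, set()).add(caller)

    # Breadth-first search along the reversed call edges.
    selected = set(targets)
    queue = deque(targets)
    while queue:
        current = queue.popleft()
        for caller in callers.get(current, ()):
            if caller not in selected:
                selected.add(caller)
                queue.append(caller)
    return selected


# Hypothetical call graph: caller -> set of callees.
call_graph = {
    "main": {"solve", "print_stats"},
    "solve": {"exchange_halo", "compute"},
    "exchange_halo": {"MPI_Send", "MPI_Recv"},
    "compute": set(),
    "print_stats": set(),
}

mpi_callpath = on_call_path_to(call_graph, {"MPI_Send", "MPI_Recv"})
print(sorted(mpi_callpath))
```

In this example, compute and print_stats are not selected because they never lead to an MPI call.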
Another way to reduce overhead is to exclude functions that are marked as inline.
To achieve this, we need the inline_specified selector and combine the results using the subtract selector.
Note that subtract takes two input pipelines, which are specified as a tuple enclosed in square brackets [A, B].
Adding this to the previous query, we get the following query:
mpi = by_name("MPI_.*")
mpi_callpath = %mpi |> on_call_path_to
final = [%mpi_callpath, inline_specified] |> subtract
To simplify the use of set operations like subtract, they can also be expressed as binary operators:
| Set Operation | Selector | Equivalent Operator |
|---|---|---|
| union | join | `\|` |
| intersection | intersect | `&` |
| difference | subtract | `-` |
Using the operator notation, the query can be rewritten as
mpi = by_name("MPI_.*")
mpi_callpath = %mpi |> on_call_path_to
final = %mpi_callpath - inline_specified
or in a single line:
final = (by_name("MPI_.*") |> on_call_path_to) - inline_specified
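The set selectors behave like ordinary set operations on function sets. As a rough illustration only, with made-up function names, they correspond to Python's built-in set operators as follows:

```python
def join(a, b):
    """Union: functions selected by either input."""
    return a | b


def intersect(a, b):
    """Intersection: functions selected by both inputs."""
    return a & b


def subtract(a, b):
    """Difference: functions in the first input but not in the second."""
    return a - b


# Hypothetical selector results.
mpi_callpath = {"main", "solve", "exchange_halo", "MPI_Send"}
inline_specified = {"exchange_halo", "min_value"}

final = subtract(mpi_callpath, inline_specified)
print(sorted(final))  # ['MPI_Send', 'main', 'solve']
```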
Directives start with ! and are used to control the parsing and selection process.
CaPI currently supports two types of directives: !import and !instrument.
The import directive is used for loading existing selection modules.
This allows building and reusing selection pipelines that are useful across multiple applications.
For example, the mpi_callpath selector from the previous example could be moved to a separate file mpi.capi:
!import("mpi.capi")
final = %mpi_callpath - inline_specified
The instrument directive gives explicit control over the created instrumentation configuration.
It allows the user to specify custom instrumentation levels and associated invocation ranges that are reflected in the created instrumentation configuration file.
# Instrument result of "A" with level "basic"
!instrument(%A, "basic")
# Instrument result of "B" to record invocations 1-10 and 100 in "detail" level, invocations 11-99 in "basic" level.
!instrument(%B, "detail:1-10,100", "basic:11-99")
Note that only the custom json output format supports these features.
For other formats, only information about the set of instrumented functions is recorded.
If no instrument directive is specified, the result of the last pipeline definition is used.
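To make the level and range notation concrete, the Python sketch below parses strings such as "detail:1-10,100" and looks up the level for a given invocation number. It is only an interpretation of the syntax shown above, not CaPI's parser. In particular, treating a level without ranges as "all invocations" and returning None for uncovered invocations are assumptions.

```python
def parse_level_spec(spec):
    """Split 'level:1-10,100' into (level, [(first, last), ...]).

    A spec without ':' yields an empty range list, which this sketch
    interprets as "all invocations" (assumption).
    """
    level, sep, ranges_text = spec.partition(":")
    ranges = []
    if sep:
        for part in ranges_text.split(","):
            first, dash, last = part.partition("-")
            ranges.append((int(first), int(last) if dash else int(first)))
    return level, ranges


def level_for_invocation(specs, invocation):
    """Return the first level whose ranges cover the invocation, else None."""
    for spec in specs:
        level, ranges = parse_level_spec(spec)
        if not ranges or any(lo <= invocation <= hi for lo, hi in ranges):
            return level
    return None


specs_for_b = ["detail:1-10,100", "basic:11-99"]
print(level_for_invocation(specs_for_b, 5))    # detail
print(level_for_invocation(specs_for_b, 50))   # basic
print(level_for_invocation(specs_for_b, 100))  # detail
```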
| Name | Parameters | Selector inputs | Example | Explanation |
|---|---|---|---|---|
| by_name | regex string | 1 | `by_name("foo.*")` | Selects functions with names starting with "foo". |
| by_path | regex string | 1 | `by_path("foo/.*")` | Selects functions contained in directory "foo". |
| inline_specified | - | 1 | `inline_specified` | Selects functions marked as inline. |
| on_call_path_to | - | 1 | `by_name("foo") \|> on_call_path_to` | Selects functions in the call chain to function "foo". |
| on_call_path_from | - | 1 | `by_name("foo") \|> on_call_path_from` | Selects functions in the call chain from function "foo". |
| in_system_header | - | 1 | `in_system_header` | Selects functions defined in system headers. |
| contains_unresolved_calls | - | 1 | `contains_unresolved_calls` | Selects functions containing calls to unknown target functions. |
| join | - | 2 | `[%A, %B] \|> join` or `%A \| %B` | Union of the two input sets. |
| intersect | - | 2 | `[%A, %B] \|> intersect` or `%A & %B` | Intersection of the two input sets. |
| subtract | - | 2 | `[%A, %B] \|> subtract` or `%A - %B` | Difference of the two input sets. |
| coarse | - | 1 or 2 | `[%A, %B] \|> coarse` | Filters out functions that have a single caller and callee, unless they are included in B. |
| min_call_depth | comp. operator, threshold | 1 | `%A \|> min_call_depth("<=", 3)` | Selects functions that are at most 3 calls away from a root node. |
| flops/memops | comp. operator, threshold | 1 | `%A \|> flops(">=", 10)` | Selects functions with at least 10 floating point operations. |
| loop_depth | comp. operator, threshold | 1 | `%A \|> loop_depth("=", 2)` | Selects functions containing loop nests of depth 2. |
| inclusive_statement_count | comp. operator, threshold | 1 | `%A \|> inclusive_statement_count(">", 100)` | Selects functions with an inclusive statement count (statements in reachable sub-graph) > 100. |
| common_caller, common_caller_distinct, common_caller_partial | heuristic parameter | 2 | `[by_name("foo"), by_name("bar")] \|> common_caller(1)` | Common caller selection with max. LCA-Dist 1 (details below). |
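As an illustration of how a structural selector such as coarse can be read, the Python sketch below drops functions with exactly one caller and one callee unless they appear in a second "keep" set. This is a simplified interpretation of the table entry: it counts callers and callees in the whole (made-up) call graph and does not reproduce CaPI's actual algorithm.

```python
def coarse(callees, selection, keep=frozenset()):
    """Drop pass-through functions (one caller, one callee) unless in keep."""
    # Invert the caller -> callees map into callee -> callers.
    callers = {}
    for caller, called in callees.items():
        for callee in called:
            callers.setdefault(callee, set()).add(caller)

    result = set()
    for func in selection:
        pass_through = (
            len(callers.get(func, ())) == 1 and len(callees.get(func, ())) == 1
        )
        if not pass_through or func in keep:
            result.add(func)
    return result


# Hypothetical call graph: "wrapper" only forwards from main to solve.
call_graph = {
    "main": {"wrapper"},
    "wrapper": {"solve"},
    "solve": {"compute", "exchange_halo"},
    "compute": set(),
    "exchange_halo": set(),
}
everything = set(call_graph)

print(sorted(coarse(call_graph, everything)))
print(sorted(coarse(call_graph, everything, keep={"wrapper"})))
```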
The common_caller selectors are specialized heuristics for augmenting MPI-based traces [3].
To instrument a region in the trace, the surrounding MPI calls X and Y are determined.
When the names of the direct callers of X and Y are passed to the common_caller query, CaPI selects the relevant call paths leading to these calls.
Details will be made available in an upcoming publication.
If CaPI is built with TALP support, the following selectors, based on TALP efficiency metrics attached to the call graph as function metadata, are available.
| Name | Parameters | Selector inputs | Example | Explanation |
|---|---|---|---|---|
| has_talp_metrics | - | 1 | `has_talp_metrics` | Selects the subset of functions that has TALP metrics attached. |
| talp_cycles | 1 | 1 | `talp_cycles(">", 1000)` | Selection based on number of elapsed cycles. |
| talp_instructions | 1 | 1 | `talp_instructions(">", 500000)` | Selection based on number of executed instructions. |
| talp_measurements | 1 | 1 | `talp_measurements(">", 5)` | Selection based on number of performance measurements. |
| talp_elapsed_time | 1 | 1 | `talp_elapsed_time("<", 2.0e6)` | Selection based on total elapsed time in nanoseconds. |
| talp_mpi_calls | 1 | 1 | `talp_mpi_calls(">=", 10)` | Selection based on number of MPI calls. |
| talp_parallel_efficiency | 1 | 1 | `talp_parallel_efficiency("<", 0.8)` | Selection based on overall parallel efficiency (ratio between 0 and 1). |
| talp_mpi_parallel_efficiency | 1 | 1 | `talp_mpi_parallel_efficiency("<", 0.9)` | Selection based on MPI parallel efficiency (ratio between 0 and 1). |
| talp_mpi_comm_efficiency | 1 | 1 | `talp_mpi_comm_efficiency("<", 0.85)` | Selection based on MPI communication efficiency. |
| talp_mpi_load_balance | 1 | 1 | `talp_mpi_load_balance("<", 0.95)` | Selection based on MPI load balance efficiency. |
| talp_mpi_load_balance_in | 1 | 1 | `talp_mpi_load_balance_in("<", 0.9)` | Selection based on MPI intra-node load balance. |
| talp_mpi_load_balance_out | 1 | 1 | `talp_mpi_load_balance_out("<", 0.9)` | Selection based on MPI inter-node load balance. |
| talp_dyn_filtered | - | 1 | `talp_dyn_filtered` | Selects functions filtered dynamically during TALP run. |
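The metric selectors all follow the same pattern: compare a per-function value against a threshold using the given operator. The Python sketch below shows that pattern on made-up metadata. The dictionary layout and key names are hypothetical and do not reflect how MetaCG or CaPI store TALP metrics.

```python
import operator

# Comparison operators as they appear in the selector examples.
COMPARATORS = {
    "<": operator.lt,
    "<=": operator.le,
    "=": operator.eq,
    ">=": operator.ge,
    ">": operator.gt,
}


def select_by_metric(metadata, metric, op, threshold):
    """Select functions whose metric value satisfies `value op threshold`."""
    compare = COMPARATORS[op]
    return {
        func
        for func, values in metadata.items()
        if metric in values and compare(values[metric], threshold)
    }


# Hypothetical per-function TALP metadata.
talp_metadata = {
    "solve": {"parallel_efficiency": 0.72, "mpi_calls": 40},
    "exchange_halo": {"parallel_efficiency": 0.91, "mpi_calls": 12},
    "compute": {},  # no TALP metrics attached
}

low_efficiency = select_by_metric(talp_metadata, "parallel_efficiency", "<", 0.8)
many_mpi_calls = select_by_metric(talp_metadata, "mpi_calls", ">=", 10)
print(sorted(low_efficiency))  # ['solve']
print(sorted(many_mpi_calls))  # ['exchange_halo', 'solve']
```

Functions without the requested metric, such as compute above, are simply not selected in this sketch.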
LLVM-XRay currently does not support the instrumentation of inlined functions.
Since the MetaCG call graph is based on the source code, the information whether a function is inlined by the compiler is not directly available to CaPI.
As a result, the IC may contain inlined functions that cannot be instrumented.
The --replace-inlined <executable_binary> option was added to compensate for this issue.
It detects which functions in the IC are not available in the binary and replaces them with direct callers.
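The replacement step can be sketched in a few lines of Python. This is a simplified, single-pass illustration with made-up names: the symbol set would come from inspecting the binary, and how CaPI handles callers that are themselves inlined is not covered here.

```python
def replace_inlined(ic, binary_symbols, callers):
    """Replace IC entries missing from the binary with their direct callers."""
    result = set()
    for func in ic:
        if func in binary_symbols:
            # The function has a symbol in the binary and can be instrumented.
            result.add(func)
        else:
            # Presumably inlined: fall back to its direct callers.
            result.update(callers.get(func, ()))
    return result


# Hypothetical inputs: "min_value" was inlined into "compute".
ic = {"solve", "min_value"}
binary_symbols = {"main", "solve", "compute"}
callers = {"min_value": {"compute"}, "solve": {"main"}}

print(sorted(replace_inlined(ic, binary_symbols, callers)))  # ['compute', 'solve']
```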
The IC generated by CaPI is used to direct the instrumentation of the target application. Static and dynamic instrumentation methods are supported. However, the dynamic instrumentation workflow using LLVM-XRay is preferred due to its flexibility: it allows for rapid iterative adjustments of the selection without requiring the program to be rebuilt.
You can use the provided compiler wrappers clang-inst/clang-inst++ to build and instrument your program.
Before building, set the environment variable CAPI_FILTER_FILE to the name of the generated IC file.
Please note that the compiler wrapper will not automatically link any measurement library.
You will need to pass the corresponding build flags yourself.
Passing --output-format scorep to CaPI generates a filter file compatible with Score-P.
This enables directly instrumenting with the Score-P instrumenter.
To do this, simply build with scorep-g++ and set SCOREP_WRAPPER_INSTRUMENTER_FLAGS="--instrument-filter=<filter-file>".
To enable measuring functions in shared libraries, use the Score-P Symbol Injector library (Note: as of Score-P 8, this is no longer necessary).
CaPI now provides a runtime library compatible with LLVM XRay. Instead of using a statically instrumented build for each IC, this enables dynamic instrumentation during program initialization. With XRay, only one build is required and ICs can be changed without recompilation. Note: This requires LLVM version 20 or newer.
You can enable this feature by setting the CMake option ENABLE_XRAY=ON.
This will generate the compiler wrapper capicc in the scripts subdirectory of your current build.
This wrapper automatically adds the required flags to instrument your code and link in the necessary dependencies.
A corresponding wrapper is generated in the install tree as well.
To use it, simply prepend your existing compiler invocation with this wrapper.
For example, Makefile-based projects can be compiled with make CC='capicc clang' CXX='capicc clang++'.
There are currently five different tool interfaces implemented in the following CaPI runtime libraries:
- `libcapixray_gnu.a`: Compatible with `-finstrument-functions`. Calls `__cyg_profile_func_enter` on enter and `__cyg_profile_func_exit` on exit.
- `libcapixray_scorep.a`: Compatible with the GNU interface of Score-P.
- `libcapixray_talp.a`: Interface for the TALP tool.
- `libcapixray_extrae.a`: Interface for the Extrae tool.
- `libcapixray_nesmik.a`: Interface for NeSmiK.
The tool interface is selected in the wrapper by passing --capi-interface=<gnu/scorep/talp/extrae/nesmik>.
To instrument the program at startup, set the environment variable CAPI_FILTERING_FILE=<ic_file>.
As an alternative to the wrappers, it is also possible to pass the required flags manually.
When building the target application, you will need to use the Clang compiler and pass the flag -fxray-instrument.
XRay uses a pre-filtering mechanism to exclude very small functions. If you want to be able to potentially instrument all functions, you need to pass -fxray-instruction-threshold=1 as well.
You will then need to link the XRay-compatible CaPI runtime library into your executable by adding the following:
-Wl,--whole-archive <capi_build_dir>/lib/xray/libcapixray_<capi_interface>.a -Wl,--no-whole-archive, along with the required LLVM dependencies given by llvm-config --libfiles xray symbolize --link-static --system-libs.
[1] Kreutzer, S., Iwainsky, C., Lehr, JP., Bischof, C. (2022). Compiler-Assisted Instrumentation Selection for Large-Scale C++ Codes. In: Anzt, H., Bienz, A., Luszczek, P., Baboulin, M. (eds) High Performance Computing. ISC High Performance 2022 International Workshops. ISC High Performance 2022. Lecture Notes in Computer Science, vol 13387. Springer, Cham. https://doi.org/10.1007/978-3-031-23220-6_1
[2] S. Kreutzer, C. Iwainsky, M. Garcia-Gasulla, V. Lopez and C. Bischof, "Runtime-Adaptable Selective Performance Instrumentation," 2023 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), St. Petersburg, FL, USA, 2023, pp. 423-432, doi: 10.1109/IPDPSW59300.2023.00073.
[3] Kreutzer, S., Serra, J.P., Iwainsky, C., Gasulla, M.G., Bischof, C. (2025). Augmentation of MPI Traces Using Selective Instrumentation. In: Weiland, M., Neuwirth, S., Kruse, C., Weinzierl, T. (eds) High Performance Computing. ISC High Performance 2024 International Workshops. ISC High Performance 2023. Lecture Notes in Computer Science, vol 15058. Springer, Cham. https://doi.org/10.1007/978-3-031-73716-9_3
