perf-cpp: Effortless Hardware Performance Monitoring for C++ Applications

Quick Start | How to Build | Documentation | System Requirements

perf-cpp embeds Linux's hardware performance monitoring directly into your code, letting you profile exactly what matters and process the results in your application. Tools like Linux Perf, Intel® VTune™, and AMD uProf are powerful but monitor entire programs – and high-performance applications need surgical precision.

What can perf-cpp do?

Built around Linux's powerful perf subsystem, perf-cpp provides a clean interface for counting and sampling hardware events – without the complexity of low-level APIs.

Measure exactly what you want – utilize performance counters to count hardware events, similar to perf stat, but around specific code paths, not an entire binary (documentation).
Calculate metrics such as cycles per instruction and cache miss to access ratio based on hardware events and timing (documentation).
Access performance counters with low latency, without starting and stopping them, for micro-benchmarks or adaptive tuning (documentation).
Record instruction and memory samples, just like perf [mem] record – but from inside your application (documentation).
Correlate samples with data structures and symbols to generate per-class access statistics and flame graphs.
Mix built-in events (e.g., cycles, instructions, cache misses, ...) with processor-specific counters (documentation).

See various practical examples and the documentation for more details.

Quick Start

Record Hardware Event Statistics

Recording hardware event statistics operates much like perf stat: it quantifies critical events–such as executed instructions, CPU cycles, and cache misses–throughout a code segment's execution.

#include <iostream>
#include <perfcpp/event_counter.h>

/// Initialize the counter
const auto counter_definition = perf::CounterDefinition{};
auto event_counter = perf::EventCounter{ counter_definition };

/// Specify hardware events to count
event_counter.add({"seconds", "instructions", "cycles", "cache-misses"});

/// Run the workload
event_counter.start();
code_to_profile(); /// <-- Statistics are recorded during execution
event_counter.stop();

/// Print the result to the console
const auto result = event_counter.result();
for (const auto& [event_name, value] : result)
{
    std::cout << event_name << ": " << value << std::endl;
}

Possible output:

seconds:      0.0955897 
instructions: 5.92087e+07
cycles:       4.70254e+08
cache-misses: 1.35633e+07
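
Raw counts like these are often easier to interpret as ratios. The following is a minimal, library-independent sketch of how such metrics could be derived by hand; the `EventCounts` struct and the function names are hypothetical and not part of perf-cpp, which has its own metrics support (see the metrics documentation).

```cpp
#include <cassert>

/// Hypothetical container for the four values printed above (not a perf-cpp type).
struct EventCounts
{
    double seconds;
    double instructions;
    double cycles;
    double cache_misses;
};

/// Cycles per instruction (CPI): lower usually means better pipeline utilization.
double cycles_per_instruction(const EventCounts& counts)
{
    assert(counts.instructions > 0.0);
    return counts.cycles / counts.instructions;
}

/// Cache misses per 1,000 instructions (often abbreviated MPKI).
double cache_misses_per_kilo_instruction(const EventCounts& counts)
{
    assert(counts.instructions > 0.0);
    return counts.cache_misses / counts.instructions * 1000.0;
}

/// Average clock rate in GHz over the measured interval.
double giga_cycles_per_second(const EventCounts& counts)
{
    assert(counts.seconds > 0.0);
    return counts.cycles / counts.seconds / 1e9;
}
```

For the output shown above, `cycles_per_instruction` yields roughly 7.9, which together with the high number of cache misses per instruction would suggest a memory-bound workload.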

Note

For additional insights please refer to the guides on recording event statistics and event statistics on multiple CPUs/threads. Also, check out the hardware events documentation for details on both built-in and processor-specific events.

Record Samples

Recording samples functions much like perf [mem] record: it captures execution snapshots, e.g., the instruction pointer, executing CPU, and timestamp, at regular intervals (here every 50,000th CPU cycle).

#include <iostream>
#include <perfcpp/sampler.h>

/// Create the sampler
const auto counter_definition = perf::CounterDefinition{};
auto sampler = perf::Sampler{ counter_definition };

/// Specify when a sample is recorded: every 50,000th cycle
sampler.trigger("cycles", perf::Period{50000U});

/// Specify what data is included in a sample: timestamp, CPU ID, instruction pointer
sampler.values()
    .timestamp(true)
    .cpu_id(true)
    .instruction_pointer(true);

/// Run the workload
sampler.start();
code_to_profile(); /// <-- Samples are recorded during execution
sampler.stop();

/// Print the samples to the console
const auto samples = sampler.result();
for (const auto& record : samples)
{
    const auto timestamp = record.metadata().timestamp().value();
    const auto cpu_id = record.metadata().cpu_id().value();
    const auto instruction = record.instruction_execution().logical_instruction_pointer().value();
    
    std::cout 
        << "Time = " << timestamp << " | CPU = " << cpu_id
        << " | Instruction = 0x" << std::hex << instruction << std::dec
        << std::endl;
}

Possible output:

Time = 365449130714033 | CPU = 8 | Instruction = 0x5a6e84b2075c
Time = 365449130913157 | CPU = 8 | Instruction = 0x64af7417c75c
Time = 365449131112591 | CPU = 8 | Instruction = 0x5a6e84b2075c
Time = 365449131312005 | CPU = 8 | Instruction = 0x64af7417c75c
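
A common next step is to aggregate samples by instruction pointer to see where execution time concentrates. The sketch below is library-independent and assumes the three values have already been copied out of each record into a plain struct; `SampleRecord` and `count_by_instruction` are hypothetical names, not perf-cpp types.

```cpp
#include <algorithm>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

/// Hypothetical plain copy of the values extracted from one sample.
struct SampleRecord
{
    std::uint64_t timestamp;
    std::uint32_t cpu_id;
    std::uint64_t instruction_pointer;
};

/// Counts how often each instruction pointer was sampled and returns
/// (instruction pointer, hits) pairs, most frequently sampled first.
/// Ties are ordered by ascending address to keep the result deterministic.
std::vector<std::pair<std::uint64_t, std::uint64_t>>
count_by_instruction(const std::vector<SampleRecord>& samples)
{
    std::unordered_map<std::uint64_t, std::uint64_t> hits;
    for (const auto& sample : samples)
    {
        ++hits[sample.instruction_pointer];
    }

    std::vector<std::pair<std::uint64_t, std::uint64_t>> sorted(hits.begin(), hits.end());
    std::sort(sorted.begin(), sorted.end(), [](const auto& left, const auto& right) {
        return left.second != right.second ? left.second > right.second
                                           : left.first < right.first;
    });
    return sorted;
}
```

Because samples are taken at a fixed cycle period, the share of hits per address approximates the share of cycles spent there. Translating addresses into function names is covered by the symbol documentation.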

Note

For additional details–such as the types of data that can be included in samples–please consult the sampling guide. Additionally, consult the sampling on multiple CPUs/threads guide for instructions on parallel sampling.

More Examples

We include a collection of examples demonstrating the functionality and interface of perf-cpp in the examples/ directory, including

examples for counting hardware events (examples/statistics)
and for sampling (examples/sampling).

Building

perf-cpp is designed as a library (static or shared) that can be linked to your application.

# Clone the repository
git clone https://github.com/jmuehlig/perf-cpp.git

# Switch to the repository folder
cd perf-cpp

# Optional: Switch to a tagged release instead of the default branch
git checkout v0.12.0

# Build the library (in build/)
# -DBUILD_EXAMPLES=1        compiles all examples (optional)
# -DBUILD_LIB_SHARED=1      creates the library as a shared one (optional)
# -DGEN_PROCESSOR_EVENTS=1  generates and compiles a .cpp file that adds events specific to the underlying CPU (optional)
cmake . -B build -DBUILD_EXAMPLES=1
cmake --build build

# Optional: Build examples (in build/examples/bin) if -DBUILD_EXAMPLES=1
cmake --build build --target examples

Note

Further information and detailed building instructions (e.g., how to integrate into CMake projects) are available in the building guide.

Full Documentation

Building: Integrate perf-cpp seamlessly into your C++ projects.
Counting Performance Events
- Basics: Master recording hardware event statistics directly within your application.
- Parallel and Multithreaded: Learn how to monitor events across threads and CPU cores.
- Metrics: Learn how to combine hardware events into meaningful metrics for clearer performance insights.
- Live Access: See how events can be accessed without stopping the recording, ideal for profiling tight loops and small functions.
Recording Samples
- Basics: Understand sampling mechanisms, which data to record, and how to access the results.
- Parallel and Multithreaded: Learn how to record samples in multithreaded workloads.
- Translating Instruction Pointers into Symbols and Samples into flame graphs: See how to translate instruction pointers into function names and prepare sampling results to transform them into flame graphs (e.g., using FlameGraph).
- Analyzing Memory Access Patterns: See how to link memory sampling data to specific data objects to profile detailed memory access characteristics.
Built-in and Hardware-specific Events: Discover built-in events and learn how to define new ones tailored to your hardware.
Perf Paranoid: Learn how to configure perf permissions.

System Requirements

Clang / GCC with support for C++17 features.
CMake version 3.10 or higher.
Linux kernel 4.0 or newer (some features require a more recent kernel).
perf_event_paranoid setting: Adjust as needed to allow access to performance counters (see the Paranoid Value documentation).
Python3, if you make use of processor-specific hardware event generation.
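
To check the perf_event_paranoid requirement from within an application before creating counters, the current value can be read from procfs. This is a minimal sketch that assumes the standard Linux path `/proc/sys/kernel/perf_event_paranoid`; the function names are hypothetical and not part of perf-cpp.

```cpp
#include <fstream>
#include <optional>
#include <sstream>
#include <string>

/// Parses the textual content of the paranoid file (e.g., "2\n").
/// Returns std::nullopt if the text does not start with an integer.
std::optional<int> parse_paranoid_level(const std::string& text)
{
    std::istringstream stream{ text };
    int level = 0;
    if (stream >> level)
    {
        return level;
    }
    return std::nullopt;
}

/// Reads the system-wide setting; lower values are more permissive,
/// and -1 lifts the restrictions for unprivileged users.
/// Returns std::nullopt if the file is missing or unreadable.
std::optional<int> read_paranoid_level()
{
    std::ifstream file{ "/proc/sys/kernel/perf_event_paranoid" };
    std::string line;
    if (!file || !std::getline(file, line))
    {
        return std::nullopt;
    }
    return parse_paranoid_level(line);
}
```

An application could call `read_paranoid_level()` at startup and print a hint pointing to the Paranoid Value documentation when counters cannot be opened.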

Contribute and Contact

We welcome contributions and feedback. For feature requests, feedback, or bug reports, please reach out via our issue tracker or submit a pull request.

Alternatively, you can email me: [email protected].

Further PMU-related Projects

Below is a non-exhaustive list of some other valuable profiling projects:

PAPI offers access not only to CPU performance counters but also to a variety of other hardware components including GPUs, I/O systems, and more.
Likwid is a collection of several command line tools for benchmarking, including an extensive wiki.
PerfEvent provides lightweight access to performance counters, facilitating streamlined performance monitoring.
Intel's Instrumentation and Tracing Technology allows applications to manage the collection of trace data effectively when used in conjunction with Intel VTune Profiler.
For those who prefer a more hands-on approach, the perf_event_open system call can be utilized directly without any wrappers.
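
For comparison, the following sketch counts user-space instructions with the raw system call, following the pattern from the perf_event_open man page. The helper name `count_instructions` is hypothetical, error handling is reduced to a minimum, and the call may fail (returning `std::nullopt`) when permissions or hardware support are missing.

```cpp
#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#include <cstdint>
#include <cstring>
#include <optional>

/// glibc ships no wrapper for perf_event_open, so it is invoked via syscall(2).
static long perf_event_open(perf_event_attr* attr, pid_t pid, int cpu, int group_fd, unsigned long flags)
{
    return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/// Runs the workload and returns the number of user-space instructions
/// executed by the calling thread, or std::nullopt if the counter could not
/// be opened or read. The workload is not run if opening the counter fails.
template <typename Workload>
std::optional<std::uint64_t> count_instructions(Workload&& workload)
{
    perf_event_attr attr;
    std::memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_INSTRUCTIONS;
    attr.disabled = 1;       /// Start disabled; enabled explicitly below.
    attr.exclude_kernel = 1; /// Count user space only.
    attr.exclude_hv = 1;

    /// pid = 0 and cpu = -1: measure the calling thread on any CPU.
    const auto fd = static_cast<int>(perf_event_open(&attr, 0, -1, -1, 0));
    if (fd < 0)
    {
        return std::nullopt;
    }

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    workload();
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    std::uint64_t count = 0;
    const auto bytes = read(fd, &count, sizeof(count));
    close(fd);
    if (bytes != static_cast<ssize_t>(sizeof(count)))
    {
        return std::nullopt;
    }
    return count;
}
```

Even this single-counter case needs manual attribute setup, file-descriptor handling, and ioctl calls; groups of counters, multiplexing correction, and sampling buffers add considerably more.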

Resources about (Perf-) Profiling

This is a non-exhaustive list of academic research papers and blog articles (feel free to add to it, e.g., via pull request – also your own work).

Academic Papers

Blog Posts

C2C - False Sharing Detection in Linux Perf (2016)
PMU counters and profiling basics. (2018)
Detect false sharing with Data Address Profiling. (2019)
Advanced profiling topics. PEBS and LBR. (2018)
