Releases: HPC-Dwarfs/TheBandwidthBenchmark
Major release v3.0
New features:
- Support for measuring sustained memory bandwidth on NVIDIA GPUs
- Support for random array initialization as an alternative to constants
- Option to enable AVX512 intrinsics to enforce non-temporal stores
- Introduce command line arguments to override most default settings
Other things:
- A major refactoring of most of the code
- Stricter clang-tidy rules
- Cleanup formatting
- Improve README
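As an illustration of the non-temporal store option listed above, the following is a minimal sketch of how AVX512 intrinsics can enforce streaming stores. The function name `init_nt` and the scalar fallback are assumptions made for this example, not the benchmark's actual code; the intrinsics `_mm512_set1_pd`, `_mm512_stream_pd`, and `_mm_sfence` come from `immintrin.h`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#ifdef __AVX512F__
#include <immintrin.h>
#endif

/* Hypothetical helper (not part of the benchmark): set a[0..n) to value.
 * When compiled with AVX-512F enabled, full 8-element blocks are written
 * with non-temporal (streaming) stores that bypass the cache hierarchy.
 * The array must be 64-byte aligned, as _mm512_stream_pd requires. */
void init_nt(double *a, double value, size_t n)
{
    assert(((uintptr_t)a % 64) == 0);
    size_t i = 0;
#ifdef __AVX512F__
    const __m512d v = _mm512_set1_pd(value);
    for (; i + 8 <= n; i += 8) {
        _mm512_stream_pd(&a[i], v);
    }
    /* Order the streaming stores before any later stores. */
    _mm_sfence();
#endif
    /* Remainder elements (or all elements without AVX-512F). */
    for (; i < n; i++) {
        a[i] = value;
    }
}
```

With GCC or Clang, the intrinsic path is only compiled when AVX-512F is enabled, for example via `-mavx512f` or a suitable `-march` setting; otherwise the sketch falls back to plain stores.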
Major release v2.0
New modes scan a range of sizes to measure a bandwidth profile for the complete memory hierarchy. Sequential mode uses one thread, while throughput mode tests the bandwidth scaling of memory hierarchy levels using multiple threads but without any work sharing overhead. We added shell scripts to generate plots for these new modes using Gnuplot.
Other changes:
- Intel OneAPI compiler is the default now
- Removed Intel compiler flag for NT Stores
- Refactor code: Introduce HARNESS macros to eliminate redundant code
- Extend README
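The idea behind the sequential size scan introduced in this release can be sketched roughly as follows: generate a series of working set sizes, time a single-threaded kernel for each, and report the resulting bandwidth. The helper names `make_size_scan` and `copy_bandwidth_mbs`, the doubling progression, and the use of a copy kernel are assumptions for illustration only; the benchmark's actual implementation may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Wall-clock time in seconds (C11 timespec_get; not a monotonic clock). */
static double now_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Hypothetical helper: fill sizes[] with element counts doubling from
 * min_n up to max_n (inclusive). Returns the number of entries written. */
size_t make_size_scan(size_t min_n, size_t max_n, size_t *sizes, size_t cap)
{
    size_t count = 0;
    for (size_t n = min_n; n <= max_n && count < cap; n *= 2) {
        sizes[count++] = n;
    }
    return count;
}

/* Hypothetical helper: time a single-threaded copy b[i] = a[i] over n
 * doubles, repeated iter times. Returns MB/s, or 0.0 if allocation fails
 * or the elapsed time is too small to measure. */
double copy_bandwidth_mbs(size_t n, int iter)
{
    assert(n > 0 && iter > 0);
    double *a = malloc(n * sizeof(double));
    double *b = malloc(n * sizeof(double));
    if (!a || !b) {
        free(a);
        free(b);
        return 0.0;
    }
    for (size_t i = 0; i < n; i++) {
        a[i] = 1.0;
        b[i] = 0.0;
    }

    double start = now_seconds();
    for (int k = 0; k < iter; k++) {
        for (size_t i = 0; i < n; i++) {
            b[i] = a[i];
        }
        /* Perturb the source so repeated copies are not trivially redundant. */
        a[(size_t)k % n] = b[(size_t)k % n] + 1.0;
    }
    double elapsed = now_seconds() - start;

    volatile double sink = b[n - 1]; /* keep the result live */
    (void)sink;
    free(a);
    free(b);
    if (elapsed <= 0.0) {
        return 0.0;
    }
    /* Each element costs one 8-byte load plus one 8-byte store. */
    return (2.0 * sizeof(double) * (double)n * (double)iter) / elapsed / 1.0e6;
}
```

Calling `copy_bandwidth_mbs` for every entry produced by `make_size_scan` yields one bandwidth value per working set size, which is the kind of data a Gnuplot script could turn into a profile across cache levels and main memory.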
Minor release v1.4
These are mostly cosmetic changes:
- Put kernels in separate module
- Put profiling and LIKWID instrumentation in separate module
- Add clang-format specification and reformat
- Add banner
- Replace huge copyright header with something smaller
- Make NHR@FAU copyright holder
- Add new build targets for format and .clangd and remove tags target
- Clean up of Makefile and sources
New features:
- The Makefile will automatically generate a clang LSP configuration
- The CLANG toolchain is the default now. Please change to other toolchains in config.mk
- VERBOSE_AFFINITY will now output the complete affinity mask and the processor a thread is currently scheduled on
- make distclean will now clean all toolchains, enabled by another directory level ./build for the build products
- Correct rebuild of all objects if any build configuration has changed (include_<TOOLCHAIN>.mk and config.mk)
Minor release v1.3
Transfer benchmarking scripts to Wiki.
Move benchmarking documentation to Wiki.
Update Makefile.
Minor release v1.2
Changelog for 1.2:
- Use schedule(static) clause for all worksharing constructs
- Pull Likwid instrumentation outside benchmark functions
- Add script to extract scaling runs
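For readers unfamiliar with the `schedule(static)` clause mentioned above, here is a minimal sketch of a worksharing loop that uses it. The triad kernel and the function name `triad` are chosen for illustration and are not copied from the benchmark sources.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative triad kernel: a[i] = b[i] + scalar * c[i].
 * schedule(static) splits the iteration space into fixed contiguous
 * chunks, so each thread touches the same part of the arrays on every
 * call. Without OpenMP enabled the pragma is ignored and the loop runs
 * serially with the same result. */
void triad(double *restrict a, const double *restrict b,
           const double *restrict c, double scalar, size_t n)
{
    assert(a && b && c);
#pragma omp parallel for schedule(static)
    for (size_t i = 0; i < n; i++) {
        a[i] = b[i] + scalar * c[i];
    }
}
```

With GCC or Clang, OpenMP is enabled with `-fopenmp`. A fixed static distribution is commonly preferred for bandwidth measurements because it keeps the mapping of data to threads stable between initialization and the timed kernels.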
Minor release v1.1
Changelog for 1.1:
- Increase default problem size to almost 4 GB to compensate for OpenMP overhead
- Turn on streaming stores always for Intel toolchain
- Explicitly set static scheduling for OMP for loops
- Add golang version in util
- Add single file versions (C and Fortran) for teaching
- Improve LIKWID instrumentation
Initial release
v1.0 Update README.md