improve performance of span_iterator w/ clang #1166

carsonRadtke · 2024-11-09T20:25:01Z

Before this PR, the range-for loop was ~3300x slower. After this PR, it is ~1.005x slower

The clang optimizer is very good at optimizing current != end, so we changed to this idiom. This moves the Expects assertion into the constructor instead of on the hot-path which is called whenever either operator++ or operator* is called.

Note: The codegen for the assertion is still a missed optimization, but less worrisome as it only happens once per iterator.

carsonRadtke · 2024-11-09T20:29:28Z

Benchmark details:

M1 Macbook Pro
Apple Clang 16.0.0
span over std::vector<int>(10000)

Benchmark results:

#include <gsl/gsl>
#include <span>

template <typename T>
void foo(const T &sp) {
    for (const auto &x : sp) {
    }
}

auto gsl_foo = foo<gsl::span<int>>;
auto std_foo = foo<std::span<int>>;

Before:

Run on (8 X 24 MHz CPU s)
CPU Caches:
  L1 Data 64 KiB
  L1 Instruction 128 KiB
  L2 Unified 4096 KiB (x8)
Load Average: 2.00, 1.97, 1.92
-----------------------------------------------------
Benchmark           Time             CPU   Iterations
-----------------------------------------------------
BM_gsl_foo       3147 ns         3146 ns       220999
BM_std_foo      0.941 ns        0.941 ns    743739309

After:

Run on (8 X 24 MHz CPU s)
CPU Caches:
  L1 Data 64 KiB
  L1 Instruction 128 KiB
  L2 Unified 4096 KiB (x8)
Load Average: 1.99, 1.97, 1.92
-----------------------------------------------------
Benchmark           Time             CPU   Iterations
-----------------------------------------------------
BM_gsl_foo      0.955 ns        0.953 ns    724442697
BM_std_foo      0.949 ns        0.948 ns    742548000

Issue: microsoft#1165 Before this PR, the range-for loop was ~3300x slower. After this PR, it is ~1.005x slower The clang optimizer is very good at optimizing `current != end`, so we changed to this idiom. This moves the Expects assertion into the constructor instead of on the hot-path which is called whenever either operator++ or operator* is called. Note: The codegen for the assertion is still a missed optimization, but less worrisome as it only happens once per iterator. Note: benchmarks on M1 Macbook Pro w/ Apple Clang 16.0.0

carsonRadtke · 2024-11-12T21:39:28Z

Benchmark	CPU Before (ns)	CPU After (ns)	Ratio
BM_IsSortedStdSpan	325	323	1.0
BM_IsSortedGslSpan	1573	322	4.9
BM_IsSortedRangesStdSpan	325	323	1.0
BM_IsSortedRangesGslSpan	1583	475	3.3
BM_IsSortedCustomStdSpan	442	435	1.0
BM_IsSortedCustomGslSpan	1420	629	2.3
BM_MinElementAlgorithmStdSpan	637	631	1.0
BM_MinElementAlgorithmGslSpan	1913	1852	1.0
BM_MinElementRangeForStdSpan	33.8	33.6	1.0
BM_MinElementRangeForGslSpan	635	33.2	19.1

More benchmarks (thanks to @galenelias). I'd like to see an improvement in std::min_element, but I will open an issue for it instead of blocking this PR.

It would also be nice to have performance regression tests (see #1165 (comment)). I will open an issue for this as well.

carsonRadtke force-pushed the clang-perf branch from 42cfc54 to 898db79 Compare November 9, 2024 20:30

carsonRadtke marked this pull request as draft November 9, 2024 20:32

carsonRadtke added 2 commits November 9, 2024 14:42

fix some typos

b8bfc19

fix failing test

63f9058

carsonRadtke marked this pull request as ready for review November 11, 2024 20:22

carsonRadtke merged commit f8ec309 into microsoft:main Nov 12, 2024
95 checks passed

carsonRadtke deleted the clang-perf branch November 12, 2024 21:41

carsonRadtke mentioned this pull request Nov 12, 2024

Improve performance of std::min_element on gsl::span #1168

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

improve performance of span_iterator w/ clang #1166

improve performance of span_iterator w/ clang #1166

carsonRadtke commented Nov 9, 2024

carsonRadtke commented Nov 9, 2024 •

edited

Loading

carsonRadtke commented Nov 12, 2024

improve performance of span_iterator w/ clang #1166

improve performance of span_iterator w/ clang #1166

Conversation

carsonRadtke commented Nov 9, 2024

carsonRadtke commented Nov 9, 2024 • edited Loading

carsonRadtke commented Nov 12, 2024

carsonRadtke commented Nov 9, 2024 •

edited

Loading