gsl::span performance is a bottleneck for adoption #1165
Comments
@tiagomacarios Thank you for reporting this issue. After a brief look into libcxx's In the meantime, could you share which version of clang you are using?
We use 3 different versions of clang:
We usually update compilers as soon as new LTS versions are available.
Issue: microsoft#1165

Before this PR, the range-for loop was ~3300x slower. After this PR, it is ~1.005x slower.

The clang optimizer is very good at optimizing `current != end`, so we changed to this idiom. This moves the Expects assertion into the constructor instead of on the hot-path, which is called whenever either operator++ or operator* is called.

Note: The codegen for the assertion is still a missed optimization, but less worrisome as it only happens once per iterator.
Note: benchmarks on M1 MacBook Pro w/ Apple Clang 16.0.0
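To make the idea in this commit message concrete, here is a minimal sketch of an iterator that validates its range once in the constructor and then relies on a plain `current != end` comparison in the loop. This is not GSL's actual `span_iterator`: the names `expects`, `checked_iterator`, `checked_view`, and `sum_view` are invented for illustration, and `expects` is only a stand-in for GSL's `Expects` macro.

```cpp
#include <cstddef>
#include <exception>

// Hypothetical stand-in for GSL's Expects: terminate on a contract violation.
inline void expects(bool condition) {
    if (!condition) std::terminate();
}

// Illustrative iterator (assumed design, not the GSL implementation).
template <class T>
class checked_iterator {
public:
    // The range invariant is validated once, when the iterator is created.
    checked_iterator(T* begin, T* end, T* current)
        : begin_(begin), end_(end), current_(current) {
        expects(begin_ <= current_ && current_ <= end_);
    }

    // No assertion on the hot path: dereference and increment are plain pointer ops.
    T& operator*() const { return *current_; }

    checked_iterator& operator++() {
        ++current_;
        return *this;
    }

    // The `current != end` idiom: a single pointer comparison per iteration.
    friend bool operator!=(const checked_iterator& lhs, const checked_iterator& rhs) {
        return lhs.current_ != rhs.current_;
    }

private:
    T* begin_;
    T* end_;
    T* current_;
};

// Tiny span-like view that hands out the iterators above.
template <class T>
class checked_view {
public:
    checked_view(T* data, std::size_t size) : data_(data), size_(size) {}
    checked_iterator<T> begin() const { return {data_, data_ + size_, data_}; }
    checked_iterator<T> end() const { return {data_, data_ + size_, data_ + size_}; }

private:
    T* data_;
    std::size_t size_;
};

// Range-for loop over the view; this is the pattern the benchmark numbers refer to.
inline long sum_view(checked_view<const int> view) {
    long total = 0;
    for (int value : view) total += value;
    return total;
}
```

Note the trade-off in this sketch: because the check runs only at construction, a dereference or increment past the end is no longer caught per access. A range-for loop never does that, but hand-written iterator code could.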
From the mail thread I see that this has been fixed. Would you be against having performance tests? I wonder if we could use @galenelias' tests and have the perf tests fail if they are not within a threshold of each other.
Performance tests would be great. I'd be happy to either review a PR with Galen's benchmark or create the PR myself if that is more convenient. I'd rather not add a new dimension to the test matrix because we are already pushing 100 runners per PR commit. If we could add one test that compares to the latest standard library implementation for each of Clang, GCC, and MSVC, that'd be my preference.
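One way such a threshold-based perf test could look is sketched below. It is only an illustration of the proposal, not Galen's benchmark: the helper names `best_time_ns` and `within_threshold` and any ratio passed to them are assumptions. The callables being timed would be, for example, a loop over `gsl::span` and the same loop over the standard library's span.

```cpp
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <limits>

// Returns the fastest observed wall-clock time of `fn`, in nanoseconds,
// over `repetitions` runs. Taking the minimum reduces scheduling noise.
template <class Fn>
std::int64_t best_time_ns(Fn&& fn, int repetitions) {
    std::int64_t best = std::numeric_limits<std::int64_t>::max();
    for (int i = 0; i < repetitions; ++i) {
        const auto start = std::chrono::steady_clock::now();
        fn();
        const auto stop = std::chrono::steady_clock::now();
        const auto elapsed =
            std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
        best = std::min<std::int64_t>(best, elapsed);
    }
    return best;
}

// True when the candidate is at most `max_ratio` times slower than the baseline.
// The baseline is clamped to 1 ns so a zero measurement cannot break the comparison.
inline bool within_threshold(std::int64_t candidate_ns, std::int64_t baseline_ns,
                             double max_ratio) {
    const double baseline = static_cast<double>(std::max<std::int64_t>(baseline_ns, 1));
    return static_cast<double>(candidate_ns) <= baseline * max_ratio;
}
```

A test would time both loops with `best_time_ns` and fail when `within_threshold` returns false. The timed callables need an observable side effect (for example, writing the result to a `volatile` variable), otherwise the optimizer may remove the loop entirely.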
* improve performance of span_iterator w/ clang (Issue: #1165)
Office is trying to adopt gsl::span more broadly, but we are seeing some unexpected performance regressions when compiling with clang:
A - range-for loops
B - std algorithms
On a mail thread @StephanTLavavej suggested:
Could one of the maintainers please follow up?