-
Notifications
You must be signed in to change notification settings - Fork 115
Add inline and inbounds annotations to unsafe_dot #639
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #639 +/- ##
=======================================
Coverage 98.13% 98.13%
=======================================
Files 19 19
Lines 3277 3277
=======================================
Hits 3216 3216
Misses 61 61 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Any idea how much difference the |
It is mostly the inline. Rerunning the benchmark: Without the @inbounds:
▅▅▆█▆█████▇████▇▅▁▃▃▁▄▆▁▁▄▃▆▃▅▅▅▅▄▄▃▄▁▅▁▁▁▁▄▁▁▁▁▁▁▁▁▁▁▃▁▁▃▃ ▃ Memory estimate: 16 bytes, allocs estimate: 1. Without the @inline julia> @benchmark filt!(output, interp_fil, data)
▃▁▅█▁▅▅▆██▆▅█▆█▅█▅███▆▆██▃▆▃▁▃▃▁▆▃▃▅▁▃▁▅▅▆▃▁▅▃▁▃▁▃▁▅▁▁▁▁▁▁▅ ▃ Memory estimate: 16 bytes, allocs estimate: 1. |
Thanks. In my opinion, overriding the compiler heuristics needs good justification, and those benchmarks surely qualify. The additional |
But if this helps only interpolation, maybe we can use call-site inlining for just that? |
Good point. Can you try that, @jacobga1998? |
Seems to give the same benchmarks, I uploaded it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a style preference, I like the inline macro right beside the function called. Other than that, LGTM. Also, a little benchmarking suggests to me that BLAS.dot
is slower than the Julia version for small kernel lengths, but that's a separate issue and there is no urgent need to change that.
Co-authored-by: wheeheee <[email protected]>
Agreed, that looks nicer. Added the suggestions. |
Hello,
I was using a FIRInterpolation filter and found that it was a bottleneck in my code. When profiling I realized that the compiler wasn't inlining the unsafe_dot function, and subsequently that there was a missing inbounds for the first multiplication. In the benchmark below the execution time after adding
@inline
and@inbounds
is about half. I also tried longer filters, and decimation kernels, and didn't find any difference in the execution time, but I added the annotation to the other functions which did not already have them in case there is a scenario where it matters.Let me know if you think this is worth merging.
Before PR(julia 1.11)
After PR