Introduce BlockSize
#3716
Looks very nice. Could we review the basic approach before you spend lots more time on it?
Sure thing. It should be good to go as is and can be extended further once approved. One neat byproduct that these changes would allow for is non-compile-time-sized operations on the
chrisrichardson
left a comment
Looking good.
Looks really neat.
For points 1 and 2 that should be no problem - how about:
Regarding 3: the interface to retrieve the value (here
I don't like relying on the compiler to inline things that we know at compile time. We have avoided this in the past, preferring to be explicit rather than relying on the compiler and then not knowing what it actually does.
It would be best if the
I think I have a fix:
I made further tweaks to the concept.
I'm not convinced by this approach. The block size in the assemblers is performance critical and what's happening should be explicit, with no reliance on what a compiler might do. For me this PR is not explicit and relies too much on what the compiler might do.
We extract the block sizes with auto bs = block_size(_bs), inferring the type from the returned type.
The compile-time-valued version thus guarantees evaluation to a constant, making this code identical to the previous one. The only possible risk is in the runtime version, where int (int x) {return x;} needs to be inlined for the path to be identical to before. Could we run some benchmarks to confirm this happens?
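To make the point above concrete, here is a minimal sketch of the two `block_size` overloads and a shared code path that uses them. It is not the code from this PR: the exact signatures and the helper `unroll_dofs` are assumptions for illustration only.

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// Runtime block size: an identity function. For the generated code to match
// a plain int argument, the compiler has to inline this call.
inline int block_size(int bs) { return bs; }

// Compile-time block size: the value is taken from the type, so the result
// does not depend on any runtime data.
template <int N>
constexpr int block_size(std::integral_constant<int, N>)
{
  return N;
}

// Hypothetical shared code path: expand block indices into flat dof indices.
// BS is either int or std::integral_constant<int, N>.
template <typename BS>
std::vector<int> unroll_dofs(const std::vector<int>& blocks, BS _bs)
{
  const auto bs = block_size(_bs);
  std::vector<int> dofs;
  dofs.reserve(blocks.size() * static_cast<std::size_t>(bs));
  for (int b : blocks)
    for (int k = 0; k < bs; ++k) // trip count is fixed per instantiation when BS is a constant
      dofs.push_back(bs * b + k);
  return dofs;
}
```

Calling `unroll_dofs(blocks, 3)` takes the runtime path, while `unroll_dofs(blocks, std::integral_constant<int, 3>{})` instantiates a version in which the block size is part of the type. Whether the two end up as identical machine code is exactly what the requested benchmarks would need to show.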
Quick testing at https://godbolt.org/z/dze1aq9br showed that we should definitely mark the runtime version inline, then
@garth-wells I've reviewed the code quite carefully and I don't think that e.g.
What could @schnellerhase do in terms of performance tests or additional unit tests that might persuade you that it does work?
In performance-critical parts, some block sizes are optimized for by compiling explicit versions with the block size provided as a compile-time constant. At the same time, general runtime block sizes are supported through an argument to these functions.
This causes
Release)
Introduces a BlockSize concept that holds either a runtime int or a compile-time std::integral_constant<int, bs>, which allows code paths to be generated explicitly for certain sizes while maintaining a shared code path in both cases.
This is based on a more general concept of an optionally compile-time-valued ConstexprType<T, V>. It stores a value of type T in the container type V. If runtime valued, then T = V. For compile time, usually V = std::integral_constant<T, ...>.
Future applications:
dolfinx/python/dolfinx/wrappers/assemble.cpp
Lines 348 to 441 in f1daede