Trying to speed up acquisition and image data algebra #1313

Conversation
closing in favour of #1318 |
Kris had this comment about PR #1318:
I replied with my explanation: I could not push to Now that I can push to @KrisThielemans would you approve? |
I think either works:
I don't mind which. |
I think this PR has 2 conceptual changes, and I have some questions about those (as well as some small comments).
**No longer use `binary_op` etc. in Gadgetron algebra**

Why was this done? It leads to quite some code repetition, I think. Was it because of speed-up, to try and avoid calling a function pointer inside a loop? If so, what was the speed-up? I think this could be achieved by making `binary_op` et al. `inline` and making sure the definition is visible before you use it (i.e. potentially move it up). If that still doesn't do it, I'd try to use a template as opposed to a function pointer, e.g. change
```cpp
void
MRAcquisitionData::binary_op(
    const ISMRMRD::Acquisition& acq_x, ISMRMRD::Acquisition& acq_y,
    complex_float_t (*f)(complex_float_t, complex_float_t))
```

to

```cpp
template <class BinaryOperation>
void
MRAcquisitionData::binary_op(
    const ISMRMRD::Acquisition& acq_x, ISMRMRD::Acquisition& acq_y,
    BinaryOperation f)
```

It does exactly the same, but tries to force the compiler to instantiate this for the type of `f`.
(By the way, `binary_op` et al. seem to be pretty much equivalent to calls to `std::transform`, but that's more work.)
**Add `initialise_with_0` parameter to the `DataContainer` constructor (and set it to `false` for some STIR operations)**

The problem that I have with this is that in nearly all derived classes this parameter is unused. This means we tell the user they can use a `bool` to say we initialise with 0, but actually we don't.

STIR objects by default initialise to 0, but for Gadgetron/Nifti stuff, I guess they don't initialise with 0 (or we don't really know what they do). If that's true, I think we need to explicitly initialise with 0 if `initialise_with_0` is `true`.
Then I'm also in favour of the 2nd option. Less confusing merges (and internal rebases). |
@casperdcl could you please have a look at the build errors - something weird is happening with PET and MR Python tests |
I also prefer 2nd option and, as we all seem to agree on that, @KrisThielemans could you please unblock merging, so that we could move on to finalizing and merging #1314? (Your thoughts and suggestions on |
I'm sorry, but I don't really like that, to be honest. |
@evgueni-ovtchinnikov I can't reproduce your compiler optimisation issue... `templatorture.cpp`:

```cpp
/**
 * g++ -O3 -o templatorture templatorture.cpp && ./templatorture
 *
 * loop   : 902.949ms
 * loop_fn: 815.874ms
 * op     : 796.916ms
 * op_temp: 911.107ms
 */
#include <iostream>
#include <vector>
#include <chrono>

void op(std::vector<float> &vec, float (*f)(float)) {
    for (auto &v : vec)
        v = f(v);
}

template <typename Func, typename T>
void op_temp(std::vector<T> &vec, Func f) {
    for (auto &v : vec)
        v = f(v);
}

float func(float x) {
    return x * 2;
}

auto now = std::chrono::high_resolution_clock::now;

int main() {
    std::vector<float> test(987654321, 1.0f);
    auto start = now();
    for (auto &v : test) v *= 2;
    auto end = now();
    std::cout << "loop   : " << (end - start).count() / 1e6 << "ms" << std::endl;
    start = now();
    for (auto &v : test) v = func(v);
    end = now();
    std::cout << "loop_fn: " << (end - start).count() / 1e6 << "ms" << std::endl;
    start = now();
    op(test, func);
    end = now();
    std::cout << "op     : " << (end - start).count() / 1e6 << "ms" << std::endl;
    start = now();
    op_temp(test, func);
    end = now();
    std::cout << "op_temp: " << (end - start).count() / 1e6 << "ms" << std::endl;
    return 0;
}
``` |
because computing say |
@KrisThielemans I created https://github.com/SyneRBI/SIRF/tree/try-semibinary-template where I added method
which does the same job as
I compared the two approaches by computing If you want to check yourself, comment out line 862 in It remains to note that with the simple loop (line 862 not commented out) I got 1.29 sec. |
Have you tried things like |
Tried now, got build error
|
looks like you have an extra space after the underscores? |
@casperdcl just my typing error, sorry - here is the actual build error:
|
argh... try maybe

```cpp
template <typename T> static
#if defined(__clang__)
inline
#elif defined(__GNUC__)
__attribute__((always_inline)) inline
#elif defined(_MSC_VER)
__forceinline
#else
inline
#endif
T sum(T x, T y) ...
```

if that works we can always put it into an |
timing for `x + 2` remains practically the same: 6.97 sec for 5 runs (the simple loop at 866-869 taking 1.33 sec). Will have to try macros then |
unfortunate, but fine by me
try moving template further up the hierarchy, e.g. in try-semibinary-template:stir_data_containers.cpp:

```diff
-void
-STIRAcquisitionData::semibinary_op(
-    const DataContainer& a_x,
-    float y,
-    float(*f)(float, float)
-)
+template <typename Func> void
+STIRAcquisitionData::semibinary_op(
+    const DataContainer& a_x,
+    float y,
+    Func f
+)
+/// usage: semibinary_op(x, y, OP);
```

or maybe

```diff
-void
-STIRAcquisitionData::semibinary_op(
-    const DataContainer& a_x,
-    float y,
-    float(*f)(float, float)
-)
+template <typename Func> void
+STIRAcquisitionData::semibinary_op(
+    const DataContainer& a_x,
+    float y
+)
+/// usage: semibinary_op<OP>(x, y);
``` |
first suggestion (actually already tried on try-semibinary-template): still 5 times slower than a simple loop

second suggestion: got build errors:
|
I checked the implementation on that branch, and in particular SIRF/src/xSTIR/cSTIR/stir_data_containers.cpp Line 299 in 61de4a2
The issue here is that the template is implemented in the .cpp, but it is used in the .h files, e.g.
That means the compiler cannot inline things as far as I know (although it's hard to know with templates...). I suggest to first test with moving the If that doesn't work, please try to use
I'm not surprised, as using

```cpp
using Binary_op_Type = float(float, float);

template <Binary_op_Type f> void
STIRAcquisitionData::semibinary_op(
    const DataContainer& a_x,
    float y
)
/// usage: semibinary_op<OP>(x, y);
```

Don't forget that we have |
moved the template to
this did not compile, replaced with
which works but still about 5 times slower for |
too bad. I'm a bit confused though, as the above had 2 different solutions (the one for "mine" and the one for "Casper's", where the latter needs the |
the branch name is do not forget to comment-out line 902 in |
@evgueni-ovtchinnikov @casperdcl We discussed using "contiguous" implementations in We might still be able to put a template implementation
and in
This might work for everything except |
tried the latest Kris' suggestion, got lots of build errors like:
etc. etc. |
well sure, please correct my typos and omissions. |
@KrisThielemans @casperdcl
This indeed reduced the computation time, which, however, remained about 2 times longer than with macros. I have pushed the commit with this addition, so you can check yourself.
which turned out to be a bit faster (from 0.624 sec to 0.532 sec), but still well above 0.255 sec with macros. In the circumstances, I believe it would be reasonable to finalise and merge |
Superseded by #1320 |
Changes in this pull request
Testing performed
Two algebra timing tests added in SIRF/tests
Related issues
Contains SIRF part of STIR PR #1549
Checklist before requesting a review
Contribution Notes
Please read and adhere to the contribution guidelines.
Please tick the following: