
Conversation

evgueni-ovtchinnikov
Contributor

@evgueni-ovtchinnikov evgueni-ovtchinnikov commented Mar 25, 2025

Changes in this pull request

Testing performed

Two algebra timing tests added in SIRF/tests

Related issues

Contains the SIRF part of STIR PR #1549

Checklist before requesting a review

  • I have performed a self-review of my code
  • I have added docstrings/doxygen in line with the guidance in the developer guide
  • I have implemented unit tests that cover any new or modified functionality
  • The code builds and runs on my machine
  • CHANGES.md has been updated with any functionality change

Contribution Notes

Please read and adhere to the contribution guidelines.

Please tick the following:

  • The content of this Pull Request (the Contribution) is intentionally submitted for inclusion in SIRF (the Work) under the terms and conditions of the Apache-2.0 License.

@evgueni-ovtchinnikov evgueni-ovtchinnikov changed the title from "Trying to speed up STIR acquisition and image data algebra by avoiding unnecessary 0-fill of newly created data" to "Trying to speed up acquisition and image data algebra" Mar 25, 2025
@casperdcl
Member

closing in favour of #1318

@casperdcl casperdcl closed this May 22, 2025
@evgueni-ovtchinnikov
Contributor Author

Kris had this comment about PR #1318:

I'm a bit confused. Do we need this PR? Better to get smaller ones merged first.

I replied with my explanation: I could not push to array-ptr so I merged array-ptr and try-fix-algebra into data-algebra-opt and created PR #1318.

Now that I can push to array-ptr, I would rather agree with Kris' suggestion and first merge into master the smaller try-fix-algebra (which looks ready to me) and then, when it is ready too, array-ptr.

@KrisThielemans would you approve?

@casperdcl
Member

casperdcl commented May 30, 2025

Member

@KrisThielemans KrisThielemans left a comment


I think this PR has 2 conceptual changes, and I have some questions about those (as well as some small comments).

No longer use binary_op etc in Gadgetron algebra

Why was this done? It leads to quite some code repetition, I think. Was it done for speed, to avoid calling a function pointer inside a loop? If so, what was the speed-up? I think the same could be achieved by making binary_op et al. inline and making sure the definition is visible before you use it (i.e. potentially move it up). If that still doesn't do it, I'd try a template instead of a function pointer, e.g. change

void
MRAcquisitionData::binary_op
(const ISMRMRD::Acquisition& acq_x, ISMRMRD::Acquisition& acq_y, 
    complex_float_t (*f)(complex_float_t, complex_float_t))

to

template <class BinaryOperation>
void
MRAcquisitionData::binary_op
(const ISMRMRD::Acquisition& acq_x, ISMRMRD::Acquisition& acq_y, 
    BinaryOperation f)

It does exactly the same thing, but pushes the compiler to instantiate the function for the specific type of f (and hence inline it).

(By the way, binary_op et al. seem to be pretty much equivalent to calls to std::transform, but that's more work).
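As an illustration of that equivalence, here is a minimal stand-alone sketch (not SIRF code; the contiguous complex_float_t buffer layout and the names are assumptions for the example) of a binary_op written via std::transform:

// Illustration only (not SIRF code): a binary_op written via std::transform.
// Assumes the acquisition samples are exposed as contiguous complex_float_t
// buffers of length n, and that the result overwrites y.
#include <algorithm>
#include <complex>
#include <cstddef>

using complex_float_t = std::complex<float>;

template <class BinaryOperation>
void binary_op_sketch(const complex_float_t* x, complex_float_t* y,
                      std::size_t n, BinaryOperation f)
{
    // y[i] = f(x[i], y[i]) for every sample; because f is a template
    // parameter, the compiler sees its definition and can inline it.
    std::transform(x, x + n, y, y, f);
}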

add initialise_with_0 parameter to DataContainer constructor (and set it to false for some STIR operations)

The problem that I have with this is that in nearly all derived classes, this parameter is unused. This means we tell the user they can use a bool to request initialisation with 0, but actually we don't do it.

STIR objects by default initialise to 0, but for Gadgetron/Nifti stuff, I guess they don't initialise with 0 (or we don't really know what they do). If that's true, I think we need to explicitly initialise with 0 if initialise_with_0 is true.
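As a hedged sketch of that last point (the class and members below are made up for the example, not actual SIRF types): a derived container whose backend allocates uninitialised memory would have to zero it itself when the flag is set, otherwise the flag is a no-op.

// Hypothetical example, not SIRF code: honouring initialise_with_0 in a
// derived class whose underlying allocation is not zero-initialised.
#include <cstddef>
#include <cstring>
#include <memory>

class ExampleImageData {
public:
    explicit ExampleImageData(std::size_t n, bool initialise_with_0 = true)
        : n_(n), data_(new float[n])  // new float[n] leaves the values indeterminate
    {
        if (initialise_with_0)
            std::memset(data_.get(), 0, n_ * sizeof(float));
    }
private:
    std::size_t n_;
    std::unique_ptr<float[]> data_;
};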

@KrisThielemans
Member

I think either works:

I don't mind which.

Then I'm also in favour of the 2nd option: less confusing merges (and internal rebases).

@evgueni-ovtchinnikov
Contributor Author

@casperdcl could you please have a look at the build errors - something weird is happening with PET and MR Python tests

@evgueni-ovtchinnikov
Contributor Author

I think either works:

I don't mind which.

Then I'm also in favour of the 2nd option: less confusing merges (and internal rebases).

I also prefer the 2nd option, and as we all seem to agree on that, @KrisThielemans could you please unblock merging, so that we can move on to finalising and merging #1314? (Your thoughts and suggestions on binary_op and initialise_with_0 deserve a separate PR, I believe.)

@KrisThielemans
Member

(Your thoughts and suggestions on binary_op and initialise_with_0 deserve a separate PR, I believe.)

I'm sorry, but

  • this PR duplicates a lot of code, which we're then going to de-duplicate in another PR.
  • it introduces a new variable, but it doesn't do what it's supposed to do.

I don't really like that, to be honest.

@casperdcl casperdcl mentioned this pull request Jun 10, 2025
@casperdcl
Member

casperdcl commented Jun 10, 2025

@evgueni-ovtchinnikov I can't reproduce your compiler optimisation issue...

templatorture.cpp
/**
 * g++ -O3 -o templatorture templatorture.cpp && ./templatorture
 *
 * loop   : 902.949ms
 * loop_fn: 815.874ms
 * op     : 796.916ms
 * op_temp: 911.107ms
 */
#include<iostream>
#include<vector>
#include<chrono>

void op(std::vector<float> &vec, float(*f)(float)) {
  for (auto &v : vec)
    v = f(v);
}

template<typename Func, typename T>
void op_temp(std::vector<T> &vec, Func f) {
  for (auto &v : vec)
    v = f(v);
}

float func(float x) {
  return x * 2;
}

auto now = std::chrono::high_resolution_clock::now;

int main(){
  std::vector<float> test(987654321, 1.0f);

  auto start = now();
  for (auto &v : test) v *= 2;
  auto end = now();
  std::cout << "loop   : " << (end - start).count() /1e6 << "ms" << std::endl;

  start = now();
  for (auto &v : test) v = func(v);
  end = now();
  std::cout << "loop_fn: " << (end - start).count() /1e6 << "ms" << std::endl;

  start = now();
  op(test, func);
  end = now();
  std::cout << "op     : " << (end - start).count() /1e6 << "ms" << std::endl;

  start = now();
  op_temp(test, func);
  end = now();
  std::cout << "op_temp: " << (end - start).count() /1e6 << "ms" << std::endl;

  return 0;
}

@evgueni-ovtchinnikov
Contributor Author

No longer use binary_op etc in Gadgetron algebra
Why was this done?

because computing, say, x + 2 via semibinary_op takes about 20 times longer than a simple loop for a GadgetronImageData x (and about 5 times longer for STIRAcquisitionDataInMemory).

@evgueni-ovtchinnikov
Contributor Author

evgueni-ovtchinnikov commented Jun 27, 2025

@KrisThielemans I created https://github.com/SyneRBI/SIRF/tree/try-semibinary-template where I added method

template<class BinaryOp> void STIRAcquisitionData::semibinary_op_templ(const DataContainer& a_x, float y, BinaryOp f)

which does the same job as

void STIRAcquisitionData::semibinary_op(const DataContainer& a_x, float y, float(*f)(float, float))

I compared the two approaches by computing x + 2, where x is the additive_term from Mediso_NEMA_IQ_lowcounts dataset, and found no performance improvement: five additions took 8.64 sec with the templated version and 7.47 sec with the function-pointer one.

If you want to check for yourselves, comment out line 862 in stir_data_containers.h, which will force the use of semibinary_op/semibinary_op_templ instead of the simple loop at 866-869.

For comparison, with the simple loop (line 862 not commented out) I got 1.29 sec.

@casperdcl
Member

Have you tried things like __forceinline in try-semibinary-template?

@evgueni-ovtchinnikov
Contributor Author

Have you tried things like __forceinline in try-semibinary-template?

Tried now, got build error

error: '__ forceinline' does not name a type

@casperdcl
Member

error: '__ forceinline' does not name a type

looks like you have an extra space after the underscores?

@evgueni-ovtchinnikov
Contributor Author

@casperdcl just my typing error, sorry - here is the actual build error:

/home/sirfuser/devel/buildVM/sources/SIRF/src/common/include/sirf/common/DataContainer.h:210:24: error: ‘__forceinline’ does not name a type
                  static __forceinline T sum(T x, T y)

@casperdcl
Member

argh... try maybe

template<typename T> static
#if defined(__clang__)
inline
#elif defined(__GNUC__)
__attribute__((always_inline)) inline
#elif defined(_MSC_VER)
__forceinline
#else
inline
#endif
T sum(T x, T y)

... if that works we can always put it into an INLINE_MACRO for re-usability.
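For reference, one possible shape of such a macro (SIRF_FORCE_INLINE is a placeholder name, not an existing SIRF symbol):

// Hypothetical reusable macro: picks the per-compiler force-inline spelling
// once, so declarations like DataContainer::sum stay readable.
#if defined(_MSC_VER)
  #define SIRF_FORCE_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
  #define SIRF_FORCE_INLINE inline __attribute__((always_inline))
#else
  #define SIRF_FORCE_INLINE inline
#endif

// usage:
template <typename T>
static SIRF_FORCE_INLINE T sum(T x, T y) { return x + y; }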

@evgueni-ovtchinnikov
Contributor Author

timing for x + 2 remains practically the same: 6.97 sec for 5 runs (the simple loop at 866-869 taking 1.33 sec)

will have to try macros then

Member


unfortunate, but fine by me

@casperdcl
Member

try moving the template further up the hierarchy, e.g. in try-semibinary-template:stir_data_containers.cpp:

-void
-STIRAcquisitionData::semibinary_op(
-	const DataContainer& a_x,
-	float y,
-	float(*f)(float, float)
-)
+template <typename Func> void
+STIRAcquisitionData::semibinary_op(
+	const DataContainer& a_x,
+	float y,
+	Func f
+)
+/// usage: semibinary_op(x, y, OP);

or maybe

-void
-STIRAcquisitionData::semibinary_op(
-	const DataContainer& a_x,
-	float y,
-	float(*f)(float, float)
-)
+template <typename Func> void
+STIRAcquisitionData::semibinary_op(
+	const DataContainer& a_x,
+	float y
+)
+/// usage: semibinary_op<OP>(x, y);

@evgueni-ovtchinnikov
Contributor Author

evgueni-ovtchinnikov commented Aug 21, 2025

first suggestion (actually already tried on try-semibinary-template): still 5 times slower than a simple loop

second suggestion: got build errors:

/home/sirfuser/devel/buildVM/sources/SIRF/src/xSTIR/cSTIR/include/sirf/STIR/stir_data_containers.h: In member function ‘virtual void sirf::STIRAcquisitionData::add(const sirf::DataContainer&, const void*)’:
/home/sirfuser/devel/buildVM/sources/SIRF/src/xSTIR/cSTIR/include/sirf/STIR/stir_data_containers.h:352:72: error: no matching function for call to ‘sirf::STIRAcquisitionData::semibinary_op_templ<((sirf::STIRAcquisitionData*)this)->sum<float> >(const sirf::DataContainer&, float&)’
  352 |                         semibinary_op_templ<DataContainer::sum<float> >(x, y);
      |                         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~
/home/sirfuser/devel/buildVM/sources/SIRF/src/xSTIR/cSTIR/include/sirf/STIR/stir_data_containers.h:527:6: note: candidate: ‘template<class BinaryOp> void sirf::STIRAcquisitionData::semibinary_op_templ(const sirf::DataContainer&, float)’
  527 | void semibinary_op_templ(const DataContainer& a_x, float y)
      |      ^~~~~~~~~~~~~~~~~~~
/home/sirfuser/devel/buildVM/sources/SIRF/src/xSTIR/cSTIR/include/sirf/STIR/stir_data_containers.h:527:6: note:   template argument deduction/substitution failed:
/home/sirfuser/devel/buildVM/sources/SIRF/src/xSTIR/cSTIR/include/sirf/STIR/stir_data_containers.h:352:72: error: type/value mismatch at argument 1 in template parameter list for ‘template<class BinaryOp> void sirf::STIRAcquisitionData::semibinary_op_templ(const sirf::DataContainer&, float)’
  352 |                         semibinary_op_templ<DataContainer::sum<float> >(x, y);
      |                         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~
/home/sirfuser/devel/buildVM/sources/SIRF/src/xSTIR/cSTIR/include/sirf/STIR/stir_data_containers.h:352:72: note:   expected a type, got ‘((sirf::STIRAcquisitionData*)this)->sum<float>’
make[2]: *** [src/Synergistic/cSyn/CMakeFiles/csyn.dir/build.make:76: src/Synergistic/cSyn/CMakeFiles/csyn.dir/utilities.cpp.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:1614: src/Synergistic/cSyn/CMakeFiles/csyn.dir/all] Error 2
make: *** [Makefile:146: all] Error 2

@KrisThielemans
Member

first suggestion (actually already tried on try-semibinary-template): still 5 times slower than a simple loop

I checked the implementation on that branch, and in particular

STIRAcquisitionData::semibinary_op_templ(

The issue here is that the template is implemented in the .cpp, but it is used in the .h files, e.g.
semibinary_op_templ(x, y, DataContainer::sum<float>);

That means the compiler cannot inline things, as far as I know (although it's hard to know with templates...). I suggest first testing by moving the semibinary_op_templ implementation into the .h file (and above any usage).

If that doesn't work, please try to use std::plus instead of DataContainer::sum (although I can't imagine that's going to make any difference, as it's essentially the same). If that doesn't work, a last attempt is to use a lambda []<class T>(T a, T b) { return a + b; } (C++20; since C++14 [](auto a, auto b) { return a+b; } already works and is a bit clearer) or even [](float a, float b) { return a+b; } (although that's being rather desperate).
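For concreteness, a small stand-alone demo of the three callable forms being suggested (op_templ below is a stand-in for semibinary_op_templ, not the SIRF signature):

// Stand-alone demo of the suggested callables; op_templ is a stand-in for
// semibinary_op_templ and simply applies f(element, y) over a vector.
#include <functional>
#include <iostream>
#include <vector>

template <class BinaryOp>
void op_templ(std::vector<float>& v, float y, BinaryOp f)
{
    for (auto& e : v)
        e = f(e, y);
}

int main()
{
    std::vector<float> v(3, 1.0f);
    op_templ(v, 2.0f, std::plus<float>());                      // std::plus
    op_templ(v, 2.0f, [](auto a, auto b) { return a + b; });    // generic lambda (C++14)
    op_templ(v, 2.0f, [](float a, float b) { return a + b; });  // plain float lambda
    std::cout << v[0] << std::endl;                             // prints 7
    return 0;
}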

second suggestion: got build errors:

I'm not surprised, as DataContainer::sum<float> is not a type. I think (but am not sure) this might have to be

using Binary_op_Type = float(float, float);

template <Binary_op_Type f> void
STIRAcquisitionData::semibinary_op(
	const DataContainer& a_x,
	float y
)
/// usage: semibinary_op<OP>(x, y);

Don't forget that we have stir::ProjData::operator+= etc.
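A minimal stand-alone sketch of the non-type-template-parameter idea above (names are demo-only, not SIRF code; the function parameter is declared directly rather than through the alias, whose behaviour inside SIRF's headers is checked below):

// Stand-alone demo: the operation is passed as a non-type template argument,
// so it is known at compile time and can be inlined into the loop.
#include <iostream>
#include <vector>

template <float f(float, float)>
void semibinary_op_demo(std::vector<float>& v, float y)
{
    for (auto& e : v)
        e = f(e, y);
}

float sum(float x, float y) { return x + y; }

int main()
{
    std::vector<float> v(4, 1.0f);
    semibinary_op_demo<sum>(v, 2.0f);  // usage: semibinary_op_demo<OP>(x, y)
    std::cout << v[0] << std::endl;    // prints 3
    return 0;
}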

@evgueni-ovtchinnikov
Contributor Author

The issue here is that the template is implemented in the .cpp, but it used in the .h files

moved the template to stir_data_containers.h

using Binary_op_Type = float(float, float);

this did not compile, replaced with

typedef float (Binary_op_Type)(float, float);

which works, but is still about 5 times slower for x + 2 than the simple loop at stir_data_containers.h:906-909.

@KrisThielemans
Member

too bad. I'm a bit confused though, as the above had 2 different solutions (the one for "mine" and the one for "Casper's", where the latter needs the typedef). Can you please push this to the semi_binaryop_template branch and say which one?

@evgueni-ovtchinnikov
Contributor Author

the branch name is try-semibinary-template, I pushed the latest version about an hour ago

do not forget to comment-out line 902 in stir_data_containers.h to force the use of templated STIRAcquisitionData::add instead of the simple loop at 906-909

@KrisThielemans
Member

@evgueni-ovtchinnikov @casperdcl We discussed using "contiguous" implementations in DataContainer. I no longer think this is possible, as DataContainer doesn't know the type of the objects it contains.

We might still be able to put a templated semibinary_op implementation in DataContainer to avoid repetition. Something like

template<elemT, class Operation>
static void semibinary_op_templ(DataContainer& dest, const DataContainer& a_x, float y, Operation op)
{
   assert(a_x.items() == dest.items());
   if (dest.supports_array_view() && a_x.supports_array_view())
   {
     auto a_x_begin = reinterpet_cast<elemT * const>(a_x.address());
     auto a_x_end = a_x_begin + a_x.items();
     auto dest_begin = reinterpet_cast<elemT *>(dest.address());
     std::transform(a_x_begin, a_x_end, dest_begin, op);
   }
   else
   {
     std::transform(a_x.begin(), a_x.end(), dest.begin(), op);
   }
}

and in STIRImageData for instance

virtual void add(const DataContainer& x, const void* ptr_y) override
{
	float y = *static_cast<const float*>(ptr_y);
	semibinary_op_templ<float>(*this, x, y,  DataContainer::sum<float>); // or std::plus<float>
}

This might work for everything except STIRAcquisitionDataFile.

@evgueni-ovtchinnikov
Contributor Author

tried Kris' latest suggestion, got lots of build errors like:

/home/sirfuser/devel/buildVM/sources/SIRF/src/common/include/sirf/common/DataContainer.h:102:10: error: ‘elemT’ has not been declared
  102 | template<elemT, class Operation>
      |          ^~~~~
/home/sirfuser/devel/buildVM/sources/SIRF/src/common/include/sirf/common/DataContainer.h:105:4: error: there are no arguments to ‘assert’ that depend on a template parameter, so a declaration of ‘assert’ must be available [-fpermissive]
  105 |    assert(a_x.items() == dest.items());
      |    ^~~~~~
/home/sirfuser/devel/buildVM/sources/SIRF/src/common/include/sirf/common/DataContainer.h:105:4: note: (if you use ‘-fpermissive’, G++ will accept your code, but allowing the use of an undeclared name is deprecated)
/home/sirfuser/devel/buildVM/sources/SIRF/src/common/include/sirf/common/DataContainer.h:108:23: error: ‘reinterpet_cast’ was not declared in this scope
  108 |      auto a_x_begin = reinterpet_cast<elemT * const>(a_x.address());
      |                       ^~~~~~~~~~~~~~~
/home/sirfuser/devel/buildVM/sources/SIRF/src/common/include/sirf/common/DataContainer.h:108:39: error: ‘elemT’ was not declared in this scope
  108 |      auto a_x_begin = reinterpet_cast<elemT * const>(a_x.address());
      |                                       ^~~~~
/home/sirfuser/devel/buildVM/sources/SIRF/src/common/include/sirf/common/DataContainer.h:108:47: error: expected primary-expression before ‘const’
  108 |      auto a_x_begin = reinterpet_cast<elemT * const>(a_x.address());
      |                                               ^~~~~
/home/sirfuser/devel/buildVM/sources/SIRF/src/common/include/sirf/common/DataContainer.h:111:11: error: ‘transform’ is not a member of ‘std’
  111 |      std::transform(a_x_begin, a_x_end, dest_begin, op);
      |           ^~~~~~~~~
/home/sirfuser/devel/buildVM/sources/SIRF/src/common/include/sirf/common/DataContainer.h:115:25: error: ‘const class sirf::DataContainer’ has no member named ‘begin’
  115 |      std::transform(a_x.begin(), a_x.end(), dest.begin(), op);
      |                         ^~~~~
/home/sirfuser/devel/buildVM/sources/SIRF/src/common/include/sirf/common/DataContainer.h:115:38: error: ‘const class sirf::DataContainer’ has no member named ‘end’
  115 |      std::transform(a_x.begin(), a_x.end(), dest.begin(), op);
      |                                      ^~~

etc. etc.

@KrisThielemans
Member

well sure, please correct my typos and omissions: class elemT, reinterpret_cast, #include <algorithm> etc.
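With those fixes applied (and with the scalar y folded into the operation passed to std::transform, since its unary form is used), the earlier sketch would read roughly as follows. It is still only a sketch: it assumes the DataContainer interface used in the original suggestion (items(), address(), supports_array_view(), begin()/end()), some of which, as the build errors above show, does not exist yet.

// Corrected sketch of the semibinary_op_templ suggestion above; intended as a
// static member template of DataContainer, so it is not compilable on its own.
#include <algorithm>
#include <cassert>

template <class elemT, class Operation>
static void semibinary_op_templ(DataContainer& dest, const DataContainer& a_x,
                                float y, Operation op)
{
    assert(a_x.items() == dest.items());
    // bind the scalar so std::transform's unary form can be used
    auto unary = [&](elemT v) { return op(v, y); };
    if (dest.supports_array_view() && a_x.supports_array_view())
    {
        auto a_x_begin = reinterpret_cast<const elemT*>(a_x.address());
        auto a_x_end = a_x_begin + a_x.items();
        auto dest_begin = reinterpret_cast<elemT*>(dest.address());
        std::transform(a_x_begin, a_x_end, dest_begin, unary);
    }
    else
    {
        std::transform(a_x.begin(), a_x.end(), dest.begin(), unary);
    }
}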

@evgueni-ovtchinnikov
Contributor Author

@KrisThielemans @casperdcl
Re our discussion last Thursday: I agree that the way I coded my macros-vs-function-pointer comparison was not fair, so I made this addition in STIRAcquisitionData::semibinary_op after the line SIRF_DYNAMIC_CAST(const STIRAcquisitionData, x, a_x);:

	auto *pd_ptr   = dynamic_cast<stir::ProjDataInMemory*>(data().get());
	auto *pd_x_ptr = dynamic_cast<const stir::ProjDataInMemory*>(x.data().get());
	if (!is_null_ptr(pd_ptr) && !is_null_ptr(pd_x_ptr)) {
		auto iter = pd_ptr->begin();
		auto iter_x = pd_x_ptr->begin();
		while (iter != pd_ptr->end())
			*iter++ = f(*iter_x++, y);
		return;
	}

This indeed reduced the computation time, which however remained about 2 times longer than with macros. I have pushed the commit with this addition, so you can check for yourselves.
I also tried Kris' templated version

template<class Operation>
void
semibinary_op_templ(const DataContainer& a_x, float y, Operation f)
...

which turned out to be a bit faster (from 0.624 sec to 0.532 sec), but still well above 0.255 sec with macros.

In the circumstances, I believe it would be reasonable to finalise and merge try-macros4algebra for now and return to the latest array views branch array-ptr-mr, making further attempts to find a nicer design a separate issue/PR.

@paskino
Contributor

paskino commented Oct 13, 2025

Superseded by #1320

@paskino paskino closed this Oct 13, 2025