blasfeo_ddot: suggestion for improvement #91

roversch · 2019-01-30T10:50:26Z

The 'reduce' step of blasfeo_ddot uses the horizontal add _mm_hadd_pd. One could instead replace

u_tmp = _mm_hadd_pd(u_tmp, u_tmp);

with

__m128d hi64 = _mm_unpackhi_pd(u_tmp, u_tmp);
u_tmp = _mm_add_sd(u_tmp, hi64);

effectively trading a packed double operation for a scalar one.
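
To make the proposal concrete, here is a minimal sketch of the SSE2-only reduction inside a simple dot product. The names hsum_pd_sse2 and ddot_sse2 are hypothetical and are not BLASFEO functions; the loop is deliberately simplified and is not the actual blasfeo_ddot kernel.

```c
#include <emmintrin.h>  /* SSE2: _mm_unpackhi_pd, _mm_add_sd, _mm_cvtsd_f64 */

/* Hypothetical helper (not part of BLASFEO): add the two lanes of v
 * using only SSE2 instructions. */
static inline double hsum_pd_sse2(__m128d v)
{
    __m128d hi64 = _mm_unpackhi_pd(v, v);  /* both lanes = upper lane of v */
    v = _mm_add_sd(v, hi64);               /* low lane = v[0] + v[1] */
    return _mm_cvtsd_f64(v);               /* extract the low lane */
}

/* Hypothetical, simplified dot product (assumption: not the real kernel).
 * Accumulates two products per iteration, then reduces once at the end. */
static double ddot_sse2(int n, const double *x, const double *y)
{
    __m128d acc = _mm_setzero_pd();
    int i = 0;
    for (; i + 1 < n; i += 2) {
        __m128d xv = _mm_loadu_pd(x + i);  /* unaligned loads for simplicity */
        __m128d yv = _mm_loadu_pd(y + i);
        acc = _mm_add_pd(acc, _mm_mul_pd(xv, yv));
    }
    double sum = hsum_pd_sse2(acc);        /* the 'reduce' step */
    for (; i < n; i++)                     /* scalar tail for odd n */
        sum += x[i] * y[i];
    return sum;
}
```

The reduction runs once per call, so any gain is a fixed number of cycles, independent of n.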

giaf · 2019-02-02T14:12:55Z

Yes, what you propose would indeed reduce the latency by 1 clock cycle: from 5 cycles for hadd to 4 = 1 (unpackhi) + 3 (add).
But in the end, the reduction code is not so important; the important part is the loop body.

In general, level 1 BLAS routines are not so important in what we do and gain much less from optimization than level 2 and especially level 3 routines, so they have received less attention.

In my view, the most important reason to implement your improvement would be to get rid of the dependency on SSE3 when targeting machines with capabilities only up to SSE2. I don't know if this is the case for you. SSE3 (i.e. the Core microarchitecture) was chosen as the target as a reasonable trade-off between handiness and availability of ISAs, also on embedded devices, which usually lag a bit behind.
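
As a rough illustration of the SSE3 dependency, the sketch below selects the reduction at compile time. It assumes a GCC/Clang-style toolchain that defines __SSE3__ when SSE3 is enabled (other compilers may not define it), and reduce_pd is a hypothetical name, not a BLASFEO function.

```c
#include <emmintrin.h>      /* SSE2 */
#ifdef __SSE3__
#include <pmmintrin.h>      /* SSE3: _mm_hadd_pd */
#endif

/* Hypothetical helper: sum the two lanes of u_tmp. */
static inline double reduce_pd(__m128d u_tmp)
{
#ifdef __SSE3__
    /* Current approach: needs SSE3. */
    u_tmp = _mm_hadd_pd(u_tmp, u_tmp);
#else
    /* Proposed approach: SSE2 only. */
    __m128d hi64 = _mm_unpackhi_pd(u_tmp, u_tmp);
    u_tmp = _mm_add_sd(u_tmp, hi64);
#endif
    return _mm_cvtsd_f64(u_tmp);  /* low lane holds the sum on both paths */
}
```

Since the SSE2 path also runs on SSE3-capable machines, the guard is only there to show the two variants side by side; using the SSE2 version unconditionally would remove the dependency altogether.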

giaf · 2019-02-02T19:10:35Z

Sure, if you want to make the changes and open a PR, I would be happy to merge it. Otherwise I would leave it as it is for now, since other things have higher priority on my side.

Thanks anyway for the suggestion :)
