new kernels for atan2 #636

Ka-zam · 2023-10-04T14:14:25Z

New kernels for atan2 based on the recently merged arctan work. Almost 40x speedup: the AVX2 kernels finish in about 132 ms versus about 5100 ms for generic.

With this PR:

$ volk_profile -R atan2

RUN_VOLK_TESTS: volk_32fc_s32f_atan2_32f(131071,1987)
generic completed in 5091.65 ms
polynomial completed in 2138.51 ms
a_avx2_fma completed in 131.781 ms
a_avx2 completed in 131.696 ms
u_avx2_fma completed in 131.963 ms
u_avx2 completed in 132.086 ms
Best aligned arch: a_avx2
Best unaligned arch: u_avx2_fma

Without:

$ volk_profile -R atan2

RUN_VOLK_TESTS: volk_32fc_s32f_atan2_32f(131071,1987)
a_sse4_1 completed in 5159.66 ms
a_sse completed in 5168.12 ms
generic completed in 5201.91 ms
Best aligned arch: a_sse4_1
Best unaligned arch: generic

Signed-off-by: Magnus Lundmark <[email protected]>

jdemel

Thanks for this PR. However, did you remove the SSE kernels? They should stay.

Ka-zam · 2023-10-13T07:50:54Z

There seems to be a dependency on LV_HAVE_LIB_SIMDMATH, which I've never heard of and can't find any information on. In my case the kernel simply compiles to the generic case, even though my machine is definitely capable of SSE4_1.

In any case, it is not really an SSE4_1 kernel.

Perhaps something like this instead:

#if defined(LV_HAVE_SSE4_1) && defined(LV_HAVE_LIB_SIMDMATH)
#include <smmintrin.h>
#include <simdmath.h>

static inline void volk_32fc_s32f_atan2_32f_a_sse4_1_simdmath(float* outputVector,
.
.
.

#ifdef LV_HAVE_SSE4_1
#include <smmintrin.h>

#ifdef LV_HAVE_LIB_SIMDMATH
#include <simdmath.h>
#endif /* LV_HAVE_LIB_SIMDMATH */

static inline void volk_32fc_s32f_atan2_32f_a_sse4_1(float* outputVector,
                                                     const lv_32fc_t* complexVector,
                                                     const float normalizeFactor,
                                                     unsigned int num_points)
{
    const float* complexVectorPtr = (const float*)complexVector;
    float* outPtr = outputVector;

    unsigned int number = 0;
    const float invNormalizeFactor = 1.0 / normalizeFactor;

#ifdef LV_HAVE_LIB_SIMDMATH /* <-- the vector path starts here */
    const unsigned int quarterPoints = num_points / 4;
    __m128 testVector = _mm_set_ps1(2 * M_PI);
    __m128 correctVector = _mm_set_ps1(M_PI);
    __m128 vNormalizeFactor = _mm_set_ps1(invNormalizeFactor);
    __m128 phase;
    __m128 complex1, complex2, iValue, qValue;
    __m128 keepMask;

    for (; number < quarterPoints; number++) {
        // Load IQ data:
        complex1 = _mm_load_ps(complexVectorPtr);
        complexVectorPtr += 4;
        complex2 = _mm_load_ps(complexVectorPtr);
        complexVectorPtr += 4;
        // Deinterleave IQ data:
        iValue = _mm_shuffle_ps(complex1, complex2, _MM_SHUFFLE(2, 0, 2, 0));
        qValue = _mm_shuffle_ps(complex1, complex2, _MM_SHUFFLE(3, 1, 3, 1));
        // Arctan to get phase:
        phase = atan2f4(qValue, iValue);
        // When Q = 0 and I < 0, atan2f4 returns 2pi instead of pi.
        // Compare to 2pi:
        keepMask = _mm_cmpneq_ps(phase, testVector);
        phase = _mm_blendv_ps(correctVector, phase, keepMask);
        // done with above correction.
        phase = _mm_mul_ps(phase, vNormalizeFactor);
        _mm_store_ps((float*)outPtr, phase);
        outPtr += 4;
    }
    number = quarterPoints * 4;
#endif /* LV_HAVE_LIB_SIMDMATH */ /* <-- the vector path ends here */

    for (; number < num_points; number++) {
        const float real = *complexVectorPtr++;
        const float imag = *complexVectorPtr++;
        *outPtr++ = atan2f(imag, real) * invNormalizeFactor;
    }
}
#endif /* LV_HAVE_SSE4_1 */

jdemel · 2023-10-14T08:01:29Z

My search for simdmath.h has led me nowhere so far.
It seems to be part of commit e4015c7, or even earlier. I'd argue it is reasonable to remove it.

jdemel

LGTM. Thanks for your contribution.

jj1bdx · 2023-12-17T04:44:22Z

@Ka-zam and @jdemel
Please take a look at #730 and #731. Your reviews and comments are appreciated.

argilo · 2023-12-17T14:07:17Z

Quoting jdemel: "My search for simdmath.h has led me nowhere so far."

I think it might be IBM's "Software Development Kit for Multicore Acceleration". It has the same header name, and includes the functions that VOLK references (powf4, logf4, atan2f4, cosf4, sinf4).

http://ilab.usc.edu/packages/cell-processor/docs/CBE_SIMDmath_API_v2.1.pdf

It looks related to the Cell Broadband Engine, which some GNU Radio folks were working on around 2009:

https://www.researchgate.net/publication/241488194_High-Performance_SDR_GNU_Radio_and_the_IBM_Cell_Broadband_Engine

It probably makes sense to strip all the LV_HAVE_LIB_SIMDMATH out of VOLK at this point. I'll open an issue for it.

Ka-zam added 3 commits October 4, 2023 16:10

0aa0983 new kernels for atan2
ae43184 added fabs
722af23 minor typo

All three commits: Signed-off-by: Magnus Lundmark <[email protected]>

jdemel requested changes Oct 12, 2023

jdemel approved these changes Dec 1, 2023

jdemel merged commit 13dcc27 into gnuradio:main Dec 1, 2023
32 checks passed

jj1bdx mentioned this pull request Dec 17, 2023

v3.1.0 volk_32fc_s32f_atan2_32f.h avx2 and avx2_fma kernels return NaN for an input element 0+0j #730

Closed

This was referenced Dec 17, 2023

Remove LV_HAVE_LIB_SIMDMATH? #732

Closed

Remove references to simdmath library #735

Merged

Alesha72003 pushed a commit to Alesha72003/volk that referenced this pull request May 15, 2024

Merge pull request gnuradio#636 from Ka-zam/atan2_kernels

975a55b

new kernels for atan2
