Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

volk_32fc_s32f_atan2_32f: Add NaN tests for avx2 and avx2_fma code #731

Merged
merged 1 commit into from
Jan 7, 2024

Conversation

jj1bdx
Copy link
Contributor

@jj1bdx jj1bdx commented Dec 17, 2023

This adds a NaN test after the division operation of volk_32fc_s32f_atan2_32f avx2 and avx2_fma code.

This patch solves the issue in #730.

Using volk_profile, the processing time will increase by 17% on my environment of Intel NUC NUC10i7FNH with Intel(R) Core(TM) i7-10710U CPU. I believe this is a necessary cost to maintain full compatibility with the past (i.e., VOLK v3.0.0 and previous) implementations.

@jj1bdx
Copy link
Contributor Author

jj1bdx commented Dec 17, 2023

volk_profile benchmark results:

With old v3.1.0 code:

RUN_VOLK_TESTS: volk_32fc_s32f_atan2_32f(131071,1987)
generic completed in 7078.28 ms
polynomial completed in 2318.11 ms
a_avx2_fma completed in 175.141 ms
a_avx2 completed in 175.124 ms
u_avx2_fma completed in 174.965 ms
u_avx2 completed in 176.404 ms
Best aligned arch: u_avx2_fma
Best unaligned arch: u_avx2_fma

With the code proposed in this pull request:

RUN_VOLK_TESTS: volk_32fc_s32f_atan2_32f(131071,1987)
generic completed in 7075.16 ms
polynomial completed in 2343.78 ms
a_avx2_fma completed in 204.169 ms
a_avx2 completed in 204.303 ms
u_avx2_fma completed in 204.389 ms
u_avx2 completed in 204.4 ms
Best aligned arch: a_avx2_fma
Best unaligned arch: u_avx2_fma

Processing time increase rate: 204/175 ~= 1.166

@jdemel
Copy link
Contributor

jdemel commented Dec 17, 2023

According to Wikipedia atan2, one would expect a NaN for 0.0+0.0j. Changing this behavior in VOLK would potentially cause trouble.

Before the new AVX2 and AVX2 FMA kernels, we would basically always use the atan2 function from the standard.

atan2f(0.0, 0.0) == 0.0;

Thus, this fix makes sense to align the ouputs.

@argilo
Copy link
Member

argilo commented Dec 17, 2023

Normally VOLK strives for maximum performance, but I agree that we should avoid returning NaN, because that can be catastrophic in DSP contexts. (For instance, if an IIR filter receives a single NaN sample, then all future outputs become NaN.)

@marcusmueller
Copy link
Member

marcusmueller commented Dec 18, 2023

I must admit I especially don't like the fallback on the generic implementation at the end of loops here. This kernel is used for phase estimations, and minor implementation details (like whether to jump to -pi or +pi for zero x, positive or negative y) makes it hard to debug this and to use it. Could we maybe just for the "leftover" iteration fill a set MM register with the inputs that are available, run the regular SIMD on it, then instead of storing the whole result, just cherry-pick these elements that are actually based on input?

@marcusmueller
Copy link
Member

might be worth checking what sleef does here.

Copy link
Contributor

@jdemel jdemel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. This PR fixes situation where a NaN may corrupt a system. We should really avoid that.

The other concerns, e.g. how to handle the tail case, are valid but out of scope.

@jdemel jdemel merged commit ce314df into gnuradio:main Jan 7, 2024
33 checks passed
@jj1bdx
Copy link
Contributor Author

jj1bdx commented Jan 11, 2024

@jdemel Thanks for merging!

@Ka-zam Ka-zam mentioned this pull request Feb 28, 2024
Alesha72003 pushed a commit to Alesha72003/volk that referenced this pull request May 15, 2024
volk_32fc_s32f_atan2_32f: Add NaN tests for avx2 and avx2_fma code
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

v3.1.0 volk_32fc_s32f_atan2_32f.h avx2 and avx2_fma kernels return NaN for an input element 0+0j
4 participants