Contributor

@Ka-zam Ka-zam commented Jan 16, 2026

This PR removes the Q_rsqrt implementation and the puppet. Modern CPUs don't need that bit trick, and it's hacky at best. It also fails on negative inputs.

The test tolerance is tightened from 1e-02 to 1e-06.
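For readers who have not seen the trick being removed, here is a minimal sketch of the classic bit-hack inverse square root, assuming it resembles the widely known Quake III version. The thread does not show the removed code. The helper names `rsqrt_nr_step` and `rel_err` are hypothetical and are not VOLK API. The sketch suggests why a 1e-02 tolerance was enough for the old code while 1e-06 is not, and shows one way a negative input can go wrong.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Classic bit-trick estimate of 1/sqrt(x) followed by one Newton-Raphson
 * step. Assumed to resemble the removed Q_rsqrt; not copied from VOLK. */
float q_rsqrt(float x)
{
    uint32_t i;
    float y;
    memcpy(&i, &x, sizeof i);   /* reinterpret the float bits safely */
    i = 0x5f3759dfu - (i >> 1); /* magic-constant initial guess */
    memcpy(&y, &i, sizeof y);
    return y * (1.5f - 0.5f * x * y * y);
}

/* One extra Newton-Raphson refinement of an estimate y of 1/sqrt(x).
 * Hypothetical helper for illustration. */
float rsqrt_nr_step(float x, float y)
{
    return y * (1.5f - 0.5f * x * y * y);
}

/* Relative error of approx against a positive exact value.
 * Hypothetical helper for illustration. */
float rel_err(float approx, float exact)
{
    assert(exact > 0.0f);
    float d = approx - exact;
    if (d < 0.0f) {
        d = -d;
    }
    return d / exact;
}
```

For `x = 4.0f` the bit trick with a single refinement step is off by roughly 0.2 %, which passes a 1e-02 tolerance but not 1e-06. Each additional Newton-Raphson step roughly squares the relative error. For a negative `x` the sketch returns a non-NaN value, whereas `1.0f / sqrtf(x)` gives NaN.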

Last kernel before release!

RUN_VOLK_TESTS: volk_32f_invsqrt_32f(vlen=131071, iter=3000, tol=1e-06)
arch                       |         time |     throughput |    max_rel |
---------------------------+--------------+----------------+------------+
generic                    | 1797.7696 ms |    1749.8 MB/s |          - |
a_avx                      |   22.2048 ms |  141673.2 MB/s |    2.4e-07 |
a_avx512f                  |   15.3338 ms |  205156.5 MB/s |    2.1e-07 | *
a_sse                      |   38.4642 ms |   81785.7 MB/s |    2.4e-07 |
u_sse                      |   37.6666 ms |   83517.7 MB/s |    2.4e-07 |
u_avx                      |   19.6460 ms |  160125.7 MB/s |    2.4e-07 |
u_avx512f                  |   15.7183 ms |  200137.9 MB/s |    2.1e-07 | *
Best aligned arch          | a_avx512f (117.24x)
Best unaligned arch        | u_avx512f (114.37x)
--------------------------------------------------------------------------------

RUN_VOLK_TESTS: volk_32f_invsqrt_32f(vlen=131071, iter=597, tol=1e-06)
arch                       |         time |     throughput |    max_rel |
---------------------------+--------------+----------------+------------+
generic                    |  689.8667 ms |     907.4 MB/s |          - |
neon                       |   44.6217 ms |   14029.5 MB/s |    2.0e-07 |
neonv8                     |   43.7900 ms |   14295.9 MB/s |    2.0e-07 | *
Best aligned arch          | neonv8 (15.75x)
Best unaligned arch        | neonv8 (15.75x)
--------------------------------------------------------------------------------
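The derived figures in the two tables above can be approximately reproduced from the time column. The sketch below assumes that the speedup is the generic time divided by the kernel time, and that throughput counts both the input and the output buffer of 4-byte floats with 1 MB = 10^6 bytes. Both assumptions are inferred from the printed numbers rather than stated in the thread, and the function names are hypothetical.

```c
#include <assert.h>

/* Speedup of a kernel relative to the generic implementation. */
double speedup(double generic_ms, double arch_ms)
{
    assert(arch_ms > 0.0);
    return generic_ms / arch_ms;
}

/* Throughput in MB/s. ASSUMPTION: every iteration reads vlen floats and
 * writes vlen floats, and 1 MB is 1e6 bytes. */
double throughput_mb_s(unsigned vlen, unsigned iter, double time_ms)
{
    assert(time_ms > 0.0);
    double bytes = (double)vlen * (double)iter * 2.0 * (double)sizeof(float);
    return (bytes / 1e6) / (time_ms / 1e3);
}
```

With the x86 numbers, `speedup(1797.7696, 15.3338)` lands on the reported 117.24x for a_avx512f, and `throughput_mb_s(131071, 3000, 1797.7696)` comes out close to the 1749.8 MB/s shown for the generic kernel. The faster rows agree only to about four significant figures under this assumption, so the actual test harness may count bytes slightly differently.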

@Ka-zam Ka-zam force-pushed the invsqrt_nr_pr branch 3 times, most recently from 0096220 to 6bd327d Compare January 16, 2026 13:47
Contributor

@jdemel jdemel left a comment

Mostly LGTM.

Just two questions for clarification =D

Signed-off-by: Magnus Lundmark <[email protected]>
Contributor

jdemel commented Jan 29, 2026

Thanks for the tests and clarifications =D LGTM!

@jdemel jdemel merged commit 72e5450 into gnuradio:main Jan 29, 2026
61 of 62 checks passed