Better hmin/hmax algorithms for SSE/AVX2 #510

rygorous · 2024-11-03T19:21:42Z

Use a formulation that automatically produces the same result in all lanes, avoiding a separate broadcast step.
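To illustrate the idea, here is a minimal sketch, not the code from this PR. It models a 4-lane integer vector with `std::array` so that it runs anywhere; the names `Lanes4`, `permute`, `lane_min`, and `hmin_butterfly` are hypothetical. On SSE the two modelled operations would presumably map to `_mm_shuffle_epi32` and `_mm_min_epi32`. Because each step combines every lane with a partner lane symmetrically, all four lanes hold the overall minimum after two steps and no final broadcast is needed.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>

// Scalar stand-in for a 4-lane vector of 32-bit signed integers.
using Lanes4 = std::array<int32_t, 4>;

// Model of a lane permute: result[i] = v[idx[i]].
static Lanes4 permute(const Lanes4& v, const std::array<int, 4>& idx)
{
    Lanes4 r{};
    for (int i = 0; i < 4; i++)
    {
        r[i] = v[idx[i]];
    }
    return r;
}

// Model of a per-lane signed minimum.
static Lanes4 lane_min(const Lanes4& a, const Lanes4& b)
{
    Lanes4 r{};
    for (int i = 0; i < 4; i++)
    {
        r[i] = std::min(a[i], b[i]);
    }
    return r;
}

// Horizontal minimum as a symmetric butterfly (hypothetical helper).
// Step 1: each lane takes the min with its neighbour (0<->1, 2<->3).
// Step 2: each lane takes the min with the opposite half (0<->2, 1<->3).
// After both steps every lane holds the minimum of all four inputs.
static Lanes4 hmin_butterfly(Lanes4 v)
{
    v = lane_min(v, permute(v, {1, 0, 3, 2}));
    v = lane_min(v, permute(v, {2, 3, 0, 1}));
    return v;
}
```

Calling `hmin_butterfly({7, -3, 12, 5})` leaves `-3` in every lane. The same structure gives a horizontal maximum if `std::min` is replaced with `std::max`.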

The same approach would work with floats in principle, but it's not guaranteed to give the same result in all lanes when NaNs are involved (due to the way MINPS/MAXPS are defined), so leave the float versions alone for now.
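The NaN problem can be seen in a small scalar model, given here as a sketch under stated assumptions rather than as code from this PR. MINPS returns its second source operand whenever either input is NaN, which matches the C++ expression `a < b ? a : b`. The names `FLanes4`, `minps_lane`, `butterfly_step`, and `hmin_butterfly_f` are hypothetical.

```cpp
#include <array>
#include <cmath>
#include <limits>

// Scalar stand-in for a 4-lane float vector.
using FLanes4 = std::array<float, 4>;

// Model of one MINPS lane: if the comparison is false, which includes
// the case where either input is NaN, the second operand is returned.
static float minps_lane(float a, float b)
{
    return a < b ? a : b;
}

// One butterfly step: lane i becomes min(v[i], v[idx[i]]), with the
// permuted value as the second operand.
static FLanes4 butterfly_step(const FLanes4& v, const std::array<int, 4>& idx)
{
    FLanes4 r{};
    for (int i = 0; i < 4; i++)
    {
        r[i] = minps_lane(v[i], v[idx[i]]);
    }
    return r;
}

// Same two-step butterfly as the integer case, applied to floats.
static FLanes4 hmin_butterfly_f(FLanes4 v)
{
    v = butterfly_step(v, {1, 0, 3, 2});
    v = butterfly_step(v, {2, 3, 0, 1});
    return v;
}
```

With the input `{NaN, 1, 2, 3}` this model produces `{1, 2, 1, NaN}`: the lanes disagree, because whether a NaN survives a step depends on which operand position it occupies. A float version built this way would therefore still need a broadcast, or NaN-free inputs, to guarantee identical lanes.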

Encode time drops by about 1% when encoding an 8192x8192 test texture at 6x6 -thorough on a Ryzen 7950X3D.


solidpixel self-requested a review November 4, 2024 21:02

Merge branch 'main' into better-reduction

55138bf

solidpixel approved these changes Nov 4, 2024


solidpixel merged commit 521179c into ARM-software:main Nov 4, 2024
7 checks passed

rygorous deleted the better-reduction branch November 9, 2024 00:22
