Better hmin/hmax algorithms for SSE/AVX2 #510

Use a formulation of the horizontal min/max reduction that automatically produces the same result in all lanes, avoiding a separate broadcast step. The same approach would work with floats in principle, but it is not guaranteed to give the same result in all lanes when NaNs are involved (MINPS/MAXPS return the second source operand when either input is a NaN, so the result depends on operand order), so leave the float versions alone for now.

About 1% encode time reduction when encoding an 8192x8192 test texture at 6x6 -thorough on a Ryzen 7950X3D.
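For illustration, here is a minimal sketch of the idea for four signed 32-bit lanes. It is not the code from this PR: the names `min_epi32_sse2` and `hmin_all_lanes` are hypothetical, and the minimum is emulated with SSE2 compare and mask operations so the sketch builds without extra compiler flags (with SSE4.1, `_mm_min_epi32` does the same in one instruction). Each step takes the minimum of the vector with a permuted copy of itself, so after two steps every lane has seen all four inputs and no broadcast is needed.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>

// Hypothetical helper: signed 32-bit per-lane minimum using only SSE2.
// With SSE4.1 available this would be a single _mm_min_epi32.
static inline __m128i min_epi32_sse2(__m128i a, __m128i b)
{
    __m128i a_lt_b = _mm_cmplt_epi32(a, b);            // all-ones where a < b
    return _mm_or_si128(_mm_and_si128(a_lt_b, a),      // pick a where a < b
                        _mm_andnot_si128(a_lt_b, b));  // pick b elsewhere
}

// Hypothetical helper: horizontal minimum, result replicated in all 4 lanes.
static inline __m128i hmin_all_lanes(__m128i v)
{
    // Step 1: swap adjacent lanes, [a b c d] -> [b a d c], then min.
    // Lanes 0 and 1 now both hold min(a, b); lanes 2 and 3 hold min(c, d).
    __m128i m = min_epi32_sse2(v, _mm_shuffle_epi32(v, _MM_SHUFFLE(2, 3, 0, 1)));

    // Step 2: swap the 64-bit halves, then min.
    // Every lane now holds min(a, b, c, d).
    m = min_epi32_sse2(m, _mm_shuffle_epi32(m, _MM_SHUFFLE(1, 0, 3, 2)));
    return m;
}
```

Because integer min is commutative and exact, every lane ends up with the identical value regardless of operand order. That is the property that does not carry over to MINPS/MAXPS when NaNs are present.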

Commits on Nov 4, 2024

Merge branch 'main' into better-reduction

solidpixel authored Nov 4, 2024

Commits on Nov 1, 2024