Better hmin/hmax algorithms for SSE/AVX2 #510

Merged: 2 commits merged on Nov 4, 2024

@rygorous (Contributor) commented Nov 3, 2024

Use a formulation that automatically produces the same result in all lanes, avoiding a separate broadcast step.
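
One way to get the same result in every lane without a broadcast is a butterfly reduction. At each step every lane takes the minimum with a partner lane, and the pairing is symmetric, so all four lanes converge on the same value. The portable C++ sketch below models a 4-lane register with std::array; the names vint4, shuffle, vmin, hmin_broadcast and hmin_butterfly are hypothetical, and it illustrates the idea rather than reproducing the code in this PR. On x86 the permute and min would roughly correspond to intrinsics such as _mm_shuffle_epi32 and _mm_min_epi32 (SSE4.1); hmax is the same with max in place of min.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical 4-lane integer vector standing in for an SSE register.
using vint4 = std::array<int32_t, 4>;

// Lane permute in the spirit of _mm_shuffle_epi32: out[k] = v[i_k].
static vint4 shuffle(const vint4& v, int i0, int i1, int i2, int i3)
{
    assert(i0 >= 0 && i0 < 4 && i1 >= 0 && i1 < 4);
    assert(i2 >= 0 && i2 < 4 && i3 >= 0 && i3 < 4);
    return vint4{v[i0], v[i1], v[i2], v[i3]};
}

// Lane-wise signed minimum, in the spirit of _mm_min_epi32 (PMINSD).
static vint4 vmin(const vint4& a, const vint4& b)
{
    vint4 r{};
    for (int k = 0; k < 4; k++)
        r[k] = std::min(a[k], b[k]);
    return r;
}

// Reduce-then-broadcast: the two min steps only guarantee the full
// minimum in lane 0, so a third shuffle copies lane 0 to every lane.
static vint4 hmin_broadcast(vint4 a)
{
    a = vmin(a, shuffle(a, 2, 3, 0, 0)); // lanes 0,1: min of lanes (0,2) and (1,3)
    a = vmin(a, shuffle(a, 1, 0, 0, 0)); // lane 0: min of all four lanes
    return shuffle(a, 0, 0, 0, 0);       // separate broadcast step
}

// Butterfly: each step mins every lane with a partner lane, and the
// pairing is symmetric, so all lanes end with the same result.
static vint4 hmin_butterfly(vint4 a)
{
    a = vmin(a, shuffle(a, 1, 0, 3, 2)); // every lane: min of its adjacent pair
    a = vmin(a, shuffle(a, 2, 3, 0, 1)); // every lane: min of both pairs
    return a;                            // no broadcast needed
}
```

For {7, -3, 12, 5} both functions return -3 in all four lanes; the butterfly version uses two shuffles instead of three because it never needs the final broadcast.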

The same approach would work with floats in principle, but it's not guaranteed to give the same result in all lanes when NaNs are involved (due to the way MINPS/MAXPS are defined), so leave the float versions alone for now.
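
The NaN caveat can be seen with a scalar model of MINPS, which returns the second source operand whenever either input is NaN. In a butterfly step, a lane computes min(x, y) while its partner computes min(y, x), so when one of them is NaN the two lanes can keep different values. This is a hypothetical sketch (vfloat4, shuffle, minps, hmin_butterfly_ps and lanes_agree are illustrative names) that models only that operand-order rule, not the hardware in full.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical 4-lane float vector standing in for an SSE register.
using vfloat4 = std::array<float, 4>;

// Lane permute: out[k] = v[i_k].
static vfloat4 shuffle(const vfloat4& v, int i0, int i1, int i2, int i3)
{
    assert(i0 >= 0 && i0 < 4 && i1 >= 0 && i1 < 4);
    assert(i2 >= 0 && i2 < 4 && i3 >= 0 && i3 < 4);
    return vfloat4{v[i0], v[i1], v[i2], v[i3]};
}

// Scalar model of MINPS: per lane, return a if a < b, otherwise b.
// Any comparison with NaN is false, so a NaN in either operand yields b.
static vfloat4 minps(const vfloat4& a, const vfloat4& b)
{
    vfloat4 r{};
    for (int k = 0; k < 4; k++)
        r[k] = (a[k] < b[k]) ? a[k] : b[k];
    return r;
}

// Same butterfly shape as an integer version would use.
static vfloat4 hmin_butterfly_ps(vfloat4 a)
{
    a = minps(a, shuffle(a, 1, 0, 3, 2));
    a = minps(a, shuffle(a, 2, 3, 0, 1));
    return a;
}

// True if all lanes hold the same value (all-NaN counts as agreeing).
static bool lanes_agree(const vfloat4& v)
{
    for (int k = 1; k < 4; k++)
    {
        bool both_nan = std::isnan(v[0]) && std::isnan(v[k]);
        if (!both_nan && v[k] != v[0])
            return false;
    }
    return true;
}
```

With finite input such as {2, -1, 5, 0.5} every lane ends up as -1, but with {1, NaN, 3, 4} this model yields {3, 1, NaN, 1}, so the lanes no longer agree and a separate broadcast (or NaN handling) would still be needed.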

About 1% encode time reduction when encoding an 8192x8192 test texture at 6x6 -thorough on a Ryzen 7950X3D.

@solidpixel self-requested a review November 4, 2024 21:02
@solidpixel merged commit 521179c into ARM-software:main Nov 4, 2024
7 checks passed
@rygorous deleted the better-reduction branch November 9, 2024 00:22