A faster NEON vmask any() and all() #483

solidpixel · 2024-07-31T20:46:06Z

This PR changes the NEON implementation of the any() and all() vmask functions to use simple horizontal min/max operations, rather than a test against a specific mask bit pattern, which NEON does not support natively.
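As a rough illustration of the idea (not the code in this PR), the sketch below assumes a hypothetical `mask4` type holding four 32-bit lanes, each either all-ones or all-zeros. The names `mask4`, `make_mask`, `mask_any`, and `mask_all` are invented for this example. On AArch64 it uses the `vmaxvq_u32` and `vminvq_u32` horizontal reductions; elsewhere it falls back to an equivalent scalar loop so the example still compiles.

```cpp
#include <cassert>
#include <cstdint>

#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
#define SKETCH_USE_NEON 1
#else
#define SKETCH_USE_NEON 0
#endif

// Hypothetical mask type: each lane is 0xFFFFFFFF (true) or 0 (false).
struct mask4
{
    uint32_t lane[4];
};

// Hypothetical helper to build a mask from four booleans.
inline mask4 make_mask(bool a, bool b, bool c, bool d)
{
    return mask4{{a ? 0xFFFFFFFFu : 0u,
                  b ? 0xFFFFFFFFu : 0u,
                  c ? 0xFFFFFFFFu : 0u,
                  d ? 0xFFFFFFFFu : 0u}};
}

// True if at least one lane is set: the horizontal max is non-zero.
inline bool mask_any(const mask4& m)
{
#if SKETCH_USE_NEON
    uint32x4_t v = vld1q_u32(m.lane);
    return vmaxvq_u32(v) != 0;
#else
    uint32_t mx = 0;
    for (uint32_t x : m.lane)
    {
        mx = x > mx ? x : mx;
    }
    return mx != 0;
#endif
}

// True if every lane is set: the horizontal min is non-zero.
inline bool mask_all(const mask4& m)
{
#if SKETCH_USE_NEON
    uint32x4_t v = vld1q_u32(m.lane);
    return vminvq_u32(v) != 0;
#else
    uint32_t mn = 0xFFFFFFFFu;
    for (uint32_t x : m.lane)
    {
        mn = x < mn ? x : mn;
    }
    return mn != 0;
#endif
}

// Small usage example for the hypothetical helpers above.
inline void example_usage()
{
    mask4 mixed = make_mask(true, false, false, false);
    assert(mask_any(mixed));
    assert(!mask_all(mixed));
}
```

Both reductions rely on the assumption that lanes only ever hold all-ones or all-zeros, so a single horizontal operation answers the question without building a packed bit pattern first.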

Performance impact on Apple M1:

- A 350 byte reduction in binary size.
- A 0.3% improvement to encode performance (Kodak -medium).
- A 5% improvement to decode performance.

Fixes #482

Use faster NEON any/all

04d3bf7

solidpixel added this to the 4.9.0 milestone Jul 31, 2024

solidpixel requested a review from bengaineyarm July 31, 2024 20:51

Update copyright

b5a5c22

bengaineyarm approved these changes Aug 1, 2024

solidpixel merged commit 8bc51bc into main Aug 2, 2024
4 checks passed

solidpixel deleted the neon_any_all branch August 2, 2024 10:56
