AArch64: Use better block COPY8 #4414

Merged
merged 1 commit into from
Jun 25, 2025

Conversation

arpadpanyik-arm

The vector copy is only necessary for 16-byte blocks on AArch64.

Decompression speed uplifts on a Neoverse V2 system, using Zstd-1.5.8 compiled with -O3 -march=armv8.2-a+sve2 (rows are compression levels 1 to 7 on silesia.tar):

                 Clang-19  Clang-20    GCC-14    GCC-15
 1#silesia.tar:   +0.316%   +0.865%   +0.025%   +0.096%
 2#silesia.tar:   +0.689%   +1.374%   +0.027%   +0.065%
 3#silesia.tar:   +0.811%   +1.654%   +0.034%   +0.033%
 4#silesia.tar:   +0.912%   +1.755%   +0.027%   +0.042%
 5#silesia.tar:   +0.995%   +1.826%   +0.062%   +0.094%
 6#silesia.tar:   +0.976%   +1.777%   +0.065%   +0.104%
 7#silesia.tar:   +0.910%   +1.738%   +0.077%   +0.110%

No measurable change in compression performance was observed.
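
For readers who want a picture of the idea, here is a minimal sketch, not the actual patch. It assumes the change amounts to letting the 8-byte copy go through a plain fixed-size memcpy while keeping the NEON path for the 16-byte copy only. The names sketch_copy8, sketch_copy16, sketch_self_check and SKETCH_HAVE_NEON are hypothetical and do not come from the Zstd sources.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical feature switch for this sketch; Zstd uses its own macros. */
#if defined(__aarch64__) && defined(__ARM_NEON)
#  include <arm_neon.h>
#  define SKETCH_HAVE_NEON 1
#else
#  define SKETCH_HAVE_NEON 0
#endif

/* 8-byte block: a fixed-size memcpy, with no vector intrinsic.
 * Assumption: the compiler is free to pick the cheapest 8-byte
 * load/store pair for a constant-size copy like this one. */
static void sketch_copy8(void* dst, const void* src)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, 8);
}

/* 16-byte block: the only size where the vector copy is kept. */
static void sketch_copy16(void* dst, const void* src)
{
    assert(dst != NULL && src != NULL);
#if SKETCH_HAVE_NEON
    /* One 128-bit load followed by one 128-bit store. */
    vst1q_u8((uint8_t*)dst, vld1q_u8((const uint8_t*)src));
#else
    /* Portable fallback so the sketch also builds on other targets. */
    memcpy(dst, src, 16);
#endif
}

/* Returns 1 when both helpers reproduce the source bytes exactly. */
static int sketch_self_check(void)
{
    uint8_t src[16];
    uint8_t dst8[8] = {0};
    uint8_t dst16[16] = {0};
    for (int i = 0; i < 16; i++) src[i] = (uint8_t)(i + 1);

    sketch_copy8(dst8, src);
    sketch_copy16(dst16, src);

    return memcmp(dst8, src, 8) == 0 && memcmp(dst16, src, 16) == 0;
}
```

Both helpers copy the same bytes as before, so the sketch only changes which instructions the compiler may pick for the 8-byte case. Any speed difference would need to be measured per compiler and core, as in the table above.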

@Cyan4973

Performance on aarch64 (M1 Pro) is within noise level between dev and this patch,
but it's also not worse, and the explanation makes sense, so it's fine.

@Cyan4973 Cyan4973 merged commit 1dbc2e0 into facebook:dev Jun 25, 2025
103 checks passed