[X86][AVX2] X86FixupVectorConstantsPass - performance regression #135998

Open
nurmukhametov opened this issue Apr 16, 2025 · 5 comments
@nurmukhametov
Contributor

While moving ISPC from LLVM 18 to LLVM 20, I encountered a performance regression in a few of our benchmarks for the ISPC avx2 target (which corresponds to `-mcpu=haswell`). It seems to be caused by #122601. The slowdown is around 20-25%, as measured on Intel i9-12900 and 8360Y CPUs.

Compiler explorer link: https://godbolt.org/z/qx8s6K6br
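
For readers without the Compiler Explorer link open, here is a minimal scalar sketch of the kind of rewrite being discussed. It assumes, based on the later comments in this thread, that a full-width constant load is replaced by a narrower constant that is sign-extended at load time with VPMOVSXBD. The constant values and the helper names `narrowToBytes` and `signExtendBytes` are hypothetical. The sketch models only the semantics and the constant size, not the generated code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Vec8i32 = std::array<std::int32_t, 8>;
using Vec8i8 = std::array<std::int8_t, 8>;

// Hypothetical 8 x i32 constant; stored at full width it occupies 32 bytes
// and is fetched with one 256-bit load (e.g. VMOVAPS).
const Vec8i32 kFullWidth = {0, 1, 2, 3, -4, -3, -2, -1};

// Shrink the constant to 8 bytes. Only valid when every element fits in a
// signed byte, which is what makes a sign-extending load usable at all.
Vec8i8 narrowToBytes(const Vec8i32 &Wide) {
  Vec8i8 Narrow{};
  for (std::size_t I = 0; I < Wide.size(); ++I) {
    assert(Wide[I] >= INT8_MIN && Wide[I] <= INT8_MAX);
    Narrow[I] = static_cast<std::int8_t>(Wide[I]);
  }
  return Narrow;
}

// Scalar model of a VPMOVSXBD-style load: widen each signed byte to i32.
Vec8i32 signExtendBytes(const Vec8i8 &Narrow) {
  Vec8i32 Wide{};
  for (std::size_t I = 0; I < Narrow.size(); ++I)
    Wide[I] = static_cast<std::int32_t>(Narrow[I]);
  return Wide;
}
```

The round trip reproduces the original values while the stored constant shrinks from 32 to 8 bytes. The open question in this issue is whether the extending load costs more to execute than the plain load.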

@nurmukhametov
Contributor Author

@RKSimon, could you take a look? It's not clear to me whether it is expected or not.

@llvmbot
Member

llvmbot commented Apr 16, 2025

@llvm/issue-subscribers-backend-x86

Author: Aleksei Nurmukhametov (nurmukhametov)

@RKSimon
Collaborator

RKSimon commented Apr 16, 2025

AFAICT Haswell and later do have free domain crossing from VPMOVSXBD loads to fp-domain.

What could be happening is increased Port5 pressure, @phoebewang any thoughts?

https://clang.godbolt.org/z/5dTdbP1xG

@phoebewang
Contributor

> AFAICT Haswell and later do have free domain crossing from VPMOVSXBD loads to fp-domain.
>
> What could be happening is increased Port5 pressure, @phoebewang any thoughts?
>
> https://clang.godbolt.org/z/5dTdbP1xG

Yeah, I think Port5 pressure is the reason, especially for 256-bit vectors. Note that the binary is actually running on Alder Lake, where VMOVAPS has a reciprocal throughput (RT) of 0.3 compared with 1 for VPMOVSXBD.
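
To make the throughput figures concrete, here is a back-of-the-envelope sketch. If the constant loads were the only bottleneck, the minimum cycles per loop iteration would scale linearly with reciprocal throughput. The function name `throughputBoundCycles` and the count of four loads per iteration in the usage note are made up for illustration. Real loops overlap these loads with other work, so this is only a lower bound, not a prediction of the measured regression.

```cpp
#include <cassert>

// Lower bound on the cycles needed to issue `Count` independent instructions
// that each have the given reciprocal throughput (cycles per instruction).
double throughputBoundCycles(int Count, double ReciprocalThroughput) {
  assert(Count >= 0 && ReciprocalThroughput > 0.0);
  return Count * ReciprocalThroughput;
}
```

With the numbers quoted above, `throughputBoundCycles(4, 0.3)` gives roughly 1.2 cycles for four VMOVAPS loads, while `throughputBoundCycles(4, 1.0)` gives 4 cycles for four VPMOVSXBD loads.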

@RKSimon self-assigned this Apr 17, 2025
@RKSimon
Collaborator

RKSimon commented Apr 19, 2025

I'm going to look at making this scheduler-model-driven, like we do for X86FixupInstTuningPass.
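
As a rough illustration of what a model-driven choice could look like, here is a standalone sketch. It is not LLVM code and does not use LLVM's scheduling-model API. The `InstrCost` struct, the `pickLoad` function, and the rule "use the extending load only if its reciprocal throughput is no worse" are all hypothetical.

```cpp
#include <cassert>
#include <string>

// Hypothetical per-CPU cost entry; a real scheduler model carries far more
// detail (ports, latency, micro-op counts).
struct InstrCost {
  std::string Name;
  double ReciprocalThroughput; // cycles per instruction, lower is better
};

// Use the narrower (extending) load only when the model says it is not
// slower than the full-width load; otherwise keep the full-width load.
const InstrCost &pickLoad(const InstrCost &FullWidth,
                          const InstrCost &Extending) {
  assert(FullWidth.ReciprocalThroughput > 0.0 &&
         Extending.ReciprocalThroughput > 0.0);
  return Extending.ReciprocalThroughput <= FullWidth.ReciprocalThroughput
             ? Extending
             : FullWidth;
}
```

Under this rule, a target whose model lists the extending load as slower would keep the full-width load, while a target where the two cost the same would still get the smaller constant.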
