Add vector registers to clobber list to prevent compiler optimization. #5203
base: develop
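The SME-based SGEMMDIRECT kernel uses the vector (z) registers. Adding them to the inline assembly clobber list tells the compiler that the kernel overwrites these registers, so the compiler no longer optimizes on the assumption that values it placed in them survive the kernel.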
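To illustrate what the clobber list does, here is a minimal sketch assuming GCC/Clang extended inline assembly on AArch64 built with SVE enabled (for example `-march=armv8-a+sve`). `zero_floats` is a hypothetical helper, not code from the SGEMMDIRECT kernel. The asm block writes `z0` and `p0` and sets the condition flags, so it declares `"z0"`, `"p0"`, `"cc"` and `"memory"` as clobbers. On non-SVE targets the sketch falls back to a plain C loop so it still builds.

```c
#include <stdint.h>

/* Hypothetical helper (not from OpenBLAS): set n floats at dst to zero.
 * The SVE path overwrites z0 and p0, changes the NZCV flags and stores to
 * memory, so all of these are declared as clobbers. Without them the
 * compiler could keep its own live values in z0/p0 across the asm block. */
void zero_floats(float *dst, uint64_t n)
{
#if defined(__aarch64__) && defined(__ARM_FEATURE_SVE)
    uint64_t i = 0;
    __asm__ volatile(
        "mov     z0.s, #0\n"                          /* z0 = all-zero vector        */
        "1:\n"
        "whilelo p0.s, %[i], %[n]\n"                  /* p0 = lanes where i+k < n    */
        "b.none  2f\n"                                /* no active lanes: finished   */
        "st1w    {z0.s}, p0, [%[dst], %[i], lsl #2]\n"/* store zeros to active lanes */
        "incw    %[i]\n"                              /* i += floats per vector      */
        "b       1b\n"
        "2:\n"
        : [i] "+r"(i)
        : [n] "r"(n), [dst] "r"(dst)
        : "z0", "p0", "cc", "memory");                /* state the asm modifies      */
#else
    /* Portable fallback so the sketch also compiles on non-SVE targets. */
    for (uint64_t i = 0; i < n; ++i)
        dst[i] = 0.0f;
#endif
}
```

If a real kernel touches more vector registers, each one it writes would need to appear in the clobber list in the same way.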
Hi @martin-frbg,
I came across an issue in the way the library is compiled and used in an application. When the library is compiled with … But if the library is compiled with … Upon debugging, … Our requirement is to use a common library for all the targets and, if the target supports SME, … Is there something missing in the integration? Please let me know if any changes are needed.
Huh, it looks like CPU type autodetection for ARMV9SME went missing in dynamic_arm64.c (at least I'm fairly sure we already had it). It should basically work like it does for ARMV8SVE, but using the support_sme1() function.
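The detection order described above can be sketched as follows. This is a hypothetical illustration assuming Linux on AArch64, where the kernel reports SVE in `AT_HWCAP` (`HWCAP_SVE`) and SME in `AT_HWCAP2` (`HWCAP2_SME`), read via `getauxval()`. `has_sve`, `has_sme` and `pick_core` are illustrative stand-ins, not OpenBLAS's actual `support_sme1()` or dynamic_arm64.c code, and the returned strings are only labels.

```c
#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>
/* Bit values from the Linux arm64 uapi <asm/hwcap.h>, defined here only
 * in case the installed headers are too old to provide them. */
#ifndef HWCAP_SVE
#define HWCAP_SVE (1UL << 22)
#endif
#ifndef HWCAP2_SME
#define HWCAP2_SME (1UL << 23)
#endif
#endif

/* Hypothetical stand-in for an SVE support check. */
static int has_sve(void)
{
#if defined(__linux__) && defined(__aarch64__)
    return (getauxval(AT_HWCAP) & HWCAP_SVE) != 0;
#else
    return 0;
#endif
}

/* Hypothetical stand-in for support_sme1(). */
static int has_sme(void)
{
#if defined(__linux__) && defined(__aarch64__)
    return (getauxval(AT_HWCAP2) & HWCAP2_SME) != 0;
#else
    return 0;
#endif
}

/* Check the most specific feature first, so a CPU that reports both
 * SVE and SME is assigned the SME target rather than the SVE one. */
const char *pick_core(void)
{
    if (has_sme()) return "ARMV9SME";
    if (has_sve()) return "ARMV8SVE";
    return "generic";
}
```

The ordering is the key design point: if the SVE check ran first, an SME-capable CPU that also reports SVE would never reach the SME branch.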
I added the autodetection for SME by referring to this commit, but the issue still persists :(
Are you testing with an Apple CPU, or with something else?
I am testing on QEMU with SME enabled.
I have now uploaded my current version of dynamic_arm64.c as PR #5222. I think this should fix the ARMV9SME issue, though I have not specifically tested it with QEMU. In any case, isn't that part of your PR unrelated to the original task of fixing the over-optimization issue in the sgemmdirect kernel?
Hi @martin-frbg, yes, the primary intention of this PR is to fix the compiler optimization issue. That fix can go in as a separate commit, and the fix for dynamic detection of ARMV9SME can continue as part of PR #5222. Thanks!