-
Notifications
You must be signed in to change notification settings - Fork 5.1k
[LoongArch64] Handle LASX/LSX/FP context depending on ISA availability. #118007
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
* do some clean up.
Please NOTE:
|
Hi @driver1998 , does this solve your problem? |
cc @xen0n for review. You may also need to test the 2K1000/2000/3000 series, those have LSX but no LASX. (I didn't post my patch upstream because I am still waiting for my 2K3000 board to arrive lol) |
Although IIRC HWCAP should be preferred over CPUCFG, since there might be situations where the CPU supports LSX/LASX but the kernel disabled them. see https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6772567 for example. |
Yes, this patch can handle On the clang side, AFAIK |
Thanks, then we should consider similar judgment approach. |
Not sure, but AFAIK a VM running non-LASX kernel on something like a 3A6000 is used for simulating a 2K3000 in distro testing. |
Good point. We can use similar approach taken here for riscv64 #113676 (see |
Thanks very much. This approach is good to detect cpu features to add |
The current version of the change looks good to me. Is it ok to merge it?
This can be done in a follow up. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please properly separate logical changes, and say what you did, e.g. "Handle LASX/LSX/FP context depending on ISA availability" in the commit title. "Fix crash on 2K series SoC" should appear in the commit body because it's providing helpful additional context.
@@ -71,6 +83,76 @@ LOCAL_LABEL(Restore_CONTEXT_FLOATING_POINT): | |||
xvld $xr30, $a0, CONTEXT_FPU_OFFSET + 32*30 | |||
xvld $xr31, $a0, CONTEXT_FPU_OFFSET + 32*31 | |||
|
|||
LOCAL_LABEL(Restore_CONTEXT_LSX): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe by not skipping over the following blocks, work is duplicated and important information (upper halves of XRs) is lost, because as it currently stands, the context layout doesn't guarantee a 256-bit XR is stored separately irrespective of LASX/LSX availability.
Example:
- If LASX is present:
CONTEXT_FPU_OFFSET + [0, 15]
: xr0 lower halfCONTEXT_FPU_OFFSET + [16, 31]
: xr0 upper half
- If only LSX is present:
CONTEXT_FPU_OFFSET + [0, 15]
: vr0CONTEXT_FPU_OFFSET + [16, 31]
: vr1
On LoongArch cores with LASX (LSX is implied by LASX so it's always present in this case), xr0
's upper half is NOT xr1
, so with this context layout, information is lost by re-loading in the LSX way. Same for vr0
's upper half vs f0
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes thanks, I also noticed this yesterday, will update.
Add HWCAP get in minipal_getcpufeatures(). Co-authored-by: driver1998 <[email protected]>
If we had to run on the assumed environment which CPU supports diffdiff --git a/src/coreclr/pal/src/arch/loongarch64/context2.S b/src/coreclr/pal/src/arch/loongarch64/context2.S
index 4e747c767d1..64403508453 100644
--- a/src/coreclr/pal/src/arch/loongarch64/context2.S
+++ b/src/coreclr/pal/src/arch/loongarch64/context2.S
@@ -37,13 +37,16 @@ LOCAL_LABEL(Restore_CONTEXT_FLOATING_POINT):
andi $t1, $r21, (1 << CONTEXT_FLOATING_POINT_BIT)
beqz $t1, LOCAL_LABEL(No_Restore_CONTEXT_FLOATING_POINT)
- ori $r21, $zero, 2
- cpucfg $r21, $r21
- // CPUCFG.2.LASX[bit7]
- andi $t1, $r21, 0x80
+ PROLOG_SAVE_REG_PAIR_INDEXED 22, 1, 24
+ st.d $a0, $sp, 16
+ bl C_FUNC(minipal_getcpufeatures)
+ ori $r21, $a0, 0
+ ld.d $a0, $sp, 16
+ EPILOG_RESTORE_REG_PAIR_INDEXED 22, 1, 24
+
+ andi $t1, $r21, LoongArch64IntrinsicConstants_LASX
bnez $t1, LOCAL_LABEL(Restore_CONTEXT_LASX)
- // CPUCFG.2.LSX[bit6]
- andi $t1, $r21, 0x40
+ andi $t1, $r21, LoongArch64IntrinsicConstants_LSX
bnez $t1, LOCAL_LABEL(Restore_CONTEXT_LSX)
// 64-bits FPU. Not Support LSX/LASX
@@ -300,13 +303,16 @@ LOCAL_LABEL(Done_CONTEXT_INTEGER):
andi $t3, $t1, (1 << CONTEXT_FLOATING_POINT_BIT)
beqz $t3, LOCAL_LABEL(Done_CONTEXT_FLOATING_POINT)
- ori $t1, $zero, 2
- cpucfg $t1, $t1
- // CPUCFG.2.LASX[bit7]
- andi $t3, $t1, 0x80
+ PROLOG_SAVE_REG_PAIR_INDEXED 22, 1, 24
+ st.d $a0, $sp, 16
+ bl C_FUNC(minipal_getcpufeatures)
+ ori $t1, $a0, 0
+ ld.d $a0, $sp, 16
+ EPILOG_RESTORE_REG_PAIR_INDEXED 22, 1, 24
+
+ andi $t3, $t1, LoongArch64IntrinsicConstants_LASX
bnez $t3, LOCAL_LABEL(Store_CONTEXT_LASX)
- // CPUCFG.2.LSX[bit6]
- andi $t3, $t1, 0x40
+ andi $t3, $t1, LoongArch64IntrinsicConstants_LSX
bnez $t3, LOCAL_LABEL(Store_CONTEXT_LSX)
// 64-bits FPU. Not Support LSX/LASX |
Maybe the value can be cached somewhere? I don't expect it will be changed at runtime. Also this can be delayed for later as well, no need to block this PR. |
I just debugged the
Agree. I will also try to find an environment where hardware supports LSX/LASX but the kernel disabled them. |
@dotnet/samsung Could you please take a look? These changes may be related to riscv64. |
@LuckyXu-HF, I agree with @jkotas that we can keep cpufeatures changes out until the time they are actually used in C/C++ code. I see that you have CPU feature detection duplicated in asm and new code in cpufeatures.c is never exercised (i.e. |
Uh oh!
There was an error while loading. Please reload this page.