During P9 error injection testing, we are seeing an intermittent problem. When executing an error injection command such as 'putscom -c 0x8 0x07010A0D 0x0000000003AF0000', it sometimes took 20 minutes before the system checkstopped and deconfigured the DIMM that had the error injected.
Preliminary analysis shows that after the injection control register (0x07010A0D) was written, the corresponding fault isolation register (FIR, 0x07010A00) was not updated, so no checkstop occurred.
Writing directly to the corresponding FIR does trigger a checkstop and successfully deconfigures the DIMM.
Why did the FIR value only change after more than 20 minutes?
What interface are you using for the putscom? I don't recognize the syntax above.
My understanding of the way 0x07010A0D works is that it places errors into the hardware, but those errors are not surfaced until memory behind that memory controller is actually accessed. Therefore, unless you are explicitly forcing all of mainstore to be accessed (e.g. by running an exerciser of some kind), the results will be non-deterministic.
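To illustrate why the delay varies, here is a toy model. It is not a model of the actual P9 memory controller. An armed inject fires only on the first read that hits the programmed address, so the time to checkstop depends entirely on when the workload happens to touch that address. The uniform random access pattern, the address space size, and the function name are all hypothetical assumptions made for illustration.

```python
import random
from typing import Optional


def reads_until_inject_fires(target: int, address_space: int, max_reads: int,
                             rng: random.Random) -> Optional[int]:
    """Toy model (hypothetical, not real hardware behavior).

    Returns the 1-based index of the first read that hits `target`, or None
    if no read hits it within `max_reads`. Assumes the workload reads
    uniformly random addresses in [0, address_space).
    """
    for i in range(1, max_reads + 1):
        if rng.randrange(address_space) == target:
            return i  # in this model, the armed inject fires here and the FIR bit is raised
    return None  # inject still armed: no FIR bit, no checkstop


if __name__ == "__main__":
    rng = random.Random(0)
    # Same inject, five trials: the number of reads before it fires varies widely.
    for trial in range(5):
        print(trial, reads_until_inject_fires(target=1234, address_space=4096,
                                              max_reads=100_000, rng=rng))
```

With uniform random reads, the wait in this model is geometrically distributed with a mean of `address_space` reads, so occasional very long waits are expected. A sequential sweep of the whole address space would instead hit the target within `address_space` reads, which is the intuition behind running an exerciser.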
I also think you might be missing some bits that have to be set to control the injection.
Bits 0:36 : EICR_ADDRESS: Error is injected when read address matches the EICR address, up to fields masked by the EICR region.
0 = dimm select
1:2 = mrank(0:1)
3:5 = srank(0:2)
6:7 = bank_group(0:1)
8:10 = bank(0:2)
11:28 = row(0:17)
29:36 = col(2:9)
Without those bits set there will never be a match to trigger the inject.
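The field layout above uses IBM (MSB-0) bit numbering, in which bit 0 is the most significant bit of the 64-bit SCOM data word. The following is a minimal sketch of how the listed address fields map onto that word, assuming the layout exactly as given. The helper names are hypothetical, and the sketch does not cover the EICR region/mask or any other control bits. Decoding the value used in this issue, 0x0000000003AF0000, shows that every field in bits 0:36 is zero.

```python
from typing import Dict, Tuple

REG_WIDTH = 64  # SCOM data is 64 bits; IBM numbering: bit 0 = MSB

# Field name -> (first_bit, last_bit) in MSB-0 numbering, as listed above.
EICR_ADDRESS_FIELDS: Dict[str, Tuple[int, int]] = {
    "dimm_select": (0, 0),
    "mrank": (1, 2),
    "srank": (3, 5),
    "bank_group": (6, 7),
    "bank": (8, 10),
    "row": (11, 28),
    "col_2_9": (29, 36),  # column address bits 2..9
}


def _shift_and_mask(first: int, last: int) -> Tuple[int, int]:
    """Convert an MSB-0 bit range into an LSB-0 shift and a width mask."""
    width = last - first + 1
    shift = REG_WIDTH - 1 - last  # LSB-0 position of the field's lowest bit
    return shift, (1 << width) - 1


def encode_eicr_address(**fields: int) -> int:
    """Pack address fields into bits 0:36 of a 64-bit word (other bits left zero)."""
    value = 0
    for name, field_value in fields.items():
        first, last = EICR_ADDRESS_FIELDS[name]
        shift, mask = _shift_and_mask(first, last)
        if not 0 <= field_value <= mask:
            raise ValueError(f"{name}={field_value} does not fit in bits {first}:{last}")
        value |= field_value << shift
    return value


def decode_eicr_address(value: int) -> Dict[str, int]:
    """Extract the address fields from bits 0:36 of a 64-bit word."""
    out = {}
    for name, (first, last) in EICR_ADDRESS_FIELDS.items():
        shift, mask = _shift_and_mask(first, last)
        out[name] = (value >> shift) & mask
    return out


if __name__ == "__main__":
    print(decode_eicr_address(0x0000000003AF0000))  # every field is 0
    print(hex(encode_eicr_address(dimm_select=1)))   # 0x8000000000000000
```

In 0x0000000003AF0000, the only set bits are 38, 39, 40, 42 and 44:47 (MSB-0). All of them fall outside the 0:36 address field, and their meaning is not described in this thread.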
'putscom -c 0x0 0x07010A0D 0x0000000003AF0000' injects an error on CPU0_C0D0.
'putscom -c 0x8 0x07010A0D 0x0000000003AF0000' injects an error on CPU1_C0D0.
Normally, the checkstop is triggered immediately after the injection command is executed, and the injection succeeds. Currently, however, it sometimes takes 20 minutes after the injection command before the checkstop occurs (the injection does eventually succeed). Why do we need to wait 20 minutes?
What do you mean by "the normal situation"? Have you seen other behavior with this specific injection? I still believe it won't fail until the memory is physically accessed, which is non-deterministic.