colpane commented Nov 18, 2025

stress-ng tests were running on the edge IOT2050 OS, and after 4 days a kernel crash was observed:

Nov 15 23:24:31 simatic-iot2050-ied-bc2164f4 kernel: rcu: INFO: rcu_preempt self-detected stall on CPU
Nov 15 23:24:31 simatic-iot2050-ied-bc2164f4 kernel: rcu: 3-...!: (5426 ticks this GP) idle=948c/0/0x1 softirq=0/0 fqs=1
Nov 15 23:24:31 simatic-iot2050-ied-bc2164f4 kernel: (t=5250 jiffies g=42193573 q=28 ncpus=4)
Nov 15 23:24:31 simatic-iot2050-ied-bc2164f4 kernel: rcu: rcu_preempt kthread timer wakeup didn't happen for 5247 jiffies! g42193573 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
Nov 15 23:24:31 simatic-iot2050-ied-bc2164f4 kernel: rcu: Possible timer handling issue on cpu=3 timer-softirq=11616448
Nov 15 23:24:31 simatic-iot2050-ied-bc2164f4 kernel: rcu: rcu_preempt kthread starved for 5248 jiffies! g42193573 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=3
Nov 15 23:24:31 simatic-iot2050-ied-bc2164f4 kernel: rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
Nov 15 23:24:31 simatic-iot2050-ied-bc2164f4 kernel: rcu: RCU grace-period kthread stack dump:
Nov 15 23:24:31 simatic-iot2050-ied-bc2164f4 kernel: task:rcu_preempt state:I stack:0 pid:16 ppid:2 flags:0x00000008
Nov 15 23:24:31 simatic-iot2050-ied-bc2164f4 kernel: Call trace:
Nov 15 23:24:31 simatic-iot2050-ied-bc2164f4 kernel: __switch_to+0xdc/0x120
Nov 15 23:24:31 simatic-iot2050-ied-bc2164f4 kernel: __schedule+0x2f8/0x7c4
Nov 15 23:24:31 simatic-iot2050-ied-bc2164f4 kernel: schedule+0x5c/0x10c
Nov 15 23:24:31 simatic-iot2050-ied-bc2164f4 kernel: schedule_timeout+0x8c/0x100
Nov 15 23:24:31 simatic-iot2050-ied-bc2164f4 kernel: rcu_gp_fqs_loop+0x148/0x4b0
Nov 15 23:24:31 simatic-iot2050-ied-bc2164f4 kernel: rcu_gp_kthread+0x13c/0x170
Nov 15 23:24:31 simatic-iot2050-ied-bc2164f4 kernel: kthread+0x124/0x12c
Nov 15 23:24:31 simatic-iot2050-ied-bc2164f4 kernel: ret_from_fork+0x10/0x20
Nov 15 23:24:31 simatic-iot2050-ied-bc2164f4 kernel: rcu: Stack dump where RCU GP kthread last ran:
Nov 15 23:24:31 simatic-iot2050-ied-bc2164f4 kernel: CPU: 3 PID: 0 Comm: swapper/3 Not tainted 6.1.134-cip41-rt22 #1
Nov 15 23:24:31 simatic-iot2050-ied-bc2164f4 kernel: Hardware name: SIEMENS AG SIMATIC IOT2050/Unknown Product, BIOS 2023.10-V01.04.04_S01.01.01-0-ga6c5ae8 10/01/2023
Nov 15 23:24:31 simatic-iot2050-ied-bc2164f4 kernel: pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
Nov 15 23:24:31 simatic-iot2050-ied-bc2164f4 kernel: pc : arch_cpu_idle+0x18/0x2c
Nov 15 23:24:31 simatic-iot2050-ied-bc2164f4 kernel: lr : arch_cpu_idle+0x14/0x2c
Nov 15 23:24:31 simatic-iot2050-ied-bc2164f4 kernel: sp : ffff8000094b3e20
Nov 15 23:24:31 simatic-iot2050-ied-bc2164f4 kernel: x29: ffff8000094b3e20 x28: 0000000000000000 x27: 0000000000000000
Nov 15 23:24:31 simatic-iot2050-ied-bc2164f4 kernel: x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000
Nov 15 23:24:31 simatic-iot2050-ied-bc2164f4 kernel: x23: 0000000000000000 x22: ffff0000001a5e80 x21: ffff80000917bac0
Nov 15 23:24:32 simatic-iot2050-ied-bc2164f4 kernel: x20: 0000000000000003 x19: ffff80000917b9a0 x18: ffff800009aa3d48
Nov 15 23:24:32 simatic-iot2050-ied-bc2164f4 kernel: x17: ffff800009234fc8 x16: 00000000d9f50eeb x15: 00000000dcba71ef
Nov 15 23:24:32 simatic-iot2050-ied-bc2164f4 kernel: x14: 00000000c245fa86 x13: 000000000000033e x12: 000000000000033e
Nov 15 23:24:32 simatic-iot2050-ied-bc2164f4 kernel: x11: 0000000000000001 x10: 0000000000000b30 x9 : ffff8000094b3d90
Nov 15 23:24:32 simatic-iot2050-ied-bc2164f4 kernel: x8 : ffff0000001a6a10 x7 : 00000000000000c0 x6 : ffff80000902a8d8
Nov 15 23:24:32 simatic-iot2050-ied-bc2164f4 kernel: x5 : 4000000000000000 x4 : ffff800076b63000 x3 : 0000000000000000
Nov 15 23:24:32 simatic-iot2050-ied-bc2164f4 kernel: x2 : 000000000a239489 x1 : ffff00007fb8d8d8 x0 : 00000000000000e0
Nov 15 23:24:32 simatic-iot2050-ied-bc2164f4 kernel: Call trace:
Nov 15 23:24:32 simatic-iot2050-ied-bc2164f4 kernel: arch_cpu_idle+0x18/0x2c
Nov 15 23:24:32 simatic-iot2050-ied-bc2164f4 kernel: default_idle_call+0x30/0x78
Nov 15 23:24:32 simatic-iot2050-ied-bc2164f4 kernel: do_idle+0xd8/0x150
Nov 15 23:24:32 simatic-iot2050-ied-bc2164f4 kernel: cpu_startup_entry+0x38/0x40
Nov 15 23:24:32 simatic-iot2050-ied-bc2164f4 kernel: secondary_start_kernel+0x124/0x150
Nov 15 23:24:32 simatic-iot2050-ied-bc2164f4 kernel: __secondary_switched+0xb0/0xb4
Nov 15 23:24:32 simatic-iot2050-ied-bc2164f4 kernel: CPU: 3 PID: 0 Comm: swapper/3 Not tainted 6.1.134-cip41-rt22 #1
Nov 15 23:24:32 simatic-iot2050-ied-bc2164f4 kernel: Hardware name: SIEMENS AG SIMATIC IOT2050/Unknown Product, BIOS 2023.10-V01.04.04_S01.01.01-0-ga6c5ae8 10/01/2023
Nov 15 23:24:32 simatic-iot2050-ied-bc2164f4 kernel: pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
Nov 15 23:24:32 simatic-iot2050-ied-bc2164f4 kernel: pc : arch_cpu_idle+0x18/0x2c
Nov 15 23:24:32 simatic-iot2050-ied-bc2164f4 kernel: lr : arch_cpu_idle+0x14/0x2c
Nov 15 23:24:32 simatic-iot2050-ied-bc2164f4 kernel: sp : ffff8000094b3e20
Nov 15 23:24:32 simatic-iot2050-ied-bc2164f4 kernel: x29: ffff8000094b3e20 x28: 0000000000000000 x27: 0000000000000000
Nov 15 23:24:32 simatic-iot2050-ied-bc2164f4 kernel: x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000
Nov 15 23:24:32 simatic-iot2050-ied-bc2164f4 kernel: x23: 0000000000000000 x22: ffff0000001a5e80 x21: ffff80000917bac0
Nov 15 23:24:32 simatic-iot2050-ied-bc2164f4 kernel: x20: 0000000000000003 x19: ffff80000917b9a0 x18: ffff800009aa3d48
Nov 15 23:24:32 simatic-iot2050-ied-bc2164f4 kernel: x17: ffff800009234fc8 x16: 00000000d9f50eeb x15: 00000000dcba71ef
Nov 15 23:24:32 simatic-iot2050-ied-bc2164f4 kernel: x14: 00000000c245fa86 x13: 000000000000033e x12: 000000000000033e
Nov 15 23:24:32 simatic-iot2050-ied-bc2164f4 kernel: x11: 0000000000000001 x10: 0000000000000b30 x9 : ffff8000094b3d90
Nov 15 23:24:32 simatic-iot2050-ied-bc2164f4 kernel: x8 : ffff0000001a6a10 x7 : 00000000000000c0 x6 : ffff80000902a8d8
Nov 15 23:24:32 simatic-iot2050-ied-bc2164f4 kernel: x5 : 4000000000000000 x4 : ffff800076b63000 x3 : 0000000000000000
Nov 15 23:24:32 simatic-iot2050-ied-bc2164f4 kernel: x2 : 000000000a239489 x1 : ffff00007fb8d8d8 x0 : 00000000000000e0
Nov 15 23:24:32 simatic-iot2050-ied-bc2164f4 kernel: Call trace:
Nov 15 23:24:32 simatic-iot2050-ied-bc2164f4 kernel: arch_cpu_idle+0x18/0x2c
Nov 15 23:24:32 simatic-iot2050-ied-bc2164f4 kernel: default_idle_call+0x30/0x78
Nov 15 23:24:32 simatic-iot2050-ied-bc2164f4 kernel: do_idle+0xd8/0x150
Nov 15 23:24:32 simatic-iot2050-ied-bc2164f4 kernel: cpu_startup_entry+0x38/0x40
Nov 15 23:24:32 simatic-iot2050-ied-bc2164f4 kernel: secondary_start_kernel+0x124/0x150
Nov 15 23:24:32 simatic-iot2050-ied-bc2164f4 kernel: __secondary_switched+0xb0/0xb4
Nov 15 23:24:47 simatic-iot2050-ied-bc2164f4 swupdate.sh[4795]: [TRACE] : SWUPDATE running : [start_suricatta] : Suricatta awakened.
Nov 15 23:25:13 simatic-iot2050-ied-bc2164f4 systemd[1]: systemd-logind.service: Watchdog timeout (limit 3min)!
Nov 15 23:25:13 simatic-iot2050-ied-bc2164f4 systemd[1]: systemd-logind.service: Killing process 684 (systemd-logind) with signal SIGABRT.
Nov 15 23:25:34 simatic-iot2050-ied-bc2164f4 kernel: rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
Nov 15 23:25:34 simatic-iot2050-ied-bc2164f4 kernel: rcu: Tasks blocked on level-0 rcu_node (CPUs 0-3): P4110/3:b..l P4019/9:b..l
Nov 15 23:25:34 simatic-iot2050-ied-bc2164f4 kernel: (detected by 2, t=21005 jiffies, g=42193573, q=214 ncpus=4)

A patch for this kind of problem is already implemented and tested on x86. This PR ports it to the IOT2050.

- If the RT kernel is stressed with a tool like stress-ng, the kernel can crash due to epoll.

Signed-off-by: Enes Colpan <[email protected]>

colpane commented Nov 18, 2025

@jan-kiszka @huaqianli @BaochengSu could you please review?

@huaqianli could you also run stress tests on the RT kernel?

jan-kiszka (Collaborator) commented

Log is barely readable... But I know it already 😉

But yes, unfortunately we will need this. Can you confirm the issue is gone with the patch?


colpane commented Nov 18, 2025

> Log is barely readable... But I know it already 😉
>
> But, yes, we unfortunately will need this. Can you confirm the issue is gone with the patch.

Sorry for the formatting. I switched from code to quote mode, and it reads better now :)

We will start a test this week and I will report back next week. Meanwhile, @huaqianli can build an image and start a test in parallel as a double check :)
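
For checking the logs of such a multi-day run, a small helper could flag the stall signatures seen in the log above. This is only a sketch and not part of the patch: the function name `find_rcu_stalls` and the pattern list are assumptions derived from the messages in this PR, and other kernel versions may word them differently.

```python
import re
from typing import Iterable, List

# Signatures copied from the kernel log in this PR (assumed to be
# representative; extend the list if other messages show up).
STALL_PATTERNS = [
    re.compile(r"rcu: INFO: \w+ self-detected stall on CPU"),
    re.compile(r"rcu: INFO: \w+ detected stalls on CPUs/tasks"),
    re.compile(r"rcu: \w+ kthread starved for \d+ jiffies"),
]


def find_rcu_stalls(lines: Iterable[str]) -> List[str]:
    """Return every log line that matches one of the stall signatures."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(pattern.search(line) for pattern in STALL_PATTERNS)
    ]
```

It can be fed the lines of a saved kernel log, for example a dump written with `journalctl -k`. An empty result after the test period would suggest the stall did not recur, although it does not prove the fix.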
