Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory Protection causing reboot loop (IDFGH-14602) #15359

Open
3 tasks done
paulblid opened this issue Feb 7, 2025 · 5 comments
Open
3 tasks done

Memory Protection causing reboot loop (IDFGH-14602) #15359

paulblid opened this issue Feb 7, 2025 · 5 comments
Assignees
Labels
Status: Opened Issue is new Type: Bug bugs in IDF

Comments

@paulblid
Copy link

paulblid commented Feb 7, 2025

Answers checklist.

  • I have read the documentation ESP-IDF Programming Guide and the issue is not addressed there.
  • I have updated my IDF branch (master or release) to the latest version and checked that the issue is present there.
  • I have searched the issue tracker for a similar issue and not found a similar issue.

IDF version.

v5.3, v5.4

Espressif SoC revision.

ESP32-S2 rev 1.0

Operating System used.

Windows

How did you build your project?

VS Code IDE

If you are using Windows, please specify command line type.

None

Development Kit.

Custom board

Power Supply used.

External 3.3V

What is the expected behavior?

The code should throw a memory protection fault and then reboot into user code.

What is the actual behavior?

After the first crash (with a memory protection fault), the cpu setup code encounters a Cache disabled exception, which causes a reboot loop.
Turning off memory protection avoids this behavior entirely, but is counterproductive to our app as we would like it to crash if it accesses memory in an unintended way.

Steps to reproduce.

Minimal example code:

#include <stdio.h>
#include <stdint.h>
#include <string.h>

void app_main(void) {
  uint8_t* heap = (uint8_t*)malloc(7);
  // void* bad_addr = (void*)0x39063232;
  void* bad_addr = (void*)0x39000000; // can be any address between 0x20000000-0x39FFFFFF
  memcpy(heap, bad_addr, 7);
}
 

All sdkconfig settings are identical to the official template-app.

Debug Logs.

SPIWP:0xee
mode:DIO, clock div:1
load:0x3ffe5110,len:0x1664
load:0x4004a000,len:0x4
load:�ESP-ROM:esp32s2-rc4-20191025
Build:Oct 25 2019
rst:0x1 (POWERON),boot:0xa (SPI_FAST_FLASH_BOOT)
SPIWP:0xee
mode:DIO, clock div:1
load:0x3ffe5110,len:0x1664
load:0x4004a000,len:0x4
load:0x4004a004,len:0xb18
load:0x4004e000,len:0x3344
entry 0x4004a1c8
I (23) boot: ESP-IDF v5.4 2nd stage bootloader
I (24) boot: compile time Feb  7 2025 21:38:01
I (24) boot: chip revision: v1.0
I (24) boot: efuse block revision: v0.2
I (28) boot.esp32s2: SPI Speed      : 80MHz
I (32) boot.esp32s2: SPI Mode       : DIO
I (35) boot.esp32s2: SPI Flash Size : 2MB
I (39) boot: Enabling RNG early entropy source...
I (43) boot: Partition Table:
I (46) boot: ## Label            Usage          Type ST Offset   Length
I (52) boot:  0 nvs              WiFi data        01 02 00009000 00006000
I (59) boot:  1 phy_init         RF data          01 01 0000f000 00001000
I (65) boot:  2 factory          factory app      00 00 00010000 00100000
I (72) boot: End of partition table
I (75) esp_image: segment 0: paddr=00010020 vaddr=3f000020 size=092d8h ( 37592) map
I (90) esp_image: segment 1: paddr=00019300 vaddr=3ff9e010 size=0001ch (    28) load
I (90) esp_image: segment 2: paddr=00019324 vaddr=3ffbf9e0 size=01c30h (  7216) load
I (99) esp_image: segment 3: paddr=0001af5c vaddr=40024000 size=050bch ( 20668) load
I (110) esp_image: segment 4: paddr=00020020 vaddr=40080020 size=14058h ( 82008) map
I (129) esp_image: segment 5: paddr=00034080 vaddr=400290bc size=0691ch ( 26908) load
I (141) boot: Loaded app from partition at offset 0x10000
I (141) boot: Disabling RNG early entropy source...
I (150) cache: Instruction cache        : size 8KB, 4Ways, cache line size 32Byte
I (151) cache: Data cache               : size 8KB, 4Ways, cache line size 32Byte
I (166) cpu_start: Unicore app
I (173) cpu_start: Pro cpu start user code
I (173) cpu_start: cpu freq: 160000000 Hz
I (173) app_init: Application information:
I (177) app_init: Project name:     memprot-fault
I (182) app_init: App version:      1
I (186) app_init: Compile time:     Feb  7 2025 21:37:48
I (192) app_init: ELF file SHA256:  a39459dc2...
I (198) app_init: ESP-IDF:          v5.4
I (202) efuse_init: Min chip rev:     v0.0
I (207) efuse_init: Max chip rev:     v1.99 
I (212) efuse_init: Chip rev:         v1.0
I (217) heap_init: Initializing. RAM available for dynamic allocation:
I (224) heap_init: At 3FFC1E28 len 0003A1D8 (232 KiB): RAM
I (230) heap_init: At 3FFFC000 len 00003A10 (14 KiB): RAM
I (236) heap_init: At 3FF9E02C len 00001FBC (7 KiB): RTCRAM
I (243) spi_flash: detected chip: winbond
I (247) spi_flash: flash io: dio
W (251) spi_flash: Detected size(32768k) larger than the size in the binary image header(2048k). Using the size in the binary image header.
I (265) main_task: Started on CPU0
I (265) main_task: Calling app_main()
Guru Meditation Error: Core  0 panic'ed (Memory protection fault). 
Unknown operation at address 0xffffffff not permitted (INVALID_MEM_TYPE)

Core  0 register dump:
PC      : 0x4001abfa  PS      : 0x00060a34  A0      : 0x80086dfc  A1      : 0x3ffc3740  
--- 0x4001abfa: memcpy in ROM

A2      : 0x3ffc4164  A3      : 0x39000000  A4      : 0x00000007  A5      : 0x3ffc4164
A6      : 0x00000000  A7      : 0x00000000  A8      : 0x00000000  A9      : 0x3ffc3710
A10     : 0x3ffc4164  A11     : 0x3ffc3750  A12     : 0x3ffc3730  A13     : 0x0000000c
A14     : 0x3ffc1cb8  A15     : 0x00000007  SAR     : 0x00000015  EXCCAUSE: 0x00000007  
EXCVADDR: 0x00000000  LBEG    : 0x3ffc3730  LEND    : 0x0000000c  LCOUNT  : 0x400250f0
--- 0x400250f0: xt_highint4 at E:/espressif/v5.4.0/v5.4/esp-idf/components/esp_system/port/soc/esp32s2/highint_hdl.S:52



Backtrace: 0x4001abf7:0x3ffc3740 0x40086df9:0x3ffc3750 0x40093a64:0x3ffc3770 0x40029435:0x3ffc37a0
--- 0x4001abf7: memcpy in ROM
0x40086df9: app_main at E:/esp32s2-memprot/memprot-fault/main/main.c:9
0x40093a64: main_task at E:/espressif/v5.4.0/v5.4/esp-idf/components/freertos/app_startup.c:208
0x40029435: vPortTaskWrapper at E:/espressif/v5.4.0/v5.4/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:139





ELF file SHA256: a39459dc2

Rebooting...
ESP-ROM:esp32s2-rc4-20191025
Build:Oct 25 2019
rst:0xc (RTC_SW_CPU_RST),boot:0xa (SPI_FAST_FLASH_BOOT)
Saved PC:0x400252be
--- 0x400252be: esp_restart_noos at E:/espressif/v5.4.0/v5.4/esp-idf/components/esp_system/port/soc/esp32s2/system_internal.c:119

SPIWP:0xee
mode:DIO, clock div:1
load:0x3ffe5110,len:0x1664
load:0x4004a000,len:0x4
load:0x4004a004,len:0xb18
load:0x4004e000,len:0x3344
entry 0x4004a1c8
I (26) boot: ESP-IDF v5.4 2nd stage bootloader
I (26) boot: compile time Feb  7 2025 21:38:01
I (26) boot: chip revision: v1.0
I (26) boot: efuse block revision: v0.2
I (30) boot.esp32s2: SPI Speed      : 80MHz
I (34) boot.esp32s2: SPI Mode       : DIO
I (38) boot.esp32s2: SPI Flash Size : 2MB
I (41) boot: Enabling RNG early entropy source...
I (46) boot: Partition Table:
I (49) boot: ## Label            Usage          Type ST Offset   Length
I (55) boot:  0 nvs              WiFi data        01 02 00009000 00006000
I (61) boot:  1 phy_init         RF data          01 01 0000f000 00001000
I (68) boot:  2 factory          factory app      00 00 00010000 00100000
I (74) boot: End of partition table
I (78) esp_image: segment 0: paddr=00010020 vaddr=3f000020 size=092d8h ( 37592) map
I (93) esp_image: segment 1: paddr=00019300 vaddr=3ff9e010 size=0001ch (    28) load
I (93) esp_image: segment 2: paddr=00019324 vaddr=3ffbf9e0 size=01c30h (  7216) load
I (102) esp_image: segment 3: paddr=0001af5c vaddr=40024000 size=050bch ( 20668) load
I (113) esp_image: segment 4: paddr=00020020 vaddr=40080020 size=14058h ( 82008) map
I (131) esp_image: segment 5: paddr=00034080 vaddr=400290bc size=0691ch ( 26908) load
I (143) boot: Loaded app from partition at offset 0x10000
I (143) boot: Disabling RNG early entropy source...
I (153) cache: Instruction cache        : size 8KB, 4Ways, cache line size 32Byte
I (153) cache: Data cache               : size 8KB, 4Ways, cache line size 32Byte
I (157) cpu_start: Unicore app
Guru Meditation Error: Core  0 panic'ed (Cache disabled but cached memory region accessed).

Core  0 register dump:
PC      : 0x4002f9a5  PS      : 0x00060d34  A0      : 0x80025c3a  A1      : 0x3fffe260  
--- 0x4002f9a5: xt_ints_on at E:/espressif/v5.4.0/v5.4/esp-idf/components/xtensa/xtensa_intr_asm.S:179

A2      : 0x02000000  A3      : 0x00000000  A4      : 0x02000000  A5      : 0x00000003
A6      : 0x00000000  A7      : 0x00000000  A8      : 0x00000014  A9      : 0x00000001
A10     : 0x00000028  A11     : 0x00000001  A12     : 0x00000028  A13     : 0x00000003
A14     : 0x00000000  A15     : 0x00000004  SAR     : 0x00000007  EXCCAUSE: 0x00000007
EXCVADDR: 0x00000000  LBEG    : 0x00000028  LEND    : 0x00000003  LCOUNT  : 0x400250f0
--- 0x400250f0: xt_highint4 at E:/espressif/v5.4.0/v5.4/esp-idf/components/esp_system/port/soc/esp32s2/highint_hdl.S:52



Backtrace: 0x4002f9a2:0x3fffe260 0x40025c37:0x3fffe270 0x40083779:0x3fffe290 0x40083e18:0x3fffe2b0 0x40024fc5:0x3fffe2f0 0x4004f555:0x3fffe330 0x4004f969:0x3fffe370 0x4004a231:0x3fffe4a0 0x40011371:0x3fffe560 0x4000f788:0x3fffe6f0 0x400072a5:0x3fffe710
--- 0x4002f9a2: xt_ints_on at E:/espressif/v5.4.0/v5.4/esp-idf/components/xtensa/xtensa_intr_asm.S:178
0x40025c37: esp_cpu_intr_enable at E:/espressif/v5.4.0/v5.4/esp-idf/components/esp_hw_support/include/esp_cpu.h:388
 (inlined by) esp_intr_enable_source at E:/espressif/v5.4.0/v5.4/esp-idf/components/esp_hw_support/intr_alloc.c:939
0x40083779: esp_memprot_intr_init at E:/espressif/v5.4.0/v5.4/esp-idf/components/esp_hw_support/port/esp32s2/memprot.c:144
0x40083e18: esp_memprot_set_prot at E:/espressif/v5.4.0/v5.4/esp-idf/components/esp_hw_support/port/esp32s2/memprot.c:819 (discriminator 1)
0x40024fc5: call_start_cpu0 at E:/espressif/v5.4.0/v5.4/esp-idf/components/esp_system/port/cpu_start.c:767
0x40011371: ets_run_flash_bootloader in ROM
0x4000f788: main in ROM
0x400072a5: _start in ROM





ELF file SHA256:

Rebooting...
ESP-ROM:esp32s2-rc4-20191025
Build:Oct 25 2019
rst:0xc (RTC_SW_CPU_RST),boot:0xa (SPI_FAST_FLASH_BOOT)
Saved PC:0x400252be
--- 0x400252be: esp_restart_noos at E:/espressif/v5.4.0/v5.4/esp-idf/components/esp_system/port/soc/esp32s2/system_internal.c:119

SPIWP:0xee
mode:DIO, clock div:1
load:0x3ffe5110,len:0x1664
load:0x4004a000,len:0x4
load:0x4004a004,len:0xb18
load:0x4004e000,len:0x3344
entry 0x4004a1c8
I (26) boot: ESP-IDF v5.4 2nd stage bootloader
I (26) boot: compile time Feb  7 2025 21:38:01
I (26) boot: chip revision: v1.0
I (26) boot: efuse block revision: v0.2
I (30) boot.esp32s2: SPI Speed      : 80MHz
I (34) boot.esp32s2: SPI Mode       : DIO
I (38) boot.esp32s2: SPI Flash Size : 2MB
I (41) boot: Enabling RNG early entropy source...
I (46) boot: Partition Table:
I (49) boot: ## Label            Usage          Type ST Offset   Length
I (55) boot:  0 nvs              WiFi data        01 02 00009000 00006000
I (61) boot:  1 phy_init         RF data          01 01 0000f000 00001000
I (68) boot:  2 factory          factory app      00 00 00010000 00100000
I (74) boot: End of partition table
I (78) esp_image: segment 0: paddr=00010020 vaddr=3f000020 size=092d8h ( 37592) map
I (93) esp_image: segment 1: paddr=00019300 vaddr=3ff9e010 size=0001ch (    28) load
I (93) esp_image: segment 2: paddr=00019324 vaddr=3ffbf9e0 size=01c30h (  7216) load
I (102) esp_image: segment 3: paddr=0001af5c vaddr=40024000 size=050bch ( 20668) load
I (113) esp_image: segment 4: paddr=00020020 vaddr=40080020 size=14058h ( 82008) map
I (131) esp_image: segment 5: paddr=00034080 vaddr=400290bc size=0691ch ( 26908) load
I (143) boot: Loaded app from partition at offset 0x10000
I (143) boot: Disabling RNG early entropy source...
I (153) cache: Instruction cache        : size 8KB, 4Ways, cache line size 32Byte
I (153) cache: Data cache               : size 8KB, 4Ways, cache line size 32Byte
I (157) cpu_start: Unicore app
Guru Meditation Error: Core  0 panic'ed (Cache disabled but cached memory region accessed).

Core  0 register dump:
PC      : 0x4002f9a5  PS      : 0x00060d34  A0      : 0x80025c3a  A1      : 0x3fffe260
--- 0x4002f9a5: xt_ints_on at E:/espressif/v5.4.0/v5.4/esp-idf/components/xtensa/xtensa_intr_asm.S:179

A2      : 0x02000000  A3      : 0x00000000  A4      : 0x02000000  A5      : 0x00000003
A6      : 0x00000000  A7      : 0x00000000  A8      : 0x00000014  A9      : 0x00000001
A10     : 0x00000028  A11     : 0x00000001  A12     : 0x00000028  A13     : 0x00000003
A14     : 0x00000000  A15     : 0x00000004  SAR     : 0x00000007  EXCCAUSE: 0x00000007
EXCVADDR: 0x00000000  LBEG    : 0x00000028  LEND    : 0x00000003  LCOUNT  : 0x400250f0
--- 0x400250f0: xt_highint4 at E:/espressif/v5.4.0/v5.4/esp-idf/components/esp_system/port/soc/esp32s2/highint_hdl.S:52



Backtrace: 0x4002f9a2:0x3fffe260 0x40025c37:0x3fffe270 0x40083779:0x3fffe290 0x40083e18:0x3fffe2b0 0x40024fc5:0x3fffe2f0 0x4004f555:0x3fffe330 0x4004f969:0x3fffe370 0x4004a231:0x3fffe4a0 0x40011371:0x3fffe560 0x4000f788:0x3fffe6f0 0x400072a5:0x3fffe710
--- 0x4002f9a2: xt_ints_on at E:/espressif/v5.4.0/v5.4/esp-idf/components/xtensa/xtensa_intr_asm.S:178
0x40025c37: esp_cpu_intr_enable at E:/espressif/v5.4.0/v5.4/esp-idf/components/esp_hw_support/include/esp_cpu.h:388
 (inlined by) esp_intr_enable_source at E:/espressif/v5.4.0/v5.4/esp-idf/components/esp_hw_support/intr_alloc.c:939
0x40083779: esp_memprot_intr_init at E:/espressif/v5.4.0/v5.4/esp-idf/components/esp_hw_support/port/esp32s2/memprot.c:144
0x40083e18: esp_memprot_set_prot at E:/espressif/v5.4.0/v5.4/esp-idf/components/esp_hw_support/port/esp32s2/memprot.c:819 (discriminator 1)
0x40024fc5: call_start_cpu0 at E:/espressif/v5.4.0/v5.4/esp-idf/components/esp_system/port/cpu_start.c:767
0x40011371: ets_run_flash_bootloader in ROM
0x4000f788: main in ROM
0x400072a5: _start in ROM

More Information.

  • There seems to be a related issue 2 years ago but it has remained open.
  • Putting the panic handler in ISR does not help.
  • Changing coredump settings does not help (coredump config is also showing up as corrupted if coredump is enabled)
  • The specific line causing the Disabled cache error seems to be memprot interrupt enabling code esp_memprot_intr_init.
  • My current (probably wrong) theory is that there is some state left over from the crash, or some interrupt that is not cleared, and upon it being enabled by the memprot code, it calls into that interrupt handler, which is probably not located in IRAM, but I did not have enough time to go down this rabbit hole any further.
@paulblid paulblid added the Type: Bug bugs in IDF label Feb 7, 2025
@github-actions github-actions bot changed the title Memory Protection causing reboot loop Memory Protection causing reboot loop (IDFGH-14602) Feb 7, 2025
@espressif-bot espressif-bot added the Status: Opened Issue is new label Feb 7, 2025
@o-marshmallow
Copy link
Collaborator

Hello @paulblid ,

Thanks for reporting this issue, I was able to reproduce it on my side, I am going to investigate it.

@paulblid
Copy link
Author

Hello,

I asked my colleague @lucic71 to do some digging around, and we have found that there is some leftover state from a memprot crash.

We hooked the panic handler to dump memprot registers during an unrelated crash (nullptr access) and during the memprot crash from above, we noticed the following differences:

Image

PMS_PRO_AHB_4_REG indicates the address, which is as expected
PMS_PRO_AHB_3_REG indicates an active interrupt (bit 3, 0x6), and this value persists across panics, it is never cleared.

We then tried wrapping the panic handler (gcc --wrap option) and inserted the following instruction:

void __real_esp_panic_handler(void*);
void __wrap_esp_panic_handler (void* info) {
  esp_memprot_clear_intr(MEMPROT_PERI2_RTCSLOW_0);
  __real_esp_panic_handler(info);
}

This fixed the crash loop, but it's only a proof of concept.
We are now trying to wrap the esp_memprot_set_prot function to clear any pending memprot interrupt, but ideally that would be patched in ESP-IDF code, if we manage to get it running I will post the results later.

Hope this helps investigating the issue further.

@o-marshmallow
Copy link
Collaborator

Hello @paulblid ,

Sorry for not giving an update, I have already opened an internal PR to fix this issue, it is currently under review.

I confirm that the source of the issue is that the memory protection interrupt state is not cleaned before re-enabling the interrupts on boot, hence the boot loop.

@paulblid
Copy link
Author

Thank you for the update.
Will this fix be backported to any older ESP-IDF versions, or is this just v5.4+?
Also is there any estimate when the fix might be released?

For anyone else waiting for the official fix, this is our workaround for now:

memprot_handler.cpp

/* IDFGH-14602 */
extern "C"
{
  mem_type_prot_t get_active_intr_memtype()
  {
    if (DPORT_GET_PERI_REG_MASK(DPORT_PMS_PRO_IRAM0_4_REG, DPORT_PMS_PRO_IRAM0_ILG_INTR) > 0) {
      return MEMPROT_IRAM0_SRAM;
    } else if (DPORT_GET_PERI_REG_MASK(DPORT_PMS_PRO_DRAM0_3_REG, DPORT_PMS_PRO_DRAM0_ILG_INTR) > 0) {
      return MEMPROT_DRAM0_SRAM;
    } else if (DPORT_GET_PERI_REG_MASK(DPORT_PMS_PRO_DPORT_6_REG, DPORT_PMS_PRO_DPORT_ILG_INTR) > 0) {
        return MEMPROT_PERI1_RTCSLOW;
    } else if (DPORT_GET_PERI_REG_MASK(DPORT_PMS_PRO_AHB_3_REG, DPORT_PMS_PRO_AHB_ILG_INTR) > 0) {
      return MEMPROT_PERI2_RTCSLOW_0;
    }
    return MEMPROT_NONE;
  }
  
  esp_err_t __real_esp_memprot_set_prot(bool invoke_panic_handler, bool lock_feature, uint32_t *mem_type_mask);
  esp_err_t __wrap_esp_memprot_set_prot (bool invoke_panic_handler, bool lock_feature, uint32_t *mem_type_mask) {
    esp_memprot_clear_intr(get_active_intr_memtype());
    return __real_esp_memprot_set_prot(invoke_panic_handler, lock_feature, mem_type_mask);
  }
}

CMakeLists.txt

idf_build_set_property(LINK_OPTIONS "-Wl,--wrap=esp_memprot_set_prot" APPEND)

Additionally, I think there is a bug in the memprot implementation for esp32s2, specifically this line:

if (memprot_ll_dram0_is_assoc_intr()) {

Shouldn't this be a call to memprot_ll_peri1_is_assoc_intr instead of memprot_ll_dram0_is_assoc_intr considering it's referring to peri1 not dram0?

Paul.

@o-marshmallow
Copy link
Collaborator

This fix will be backported to the older versions that are still maintained

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Opened Issue is new Type: Bug bugs in IDF
Projects
None yet
Development

No branches or pull requests

3 participants