Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Malloc regions #584

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

ourairquality
Copy link
Contributor

Add support for using the spare iram and optionally part of the icache as part of the malloc heap. This can add roughly 20k to the malloc heap, but with conditions.

To use the icache for the malloc heap, compile with -DESP8266_ENABLE_LOW_ICACHE There is obviously a trade off to be made, but if desperate for memory then this might be an option.

Newlib has been extended to be able to allocate from two separate regions. This is controlled per-thread by calling set_malloc_regions(). There are four options: MALLOC_MASK_DRAM and MALLOC_MASK_IRAM to allocate from only the dram or iram respectively, and MALLOC_MASK_PREFER_DRAM and MALLOC_MASK_PREFER_IRAM to prefer dram or iram respectively but to use the other region if necessary. The function set_malloc_regions() returns the prior option, so this can be used to make a change for some dynamic code block, for example if calls to malloc in a dynamic context need to be in a particular region. This is used in lwip and FreeRTOS where necessary to ensure buffers are in dram, so those are some example. The wificfg example has been updated to report both the dram and iram usage. The amount of free memory reported by mallinfo() also depends on the option set by set_malloc_regions() so if allocating to only dram or iram then it returns only the free memory in that region, and this is also thread local, and there is an example in lwip where it checks the available dram.

There are some hacks involved, particularly on the newlib side, it is not pretty with hard coded heuristics and addresses specific to esp-open-rtos, but it might be some time before that is cleaned up and it should not be broken, just not pretty. For example there is a heuristic to preference iram if the dram is getting low to try to ensure that there is room to allocate the lwip pbufs as the TX buffers appear to need to be in dram.

The defaults should be a good start, preferring dram but using iram when necessary.

Been testing it for a little while now, and it seems to be holding up. Testing and feedback welcomed.

@ourairquality
Copy link
Contributor Author

Unfortunately this is still not quite right. Reports of wifi beacon timeouts when using the lower icache for the malloc heap. It appears to actually be the hit losing the extra icache, as the problem is reproducible even when the lower icache has no data in it. Still there is a small gain using the unused iram.

@Zaltora
Copy link
Contributor

Zaltora commented Mar 1, 2018

I am in for test ! Like you know i am desperate for more RAM. My app is hungry. My recent work are specific to GUI system called lvgl, a project open source to design advanced GUI for many different screens. I want add it as component to esp-open-rtos.
It is work already. i use it with ssd1306.
With your update, i will specify to lvgl to use I-RAM instead of dram.

@Zaltora
Copy link
Contributor

Zaltora commented Mar 1, 2018

why beacon is impacted by this change ?
Maybe some wifi primitive function need to be put in IRAM to avoid to load it from flash than can be the reason of the extra time ?

@ourairquality
Copy link
Contributor Author

@Zaltora Are you seeing the beacon timeout too? Perhaps moving some code to iram would solve this, but I have not been able to nail it down. I suspect it has something to do with the sdk libpp, and there is a NMI Wifi interrupt handler there that might be timing critical. I don't see the beacon timeout when just using the iram and not the icache. It would be great to unlock that extra 16k, and it seem so close.

@Zaltora
Copy link
Contributor

Zaltora commented Mar 2, 2018

I didn't remember to get problem when i manually disable ICACHE bank 0 with #510. I will do a text to confirm.
I don't no how to enable icache with your pull request. i add this in my makefile:
EXTRA_C_CXX_FLAGS = -DESP8266_ENABLE_LOW_ICACHE
but seem do not work. what is the good way ?

First remark, i got more HEAP than with official RTOS (global stream update ?) I got ~2k more. That is good.

@ourairquality
Copy link
Contributor Author

It needs to be given 0 to disable the low icache, and that should report an extra 16k. We can rework the options and interface based on feedback, if it proves useful.

EXTRA_C_CXX_FLAGS = -DESP8266_ENABLE_LOW_ICACHE=0

@Zaltora
Copy link
Contributor

Zaltora commented Mar 2, 2018

Ok that my output with icache:

MAIN: SDK version:0.9.9

SDK versionMAIN: DPORT DPORT0                3FF00000   00000001
MAIN: DPORT UNKNOW                3FF00004      00000005
MAIN: DPORT UNKNOW                3FF00008      0000080F
MAIN: DPORT SPI_READY             3FF0000C      04000102
MAIN: DPORT UNKNOW                3FF00010      00000000
MAIN: DPORT CPU_CLOCK             3FF00014      00000000
MAIN: DPORT CLOCKGATE_WATCHDOG    3FF00018      FFFF00FF
MAIN: DPORT UNKNOW                3FF0001C      00000000
MAIN: DPORT SPI_INT_STATUS        3FF00020      00000010
MAIN: DPORT SPI_CACHE_RAM         3FF00024      0000001A
MAIN: DPORT PERI_IO               3FF00028      00000000
MAIN: DPORT SLC_TX_DESC_DEBUG     3FF0002C      00000000
MAIN: DPORT UNKNOW                3FF00030      00004040
MAIN: DPORT UNKNOW                3FF00034      00000000
MAIN: DPORT UNKNOW                3FF00038      00000041
MAIN: DPORT UNKNOW                3FF0003C      00000000
MAIN: DPORT UNKNOW                3FF00040      00000000
MAIN: DPORT UNKNOW                3FF00044      00000000
MAIN: DPORT UNKNOW                3FF00048      00000000
MAIN: DPORT UNKNOW                3FF0004C      00000000
MAIN: DPORT OTP_MAC0              3FF00050      F67E0000
MAIN: DPORT OTP_MAC1              3FF00054      020002D9
MAIN: DPORT OTP_CHIPID            3FF00058      AC00B000
MAIN: DPORT OTP_MAC2              3FF0005C      00DC4F22
MAIN: Heap: 53636
MAIN: void size: 4
MAIN: template : 11111111,11111111
data  : 0x3ffe8000 ~ 0x3ffe8964, len: 2404
rodata: 0x3ffe8968 ~ 0x3ffe91f8, len: 2192
bss   : 0x3ffe91f8 ~ 0x3fff1638, len: 33856
heap  : 0x3fff1638 ~ 0x40000000, len: 59848
MAIN: var addr: 3FFFBFC4
MAIN: var2 addr: 3FFFBFC0
mode : softAP(de:4f:22:02:d9:f6)
add if1
bcn 100

without icache:

MAIN: SDK version:0.9.9
SDK versionMAIN: DPORT DPORT0                3FF00000   00000001
MAIN: DPORT UNKNOW                3FF00004      00000005
MAIN: DPORT UNKNOW                3FF00008      0000080F
MAIN: DPORT SPI_READY             3FF0000C      00000102
MAIN: DPORT UNKNOW                3FF00010      00000000
MAIN: DPORT CPU_CLOCK             3FF00014      00000000
MAIN: DPORT CLOCKGATE_WATCHDOG    3FF00018      FFFF00FF
MAIN: DPORT UNKNOW                3FF0001C      00000000
MAIN: DPORT SPI_INT_STATUS        3FF00020      00000010
MAIN: DPORT SPI_CACHE_RAM         3FF00024      0000000A
MAIN: DPORT PERI_IO               3FF00028      00000000
MAIN: DPORT SLC_TX_DESC_DEBUG     3FF0002C      00000000
MAIN: DPORT UNKNOW                3FF00030      00004040
MAIN: DPORT UNKNOW                3FF00034      00000000
MAIN: DPORT UNKNOW                3FF00038      00000041
MAIN: DPORT UNKNOW                3FF0003C      00000000
MAIN: DPORT UNKNOW                3FF00040      00000000
MAIN: DPORT UNKNOW                3FF00044      00000000
MAIN: DPORT UNKNOW                3FF00048      00000000
MAIN: DPORT UNKNOW                3FF0004C      00000000
MAIN: DPORT OTP_MAC0              3FF00050      F67E0000
MAIN: DPORT OTP_MAC1              3FF00054      020002D9
MAIN: DPORT OTP_CHIPID            3FF00058      AC00B000
MAIN: DPORT OTP_MAC2              3FF0005C      00DC4F22
MAIN: Heap: 70020
MAIN: void size: 4
MAIN: template : 11111111,11111111
data  : 0x3ffe8000 ~ 0x3ffe8964, len: 2404
rodata: 0x3ffe8968 ~ 0x3ffe91f8, len: 2192
bss   : 0x3ffe91f8 ~ 0x3fff1638, len: 33856
heap  : 0x3fff1638 ~ 0x40000000, len: 59848
MAIN: var addr: 3FFFBFC4
MAIN: var2 addr: 3FFFBFC0
mode : softAP(de:4f:22:02:d9:f6)
add if1
bcn 100

I didn't see visible beacon problem for now. I will try with my app.
global stream rework gain is bigger than expected : 3,5k well well !!

@Zaltora
Copy link
Contributor

Zaltora commented Mar 2, 2018

Ok tested my final app. no problem on serial output. Trying to connect with Wifi, no problem detected
It is work well too. i got 30k heap free instead of 10K before your update

What problem you got with beacon? you got a message on serial output ?

@Zaltora
Copy link
Contributor

Zaltora commented Mar 2, 2018

the problem is reproducible even when the lower icache has no data in it

i didn't use icache right now. Just enable the option in makefile and it is work. I guess by default, DRAM is used first.

@ourairquality
Copy link
Contributor Author

Yes, the default is to use the DRAM first, but that can be changed. I've been able to reproduce wifi problems. Still not sure what the cause is. It might be that the app is overloading the icache and performance is falling off a cliff, or there might be bugs in there somewhere. Shall keep exploring. If you could give it a try and report any problems that might give more clues.

@Zaltora
Copy link
Contributor

Zaltora commented Mar 2, 2018

how work the system with: set_malloc_regions().
i need call it before call functions than will use dynamic memory ?
That not a problem with concurrent system ?

If i do this, only this task will use IRAM ?

Task1()
{
   set_malloc_regions(MALLOC_MASK_PREFER_IRAM);
   while(1)
   {
     //Dynamic memory functions here
   }
}

@ourairquality
Copy link
Contributor Author

Yes that usage of set_malloc_regions() looks good. It is thread local, so each thread can have a different setting and each thread can change the setting for a dynamic code context without affecting other threads. The function returns the prior setting to allow restoring the setting on exit from a dynamic code context.

@Zaltora
Copy link
Contributor

Zaltora commented Mar 2, 2018

Wooo, when i define a malloc regions in a task, it like define a new propriety to the task. Nice system. I am really curious how it is work when coding.

i put set_malloc_regions(MALLOC_MASK_PREFER_IRAM); in my user_init() and screen task to see what happens. No Wifi problem detected yet.
What can i do to force the wifi problem problem ?.
I manage socket with lwip in a particular way (non blocking API). the wifi is set in AP mode.

@ourairquality
Copy link
Contributor Author

ourairquality commented Mar 2, 2018

The setting is stored in the newlib reent structure which is swapped when freertos changes threads. There might have been other ways, and the newlib code is not pretty, but that seems a detail that can be cleaned up later if this works.

The http_get_mbedtls example gives a beacon timeout, but it might have a large code footprint and it works the cpu hard. I tried an example loop just accessing the dram hard and see problems too, so there might be deeper problems when using only 16k of icache. Might also be seeing problems with the spi access code, when an nmi wifi interrupt occurs. Let me keep working though things to try to narrow them down. I have not spotted any data corruption issues though. It might not be a show stopper for some apps if the wifi loses some packets when the cpu is working hard, but that needs to be understand, need to be sure there are not other bugs.

@ourairquality
Copy link
Contributor Author

Testing of the flash code with buffers in iram or icache has shown no problems. Tested a range of small buffers sizes, buffer offsets, flash offsets, all worked ok. Looking at the source code also suggests it should not make a difference if the source and target buffers are in dram or iram, except for performance, because the code uses regular load and store instructions to copy between the flash i/o buffers.

I have no more clues about wifi stability, not even sure if there is an issue. I made some of my code more robust to unreliable wifi and it seems ok. Was able to get the mbedt_get_http example running using larger buffers and the extra icache memory, and also the aws example using larger buffers.

@Zaltora
Copy link
Contributor

Zaltora commented Mar 6, 2018

I was thinking about the wifi problem. If the example you tested make cpu work hard like "http_get_mbedtls". Maybe the problem is related to this because use IRAM intensifies the use of cpu. What effect got to overclock the cpu to 160MHz if it is not the case ?
the problem can be priority on interrupt which is not respected ? (even if cpu is at 100% ) (lock system was reworked for libc ( for timer1 interupt ))

I got some library i want they use other RAM than the current task use (e.g: WIFI task use DRAM to avoid problem related to use IRAM like tx buffer. The following code is good ? :

Task1()
{
   set_malloc_regions(MALLOC_MASK_PREFER_IRAM);
   while(1)
   {
     //Dynamic memory functions in IRAM
      uint32_t malloc_mask = set_malloc_regions(MALLOC_MASK_DRAM);     
      //Dynamic memory functions in DRAM
      set_malloc_regions(malloc_mask);
   }
}

set_malloc_regions(malloc_mask); will restore the old setting with IRAM ?
I suggest to add an exemple with different RAM scenario for final user.

@ourairquality
Copy link
Contributor Author

@Zaltora Yes, that code looks good. There are some examples in the lwip code and FreeRTOS code. I have not spotted any problems with the task priorities.

@ourairquality ourairquality force-pushed the malloc-regions branch 2 times, most recently from 3ac5c87 to 44738f4 Compare March 17, 2018 23:16
@ourairquality
Copy link
Contributor Author

A bug in the changes has been found and fixed. There was an error in alignment adjustments, the result could be that malloc used an extra word outside the region supplied, but it depended on the alignment of the end of the text segment which depended on the build. It only affected the object at the very end of the region, which might have lost one word. So it was not an easy one to narrow down. If people had problems with this change then it might be worth giving it another go.

@Zaltora
Copy link
Contributor

Zaltora commented Mar 27, 2018

I got no problem since i used this PR in my app. Ready to merge ?

@Zaltora
Copy link
Contributor

Zaltora commented May 20, 2018

Some news about this PR ?

@UncleRus
Copy link
Member

Going to merge it if there are no objections

@ourairquality
Copy link
Contributor Author

The SDK appears to have added re-entry support to the load-store exception handler, and perhaps that needs to be considered. I have assumed that this exception handler could not be interrupted by the NMI but perhaps it can and that would be a problem. Also perhaps let me rebase it and check it is all updated.

@ourairquality
Copy link
Contributor Author

A fix for the load-store exception handler being re-entered via a NMI has been added to this PR, and it has been rebased and lwip also updated. Would be curious to know if this fix addresses any issues people were having with this, such as the beacon timeouts?

@Zaltora
Copy link
Contributor

Zaltora commented Jun 20, 2018

Hi, i currently test your PR. I use my product as "wifi access point" and i connect to it with putty. 2 port ( cmd/dbg), i send and receive data without problem.
I will continue testing few days.

@Zaltora
Copy link
Contributor

Zaltora commented Jun 24, 2018

I got a very strange bug with this PR.
When i program, after esptool HW reset, everything work fine.
After the first reboot, my program boot (screen initialized and logo appear) but system seem frozen after that ( wifi not work / uart either / button interaction either ).
Need do more investigations. I got a lot of thing in my programm ( spiffs, pwm, wifi, i2c, spi, ... ).
I use a lot of malloc/free too. Maybe a thing don't like to be allocated in IRAM.

@ourairquality
Copy link
Contributor Author

The recent change was to the load/store exception handler, and if it booted up, got past user_start then the stack for that would have been initialized. Could you try it with the last patch reverted, so revert 'Load/store exception handler: handle re-entry via a NMI'?

@Zaltora
Copy link
Contributor

Zaltora commented Jun 27, 2018

After some additional tests: the problem feel more random.
my system can freeze after i push the first button. One time, the second system boot was good but not the third.
Attention: when i said freeze at boot, it is for my app. Because esp8266 do a lot of thing from my app before block. (My SSD1306 screen is always initialized for example)

How i can revert the last patch ? When i log the branch, i didn't see the corresponding commit.

The NMI can asynchronously interrupt the load/store exception handler. This
does occur frequently as the NMI handler code does invoke load/store
exceptions, and the load/store exception handler is heavly used. This was
corrupting the load/store exception handler saved state and thus randomly
corrupting registers a0 to a6 of the interruptee.
@ourairquality
Copy link
Contributor Author

Rebased to make the recent change the last commit, so just 'git checkout cc4bd3c' to remove that last change.

@Zaltora
Copy link
Contributor

Zaltora commented Jun 27, 2018

No changes, problems still here with this commit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants