-
Notifications
You must be signed in to change notification settings - Fork 490
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Malloc regions #584
base: master
Are you sure you want to change the base?
Malloc regions #584
Conversation
Unfortunately this is still not quite right. Reports of wifi beacon timeouts when using the lower icache for the malloc heap. It appears to actually be the hit losing the extra icache, as the problem is reproducible even when the lower icache has no data in it. Still there is a small gain using the unused iram. |
b1e7c41
to
c678ba9
Compare
I am in for test ! Like you know i am desperate for more RAM. My app is hungry. My recent work are specific to GUI system called lvgl, a project open source to design advanced GUI for many different screens. I want add it as component to esp-open-rtos. |
why beacon is impacted by this change ? |
@Zaltora Are you seeing the beacon timeout too? Perhaps moving some code to iram would solve this, but I have not been able to nail it down. I suspect it has something to do with the sdk libpp, and there is a NMI Wifi interrupt handler there that might be timing critical. I don't see the beacon timeout when just using the iram and not the icache. It would be great to unlock that extra 16k, and it seem so close. |
I didn't remember to get problem when i manually disable ICACHE bank 0 with #510. I will do a text to confirm. First remark, i got more HEAP than with official RTOS (global stream update ?) I got ~2k more. That is good. |
It needs to be given 0 to disable the low icache, and that should report an extra 16k. We can rework the options and interface based on feedback, if it proves useful.
|
Ok that my output with icache:
without icache:
I didn't see visible beacon problem for now. I will try with my app. |
Ok tested my final app. no problem on serial output. Trying to connect with Wifi, no problem detected What problem you got with beacon? you got a message on serial output ? |
i didn't use icache right now. Just enable the option in makefile and it is work. I guess by default, DRAM is used first. |
Yes, the default is to use the DRAM first, but that can be changed. I've been able to reproduce wifi problems. Still not sure what the cause is. It might be that the app is overloading the icache and performance is falling off a cliff, or there might be bugs in there somewhere. Shall keep exploring. If you could give it a try and report any problems that might give more clues. |
how work the system with: If i do this, only this task will use IRAM ?
|
Yes that usage of |
Wooo, when i define a malloc regions in a task, it like define a new propriety to the task. Nice system. I am really curious how it is work when coding. i put |
The setting is stored in the newlib reent structure which is swapped when freertos changes threads. There might have been other ways, and the newlib code is not pretty, but that seems a detail that can be cleaned up later if this works. The http_get_mbedtls example gives a beacon timeout, but it might have a large code footprint and it works the cpu hard. I tried an example loop just accessing the dram hard and see problems too, so there might be deeper problems when using only 16k of icache. Might also be seeing problems with the spi access code, when an nmi wifi interrupt occurs. Let me keep working though things to try to narrow them down. I have not spotted any data corruption issues though. It might not be a show stopper for some apps if the wifi loses some packets when the cpu is working hard, but that needs to be understand, need to be sure there are not other bugs. |
Testing of the flash code with buffers in iram or icache has shown no problems. Tested a range of small buffers sizes, buffer offsets, flash offsets, all worked ok. Looking at the source code also suggests it should not make a difference if the source and target buffers are in dram or iram, except for performance, because the code uses regular load and store instructions to copy between the flash i/o buffers. I have no more clues about wifi stability, not even sure if there is an issue. I made some of my code more robust to unreliable wifi and it seems ok. Was able to get the mbedt_get_http example running using larger buffers and the extra icache memory, and also the aws example using larger buffers. |
I was thinking about the wifi problem. If the example you tested make cpu work hard like "http_get_mbedtls". Maybe the problem is related to this because use IRAM intensifies the use of cpu. What effect got to overclock the cpu to 160MHz if it is not the case ? I got some library i want they use other RAM than the current task use (e.g: WIFI task use DRAM to avoid problem related to use IRAM like tx buffer. The following code is good ? : Task1()
{
set_malloc_regions(MALLOC_MASK_PREFER_IRAM);
while(1)
{
//Dynamic memory functions in IRAM
uint32_t malloc_mask = set_malloc_regions(MALLOC_MASK_DRAM);
//Dynamic memory functions in DRAM
set_malloc_regions(malloc_mask);
}
}
|
@Zaltora Yes, that code looks good. There are some examples in the lwip code and FreeRTOS code. I have not spotted any problems with the task priorities. |
3ac5c87
to
44738f4
Compare
A bug in the changes has been found and fixed. There was an error in alignment adjustments, the result could be that malloc used an extra word outside the region supplied, but it depended on the alignment of the end of the text segment which depended on the build. It only affected the object at the very end of the region, which might have lost one word. So it was not an easy one to narrow down. If people had problems with this change then it might be worth giving it another go. |
I got no problem since i used this PR in my app. Ready to merge ? |
44738f4
to
1e6f584
Compare
Some news about this PR ? |
Going to merge it if there are no objections |
The SDK appears to have added re-entry support to the load-store exception handler, and perhaps that needs to be considered. I have assumed that this exception handler could not be interrupted by the NMI but perhaps it can and that would be a problem. Also perhaps let me rebase it and check it is all updated. |
1e6f584
to
b4cde9f
Compare
A fix for the load-store exception handler being re-entered via a NMI has been added to this PR, and it has been rebased and lwip also updated. Would be curious to know if this fix addresses any issues people were having with this, such as the beacon timeouts? |
Hi, i currently test your PR. I use my product as "wifi access point" and i connect to it with putty. 2 port ( cmd/dbg), i send and receive data without problem. |
I got a very strange bug with this PR. |
The recent change was to the load/store exception handler, and if it booted up, got past user_start then the stack for that would have been initialized. Could you try it with the last patch reverted, so revert 'Load/store exception handler: handle re-entry via a NMI'? |
After some additional tests: the problem feel more random. How i can revert the last patch ? When i log the branch, i didn't see the corresponding commit. |
The NMI can asynchronously interrupt the load/store exception handler. This does occur frequently as the NMI handler code does invoke load/store exceptions, and the load/store exception handler is heavly used. This was corrupting the load/store exception handler saved state and thus randomly corrupting registers a0 to a6 of the interruptee.
b4cde9f
to
acaf6f0
Compare
Rebased to make the recent change the last commit, so just 'git checkout cc4bd3c' to remove that last change. |
No changes, problems still here with this commit. |
Add support for using the spare iram and optionally part of the icache as part of the malloc heap. This can add roughly 20k to the malloc heap, but with conditions.
To use the icache for the malloc heap, compile with
-DESP8266_ENABLE_LOW_ICACHE
There is obviously a trade off to be made, but if desperate for memory then this might be an option.Newlib has been extended to be able to allocate from two separate regions. This is controlled per-thread by calling
set_malloc_regions()
. There are four options:MALLOC_MASK_DRAM
andMALLOC_MASK_IRAM
to allocate from only the dram or iram respectively, andMALLOC_MASK_PREFER_DRAM
andMALLOC_MASK_PREFER_IRAM
to prefer dram or iram respectively but to use the other region if necessary. The functionset_malloc_regions()
returns the prior option, so this can be used to make a change for some dynamic code block, for example if calls to malloc in a dynamic context need to be in a particular region. This is used in lwip and FreeRTOS where necessary to ensure buffers are in dram, so those are some example. The wificfg example has been updated to report both the dram and iram usage. The amount of free memory reported bymallinfo()
also depends on the option set byset_malloc_regions()
so if allocating to only dram or iram then it returns only the free memory in that region, and this is also thread local, and there is an example in lwip where it checks the available dram.There are some hacks involved, particularly on the newlib side, it is not pretty with hard coded heuristics and addresses specific to esp-open-rtos, but it might be some time before that is cleaned up and it should not be broken, just not pretty. For example there is a heuristic to preference iram if the dram is getting low to try to ensure that there is room to allocate the lwip pbufs as the TX buffers appear to need to be in dram.
The defaults should be a good start, preferring dram but using iram when necessary.
Been testing it for a little while now, and it seems to be holding up. Testing and feedback welcomed.