IgorYbema
Owner

@IgorYbema IgorYbema commented Sep 1, 2024

Release v3.7 adds the following on top of v3.62:

  • changed heat pump data is now updated in the rules directly (bug fix)
  • NTP resync now also works if there was no internet/Wi-Fi during reboot
  • website table refresh now uses the JSON output instead of hard-coded HTML output
  • fixed a small urldecode issue that caused a crash (bug fix)
  • JSON output is chopped into more, smaller chunks over the webserver loops, making it less memory hungry
  • optional PCB emulation topics are now (finally) visible in the web GUI
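
As a rough illustration of the chunked JSON output idea (this is not the firmware code, which is not written in Python), the sketch below yields a JSON object in small pieces instead of building it in one large buffer. The function name `iter_json_chunks`, the `max_chunk` parameter and the sample values are assumptions made for the example.

```python
import json
from typing import Dict, Iterator


def iter_json_chunks(topics: Dict[str, str], max_chunk: int = 64) -> Iterator[str]:
    """Yield a JSON object piece by piece instead of building one big string."""
    yield "{"
    first = True
    for name, value in topics.items():
        # Serialize one key/value pair at a time (hypothetical topic data).
        entry = ("" if first else ",") + json.dumps(name) + ":" + json.dumps(value)
        first = False
        # Split long entries so that no single piece exceeds max_chunk characters.
        for start in range(0, len(entry), max_chunk):
            yield entry[start:start + max_chunk]
    yield "}"


sample = {"Main_Target_Temp": "23", "Z1_Water_Target_Temp": "23"}
chunks = list(iter_json_chunks(sample, max_chunk=16))
```

Joining the chunks gives back valid JSON, while the sender only ever holds one small piece in memory at a time.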

@IgorYbema
Owner Author

I am still seeing a crash reboot on the ESP8266 when I open the web console after some hours/days of running. It immediately crash-reboots just after loading the first screen. This is probably due to a memory issue, but it is hard to reproduce.

@stumbaumr

stumbaumr commented Sep 2, 2024

Just installed this build here: https://github.com/IgorYbema/HeishaMon/actions/runs/10656111818

When do you plan to merge? Forget about it, this build is really unstable on the ESP8266...

@IgorYbema
Owner Author

@stumbaumr what are you experiencing? I am running this build fine on my ESP8266.

@stumbaumr

I was experiencing reboot after reboot after reboot...

Do you have rules active in your setup? I have 9 running...

@IgorYbema
Owner Author

No, I don't have rules. Do you want to share your rules? I hope I can replicate your issue so I can see what is going wrong.

@stumbaumr

Stability wise I am happy now...

@Binifada1956

Binifada1956 commented Sep 4, 2024

Heishamon V5 - Build binary Egyras#611

No Variables in Console!

Before that I used build Egyras#580, where it was OK.

Sorry, I have now seen that there is a newer build, Egyras#613. Console and variables are OK there ;-)

@IgorYbema
Owner Author

Console output shows the variables again... but the rules output does not show up in the MQTT log output...

I believe this was never available on the MQTT log output. The code used for the rules logging is separate and only prints to the web console and the serial port. But I'll modify that in a later version.

@IgorYbema
Owner Author

@McMagellan can you try https://github.com/IgorYbema/HeishaMon/actions/runs/10706629932 with your large ruleset and a reboot?
I believe it is the watchdog resetting because the ruleset takes too long to load during boot. With my test setup it works fine, so something else in your setup (maybe the Wi-Fi) probably adds enough time to cause the crash. The reset reason is '3' (meaning watchdog reset).

I added some watchdog feeding during rules loading.
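
To show why feeding the watchdog between rules can help, here is a small Python simulation. It is only a toy model, not the firmware code: the class `SimulatedWatchdog`, the 3000 ms timeout and the 400 ms cost per rule are all made-up assumptions.

```python
class SimulatedWatchdog:
    """Toy watchdog driven by a simulated clock (milliseconds)."""

    def __init__(self, timeout_ms: int) -> None:
        self.timeout_ms = timeout_ms
        self.now_ms = 0
        self.last_fed_ms = 0

    def feed(self) -> None:
        # Feeding restarts the timeout window.
        self.last_fed_ms = self.now_ms

    def advance(self, elapsed_ms: int) -> None:
        """Let simulated time pass; raise if the watchdog was starved."""
        self.now_ms += elapsed_ms
        if self.now_ms - self.last_fed_ms > self.timeout_ms:
            raise RuntimeError("watchdog reset")


def load_rules(rule_costs_ms, watchdog, feed_between_rules):
    """Pretend to parse rules one by one; return True if the boot survives."""
    try:
        for cost in rule_costs_ms:
            watchdog.advance(cost)
            if feed_between_rules:
                watchdog.feed()
    except RuntimeError:
        return False
    return True


costs = [400] * 10  # hypothetical: 10 rules, 400 ms of parsing each
survived_without = load_rules(costs, SimulatedWatchdog(3000), feed_between_rules=False)
survived_with = load_rules(costs, SimulatedWatchdog(3000), feed_between_rules=True)
```

In this model the total load time (4000 ms) exceeds the timeout, so loading only survives when the watchdog is fed after each rule.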

@McMagellan

Feedback Firmware Egyras#614 Signature Alpha e37c69b

You wrote that it might be caused by my environment.
So I did the following test with the "old" version Egyras#613:

1. Deleted the ruleset.
2. Carried out a factory reset to get a clean baseline state.
3. Connected the PC to the Heishamon AP.
4. Loaded the ruleset via the rules window with the Save button -> OK
5. Reboot and power-off test -> OK

Then I activated the 1Wire interface in the settings, to which 7 Dallas sensors are connected.

6. Reboot -> Heishamon crashes while reloading the existing ruleset (rule no. 10)

This combination is obviously responsible for the crash in my case.

On this occasion I noticed something else:
After the 1Wire port has been activated in the settings, it does not work right away. It only works after a reboot.

Then I repeated the test with the new version Egyras#614.
The rules loading error is still there just like in the old version.

@IgorYbema
Owner Author

OK, thanks. And did you also test the new build?
Activating 1wire indeed only works after a reboot. During boot, 1wire also takes some time, so together with the rules loading it might cause the watchdog reset. I'll try that as well.

@McMagellan

@IgorYbema
Then I repeated the test with the new version Egyras#614.
The rules loading error is still there just like in the old version.

@IgorYbema
Owner Author

OK, thanks. I'll try to reproduce this with 1wire enabled. That will be later this week.

@IgorYbema
Owner Author

Good news, I managed to reproduce this. With 5 temperature sensors it is OK; with 6 sensors it fails to boot with the rules. Now time for debugging :)

@McMagellan

Quick question in between:
Does this rules code work or not?

on System#Boot then
#Monitor = "Check OK: ";
if #Monitor == "Check OK: " then
#Heizbetrieb = 1;
end
end

@IgorYbema
Owner Author

Quick question in between: Does this rules code work or not?

on System#Boot then
#Monitor = "Check OK: ";
if #Monitor == "Check OK: " then
#Heizbetrieb = 1;
end
end

No, I get: 423703: ERROR: Expected a parenthesis block, function, number or variable

And I fixed the rules loading issue. It was indeed a watchdog reset. You can test with the new build https://github.com/IgorYbema/HeishaMon/actions/runs/10724868618

@IgorYbema
Owner Author

I believe there is more wrong with the usage of double quotes. The following example will crash the system, so this is something for @CurlyMoo to look at:

on System#Boot then
#Monitor = "1";
if #Monitor == "1" then
#Heizbetrieb = 1;
end
end

@McMagellan

Feedback FW Igor Egyras#617 Signature Alpha ffa04b6 (Test: Big Rule crashes after reboot)

The test was successful and the ruleset was started again even after a reboot.
Good job, one less mysterious error, thanks.

I also wanted to say that this ruleset adjusted the heating curve parameters with SetCurves. So check if used.

I'm currently expanding the ruleset and using the "print" command several times. Is there already a guideline for how large the ruleset can be? So far I have never used more than 15 variables. How many can I use now without conflicts in version 3.7?

@IgorYbema
Owner Author

I'm currently expanding the ruleset and using the "print" command several times. Is there already a guideline for how large the ruleset can be? So far I have never used more than 15 variables. How many can I use now without conflicts in version 3.7?

I can't say for sure. Memory is the final limit, but it is hard to tell what the maximum is. If you hit the limit on the older model, you can go much further with the new, larger ESP32 board, as it has more memory.
But yes, you just need to go and try :)


@IgorYbema
Owner Author

I'll declare this v3.7 final now and push it to the Egyras repo.

@IgorYbema IgorYbema merged commit 4e8cfd9 into main Sep 5, 2024
2 checks passed
@CurlyMoo CurlyMoo mentioned this pull request Sep 5, 2024
@McMagellan

Feedback V3.8 Spontaneous crash without reloading the rules.

Let me say that version 3.8 runs very well and is largely stable.
However, sometimes there are problems. This is recorded in the debug file.
Crash1.txt

The recording begins one hour before the crash.
I maintain a permanent connection to the Heishamon console.
There were 112 new web server calls in this hour.
The crash (line 10699) was probably triggered by the Heishamon watchdog, for no apparent reason.
The rules are then not reloaded (line 10765).
The recording ends 5 minutes after the crash.
The web server calls continued automatically.

I programmed a watchdog in iobroker Blockly that carries out a Heishamon reboot.
The rules will then be started again. (Line 11159)

@IgorYbema
Owner Author

Funny thing. At boot it says rst reason: 4 (which is a software reset), while the rules check says reason: 2 (which is an exception). If it had been 4, it would not have disabled the rules, so I believe it wasn't a 4 or a watchdog reset but an exception.
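
For reference, the reset reason numbers can be decoded with a small lookup like the Python sketch below. The meanings of 2, 3 and 4 are the ones mentioned in this thread; the other values follow the ESP8266 SDK reset reason enum as far as I recall it, so treat them as an assumption. The names `ResetReason` and `describe` are made up for this example.

```python
from enum import IntEnum


class ResetReason(IntEnum):
    """ESP8266 reset reason codes (values other than 2, 3, 4 are assumed)."""
    DEFAULT = 0           # normal power-on
    HARDWARE_WDT = 1      # hardware watchdog reset
    EXCEPTION = 2         # exception (crash)
    SOFT_WDT = 3          # software watchdog reset
    SOFT_RESTART = 4      # software restart
    DEEP_SLEEP_AWAKE = 5  # wake-up from deep sleep
    EXTERNAL = 6          # external system reset


def describe(code: int) -> str:
    """Return a readable name for a reset reason code."""
    try:
        return ResetReason(code).name
    except ValueError:
        return f"UNKNOWN({code})"


boot_reason = describe(4)   # value printed at boot in the log
rules_reason = describe(2)  # value reported by the rules check
```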

What is the reason you keep the webclient open? Can you check if it keeps running without webclients?

@McMagellan

The reason why I currently have the web server open with the console for hours is that I continue to work on my CoPilot ruleset.
It is now 38 rules long and works well except for minor errors. To detect errors, I monitor the running variables.
I could also use the debug file, but not in real time and the laptop that records is in the basement.

So is there a limit on the web server or is it a browser problem on the PC that logs in so often?

Here is another recording of a reset from Heishamon with firmware 3.8 that is triggered by the message:
Line 9145: "Out of memory webserver_send_content:#1681"

2009cputty.log

The log file also includes the hour before when I first connected to the console and the 15 minutes after the restart.
Here the ruleset was then reloaded and only the values of the variables were lost.

Is there a better place for feedback on 3.8 than here? Please let me know.

@IgorYbema
Owner Author

You could use 'log to mqtt' and monitor the console log through there. We know there is still a memory issue with the webserver, so in the end we cannot distinguish between a rules error and a webserver error if you keep monitoring it using the web console.
I'm OK with you continuing to report here for now.

@McMagellan

@IgorYbema: log to mqtt sounds good, but it doesn't show any ruleset values. Compare the two logs below, which cover the same time range.

Part1: mqtt- log in iobroker

Mon Sep 23 15:34:39 2024 (18614459): received TOP117 Economizer_Outlet_Temp: -128
Mon Sep 23 15:34:39 2024 (18614467): received TOP118 Second_Room_Thermostat_Temp: -128
Mon Sep 23 15:34:48 2024 (18622717): Heishamon stats: Uptime: 0 days 5 hours 10 minutes 22 seconds ## Free memory: 59% ## Heap fragmentation: 16% ## Max free block: 19208 bytes ## Free heap: 22944 bytes ## Wifi: 100% (RSSI: -48) ## Mqtt reconnects: 1 ## Correct data: 100.00% Rules active: 38
Mon Sep 23 15:34:48 2024 (18622736): Requesting new panasonic data
Mon Sep 23 15:34:48 2024 (18622738): sent bytes: 111 including checksum value: 18 
Mon Sep 23 15:34:48 2024 (18623186): Received 203 bytes data
Mon Sep 23 15:34:48 2024 (18623188): Checksum and header received ok!
Mon Sep 23 15:34:48 2024 (18623194): received TOP29 Z1_Heat_Curve_Target_High_Temp: 38
Mon Sep 23 15:34:48 2024 (18623202): received TOP30 Z1_Heat_Curve_Target_Low_Temp: 28
Mon Sep 23 15:34:48 2024 (18623210): received TOP42 Z1_Water_Target_Temp: 23
Mon Sep 23 15:34:50 2024 (18625345): Requesting new 1wire temperatures
Mon Sep 23 15:34:51 2024 (18626003): Received 1wire sensor temperature (28fce416000000e5): 21.62
Mon Sep 23 15:34:51 2024 (18626021): Received 1wire sensor temperature (2855c11800000010): 22.88
Mon Sep 23 15:34:51 2024 (18626037): Received 1wire sensor temperature (28c3ae1600000078): 24.56
Mon Sep 23 15:34:51 2024 (18626055): Received 1wire sensor temperature (2867e51600000061): 20.81
Mon Sep 23 15:34:51 2024 (18626075): Received 1wire sensor temperature (28ff6402e93da462): 24.38
Mon Sep 23 15:34:51 2024 (18626094): Received 1wire sensor temperature (28ff6402e92394ec): 24.06
Mon Sep 23 15:34:51 2024 (18626112): Received 1wire sensor temperature (28ff6402e95796dd): 47.75
Mon Sep 23 15:34:58 2024 (18632718): Heishamon stats: Uptime: 0 days 5 hours 10 minutes 32 seconds ## Free memory: 60% ## Heap fragmentation: 17% ## Max free block: 19208 bytes ## Free heap: 23024 bytes ## Wifi: 100% (RSSI: -48) ## Mqtt reconnects: 1 ## Correct data: 100.00% Rules active: 38
Mon Sep 23 15:34:58 2024 (18632737): Requesting new panasonic data
Mon Sep 23 15:34:58 2024 (18632740): sent bytes: 111 including checksum value: 18 
Mon Sep 23 15:34:58 2024 (18633186): Received 203 bytes data
Mon Sep 23 15:34:58 2024 (18633188): Checksum and header received ok!
Mon Sep 23 15:35:08 2024 (18642719): Heishamon stats: Uptime: 0 days 5 hours 10 minutes 42 seconds ## Free memory: 61% ## Heap fragmentation: 18% ## Max free block: 19208 bytes ## Free heap: 23432 bytes ## Wifi: 100% (RSSI: -48) ## Mqtt reconnects: 1 ## Correct data: 100.00% Rules active: 38
Mon Sep 23 15:35:08 2024 (18642741): sent bytes: 111 including checksum value: 18 
Mon Sep 23 15:35:08 2024 (18643186): Received 203 bytes data
Mon Sep 23 15:35:08 2024 (18643188): Checksum and header received ok!
Mon Sep 23 15:35:10 2024 (18645349): Requesting new 1wire temperatures
Mon Sep 23 15:35:11 2024 (18646068): Received 1wire sensor temperature (28ff6402e92394ec): 24.00
Mon Sep 23 15:35:18 2024 (18652720): Heishamon stats: Uptime: 0 days 5 hours 10 minutes 52 seconds ## Free memory: 61% ## Heap fragmentation: 18% ## Max free block: 19208 bytes ## Free heap: 23456 bytes ## Wifi: 100% (RSSI: -48) ## Mqtt reconnects: 1 ## Correct data: 100.00% Rules active: 38
Mon Sep 23 15:35:18 2024 (18652742): sent bytes: 111 including checksum value: 18 
Mon Sep 23 15:35:18 2024 (18653186): Received 203 bytes data
Mon Sep 23 15:35:18 2024 (18653188): Checksum and header received ok!
Mon Sep 23 15:35:18 2024 (18653191): received TOP51 Inside_Pipe_Temp: 19

Part2: Log from the console

Mon Sep 23 15:34:39 2024 (18614459): received TOP117 Economizer_Outlet_Temp: -128
Mon Sep 23 15:34:39 2024 (18614467): received TOP118 Second_Room_Thermostat_Temp: -128
Mon Sep 23 15:34:48 2024 (18622717): Heishamon stats: Uptime: 0 days 5 hours 10 minutes 22 seconds 
## Free memory: 59% ## Heap fragmentation: 16% ## Max free block: 19208 bytes ## Free heap: 22944 bytes 
## Wifi: 100% (RSSI: -48) ## Mqtt reconnects: 1 ## Correct data: 100.00% Rules active: 38
Mon Sep 23 15:34:48 2024 (18622736): Requesting new panasonic data
Mon Sep 23 15:34:48 2024 (18622738): sent bytes: 111 including checksum value: 18 
Mon Sep 23 15:34:48 2024 (18623186): Received 203 bytes data
Mon Sep 23 15:34:48 2024 (18623188): Checksum and header received ok!
Mon Sep 23 15:34:48 2024 (18623190): received TOP7 Main_Target_Temp: 23
Mon Sep 23 15:34:48 2024 (18623194): received TOP29 Z1_Heat_Curve_Target_High_Temp: 38
Mon Sep 23 15:34:48 2024 (18623202): received TOP30 Z1_Heat_Curve_Target_Low_Temp: 28
Mon Sep 23 15:34:48 2024 (18623210): received TOP42 Z1_Water_Target_Temp: 23
==== Z1_Water_Target_Temp ====
rule #10 was executed in 12086 microseconds

>>> local variables

 0 $VLSollTemperatur = 23
 1 $VListTemperatur = 25.25
 2 $RListTemperatur = 24.25

>>> global variables

 0 #Delta = 2.25
 1 #Bonus = 0
 2 #Quicky = 0
 3 #BoniPool = 3
 4 #TestBonusLangsam = 0
 5 #TestBonusQuick = 0
 6 #TestBonusAbbau = 0
 7 #Clock = -36
 8 #Periode = 160
 9 #TaktMinuten = 18
10 #BoniGradMin = 26
11 #MinimModMin = 15
12 #HeatpumpOn = 0
13 #TOP18Backup = 0
14 #TOP27Backup = 0
15 #Watchdog = 21
16 #CoPilotStatus = 3
Mon Sep 23 15:34:50 2024 (18625345): Requesting new 1wire temperatures
Mon Sep 23 15:34:51 2024 (18626003): Received 1wire sensor temperature (28fce416000000e5): 21.62
Mon Sep 23 15:34:51 2024 (18626021): Received 1wire sensor temperature (2855c11800000010): 22.88
Mon Sep 23 15:34:51 2024 (18626037): Received 1wire sensor temperature (28c3ae1600000078): 24.56
Mon Sep 23 15:34:51 2024 (18626055): Received 1wire sensor temperature (2867e51600000061): 20.81
Mon Sep 23 15:34:51 2024 (18626075): Received 1wire sensor temperature (28ff6402e93da462): 24.38
Mon Sep 23 15:34:51 2024 (18626094): Received 1wire sensor temperature (28ff6402e92394ec): 24.06
Mon Sep 23 15:34:51 2024 (18626112): Received 1wire sensor temperature (28ff6402e95796dd): 47.75
==== timer=40 ====
rule #27 was executed in 7798 microseconds

>>> local variables


>>> global variables

 0 #Delta = 2.25
 1 #Bonus = 0
 2 #Quicky = 0
 3 #BoniPool = 3
 4 #TestBonusLangsam = 0
 5 #TestBonusQuick = 0
 6 #TestBonusAbbau = 0
 7 #Clock = -36
 8 #Periode = 160
 9 #TaktMinuten = 18
10 #BoniGradMin = 26
11 #MinimModMin = 15
12 #HeatpumpOn = 0
13 #TOP18Backup = 0
14 #TOP27Backup = 0
15 #Watchdog = 21
16 #CoPilotStatus = 3
Mon Sep 23 15:34:58 2024 (18632718): Heishamon stats: Uptime: 0 days 5 hours 10 minutes 32 seconds 
## Free memory: 60% ## Heap fragmentation: 17% ## Max free block: 19208 bytes ## Free heap: 23024 bytes 
## Wifi: 100% (RSSI: -48) ## Mqtt reconnects: 1 ## Correct data: 100.00% Rules active: 38
Mon Sep 23 15:34:58 2024 (18632737): Requesting new panasonic data
Mon Sep 23 15:34:58 2024 (18632740): sent bytes: 111 including checksum value: 18 
Mon Sep 23 15:34:58 2024 (18633186): Received 203 bytes data
Mon Sep 23 15:34:58 2024 (18633188): Checksum and header received ok!
Mon Sep 23 15:35:08 2024 (18642719): Heishamon stats: Uptime: 0 days 5 hours 10 minutes 42 seconds 
## Free memory: 61% ## Heap fragmentation: 18% ## Max free block: 19208 bytes ## Free heap: 23432 bytes 
## Wifi: 100% (RSSI: -48) ## Mqtt reconnects: 1 ## Correct data: 100.00% Rules active: 38
Mon Sep 23 15:35:08 2024 (18642738): Requesting new panasonic data
Mon Sep 23 15:35:08 2024 (18642741): sent bytes: 111 including checksum value: 18 
Mon Sep 23 15:35:08 2024 (18643186): Received 203 bytes data
Mon Sep 23 15:35:08 2024 (18643188): Checksum and header received ok!
Mon Sep 23 15:35:10 2024 (18645349): Requesting new 1wire temperatures
Mon Sep 23 15:35:11 2024 (18646068): Received 1wire sensor temperature (28ff6402e92394ec): 24.00
Mon Sep 23 15:35:18 2024 (18652720): Heishamon stats: Uptime: 0 days 5 hours 10 minutes 52 seconds 
## Free memory: 61% ## Heap fragmentation: 18% ## Max free block: 19208 bytes ## Free heap: 23456 bytes 
## Wifi: 100% (RSSI: -48) ## Mqtt reconnects: 1 ## Correct data: 100.00% Rules active: 38
Mon Sep 23 15:35:18 2024 (18652739): Requesting new panasonic data
Mon Sep 23 15:35:18 2024 (18652742): sent bytes: 111 including checksum value: 18 
Mon Sep 23 15:35:18 2024 (18653186): Received 203 bytes data
Mon Sep 23 15:35:18 2024 (18653188): Checksum and header received ok!
Mon Sep 23 15:35:18 2024 (18653191): received TOP51 Inside_Pipe_Temp: 19
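
To watch the memory figures over time without keeping the web console open, the "Heishamon stats" lines could be parsed from a recorded log. The following Python sketch shows one way to do that; the regular expression is derived only from the log lines shown above, and the names `STATS_RE` and `parse_stats` are made up for the example.

```python
import re

# Pattern built from the stats lines in the logs above (not from firmware source).
STATS_RE = re.compile(
    r"Free memory: (?P<free_pct>\d+)% ## "
    r"Heap fragmentation: (?P<frag_pct>\d+)% ## "
    r"Max free block: (?P<max_block>\d+) bytes ## "
    r"Free heap: (?P<free_heap>\d+) bytes"
)


def parse_stats(line: str):
    """Return the memory figures from a stats line, or None if absent."""
    match = STATS_RE.search(line)
    if match is None:
        return None
    return {key: int(value) for key, value in match.groupdict().items()}


line = (
    "Mon Sep 23 15:34:48 2024 (18622717): Heishamon stats: Uptime: 0 days 5 hours "
    "10 minutes 22 seconds ## Free memory: 59% ## Heap fragmentation: 16% ## "
    "Max free block: 19208 bytes ## Free heap: 22944 bytes ## Wifi: 100% (RSSI: -48)"
)
stats = parse_stats(line)
```

Because the pattern uses `search`, it also matches the console variant, where the memory figures sit on their own line starting with `##`.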

@IgorYbema
Owner Author

Alright, yes, I understand. This printing of the variables is some leftover debug code. It shouldn't be there anyway :)

Maybe those variables should be visible as a new mqtt topic so you can monitor any change. I'll think about how to add that.

But for now, could you just stop monitoring and let it run for a few hours/days to check whether it still crashes?

@McMagellan

OK, now I'm taking a 24 hour console break.

However, if you set up an extra MQTT channel for ruleset values, keep in mind that it is important to include the "set" commands and the "received TOP xxx" messages.

@McMagellan

I have another suggestion in connection with an independent MQTT channel.

When uploading a rule, you can observe the progress on the debug port.
If an error occurs, you can see which rule was aborted and why.

With older firmware versions it was possible to establish two web server connections to Heishamon.
One to the Rules window and the other parallel to the Console.
This meant you could follow the progress from your PC and see where the problem with a rejected ruleset was.

In V3.8 this is only possible with small rulesets. If I use my larger ruleset in the manner described, there is always a crash with two running web server tasks and the loading of the rules fails.

I would therefore like to make a suggestion:
If you are thinking about a separate mqtt channel for the ongoing ruleset output, it might make sense to create another mqtt channel that only outputs the parsing process, including any errors if parsing is aborted. If you want, you can record this in iobroker/influxdb, for example, and then evaluate it at your leisure without putting additional strain on Heishamon.
It would be a significant improvement to be able to see which rule fails.
Today, after a loading error, I have to go to the basement and get a copy of the debug log file to see that.
However, very few people have the ability to record the debug port.

@McMagellan

Feedback test without Heishamon Console connection.

Here are the key data from the debug log file.
I didn't establish a web server connection for 2 days, 1 hour and 33 minutes, except for an accidental connection that was terminated in the same minute (8:40 at uptime 1:22:15).
Everything was OK during this time.
When I then established a permanent console connection, Heishamon triggered a reset after 26 minutes, as a result of which the rules were no longer loaded.
I can provide the debug log file if desired.

9/23/24 16:11: Start test at uptime 5h 47min Rules always running.
9/24/24 3:47: Refresh DSL connection. Web server contact to router.
9/25/24 8:40: Accidental call. Was finished within 1 minute.
9/25/24 17:29: Connection with IP unset?

New web server client: (IP unset):0
Closing web server client: (IP unset):0

9/25/24 17:44: Start working with Heishamon, permanent console connection.
9/25/24 18:10: Heishamon starts a reset without loading the rules for no apparent reason


Wed Sep 25 18:10:08 2024 (200742275): received TOP76 Heating_Mode: 0
Wed Sep 25 18:10:08 2024 (200742280): received TOP77 Heating_Off_Outdoor_Temp: 19
Wed Sep 25 18:10:08 2024 (200742288): received TOP78 Heater_On_Outdoor_Temp: 0
Wed Sep 25 18:10:08 2024 (200742297): received TOP79 Heat_To_Cool_Temp: 15
Wed Sep 25 18:10:08 2024 (200742301): received TOP80 Cool_To_Heat_Temp: 10
Wed Sep 25 18:10:08 2024 (200742309): received TOP81 Cooling_Mode: 0
Wed Sep 25 18:10:08 2024 (200742314): received TOP82 Z2_Heat_Curve_Target_High_Temp: 37
Wed Sep 25 18:10:08 2024 (200742322): received TOP83 Z2_Heat_Curve_Target_Low_Temp: 27
Starting debugging, version: 3.8

--- HEISHAMON ---
starting...
Checking littlefs for first boot...
Heishamon boot file exists, normal boot...
Send current wifi info to serial...
Mode: NULL
PHY mode: N
Channel: 1
AP id: 0
Status: 255
Auto connect: 1
SSID (0): 
Passphrase (0): 
BSSID set: 0
Loading config from flash...

@McMagellan

Feedback test 3.8: memory fight between rules and webserver.

After I found a small error in the ruleset, I reestablished a web server connection to load the new ruleset.
After a short time there was a crash, and then a second crash after 10 seconds of uptime. See the debug log file.
2809bputty.log

4018: (1st time) New webserver client: 192.168.178.86:57095 (Uptime: 3 days 2 hours 39 minutes 46 seconds)
4148: Deinitialize rules engine...
4305: reading rules
4581: ==== System#Boot ====
5481: Starting debugging, version: 3.8 (Last Uptime: 3 days 2 hours 45 minutes 26 seconds)
5536: Not loading rules due to crash reboot!
5763: Starting debugging, version: 3.8 (manual reboot to start loading the ruleset again: Uptime: 0 days 0 hours 1 minute 31 seconds)
5819: reading rules
6096: ==== System#Boot ====
6416: Out of memory webserver_send_content:#1681 (last Uptime: 0 days 0 hours 0 minutes 10 seconds)
6417: Starting debugging, version: 3.8
6473: reading rules
6749: ==== System#Boot ====
7612: (end webserver session) Closing webserver client: 192.168.178.86:57158
7613: Closing webserver client: 192.168.178.86:57152
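
A summary like the one above could be produced automatically from a recorded debug log. The Python sketch below scans for the marker strings that appear in this thread; the event labels, the names `MARKERS` and `scan_log`, and the short sample log are assumptions made for illustration only.

```python
# Marker strings as they appear in the debug logs quoted in this thread.
MARKERS = {
    "Starting debugging, version:": "boot",
    "Not loading rules due to crash reboot!": "rules_skipped",
    "Out of memory": "out_of_memory",
    "reading rules": "rules_loading",
}


def scan_log(lines):
    """Return (line_number, event) tuples for lines containing a known marker."""
    events = []
    for number, text in enumerate(lines, start=1):
        for marker, event in MARKERS.items():
            if marker in text:
                events.append((number, event))
                break  # one event per line is enough
    return events


# Illustrative sample, composed from lines quoted above (not a real log file).
sample = [
    "reading rules",
    "==== System#Boot ====",
    "Out of memory webserver_send_content:#1681",
    "Starting debugging, version: 3.8",
    "Not loading rules due to crash reboot!",
]
events = scan_log(sample)
```

For a real log file, passing the open file object to `scan_log` gives the line numbers of each boot, out-of-memory message and skipped rules load.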
