Skip to content

Conversation

CurlyMoo
Copy link

@CurlyMoo CurlyMoo commented May 18, 2024

When the year is 1970 resync every 5 minutes. If the date is properly set, resync every day.

@IgorYbema
Copy link
Owner

Seems ok

@CurlyMoo
Copy link
Author

Let's wait at least for the results of the tests of @McMagellan

@RichieB2B
Copy link

RichieB2B commented May 18, 2024

The rescheduling is done twice: both in timer_cb() every minute and in ntpReload() itself every 5 minutes. So you'll get 2 timerqueue_inserts(), then 4, then 8 etc. Doubling with every call to timer_cb() for the -5 case.

@CurlyMoo
Copy link
Author

@RichieB2B Two things. The difference in time was indeed wrong. However, the double call to timerqueue_insert wasn't necessarily because it would just overwrite the running timer. Despite that, i changed it according to your feedback.

@RichieB2B
Copy link

@CurlyMoo So an extra call to timerqueue_inserts(x, y, -5) would just reset the previous timer. That's much less of an issue than I thought but still good to avoid it.

@CurlyMoo
Copy link
Author

@RichieB2B Exactly.

@McMagellan
Copy link

Feedback: Testing time synchronization. Alpha-467a907, Egyras#527
The test failed.

Method:

  1. Disconnect Heishamon and Fritzbox from the power supply.
  2. Turn the power back on.
  3. After about 6 minutes, contact with the Heishamon console could be established again from the browser.
    --> Timestamp 1/1/1970, 1:06, Uptime 6 minutes.
  4. Today morning see Screenshot:
    -> Timestamp 1/1/1970, 9:37, Uptime 8:h 37 min.

Screenshot 2024-05-19 at 07-40-51 Heisha monitor

@geduxas
Copy link

geduxas commented May 19, 2024

@CurlyMoo why not use some basic logic, just 2 variables, one boolen for time sync'ed flag, and one last time synced (timestamp after timesync) and check by that.. if it's never synced (try every 5min) and if synced try after last sync_time timeout.

@CurlyMoo
Copy link
Author

CurlyMoo commented May 19, 2024

@CurlyMoo why not use some basic logic, just 2 variables, one boolen for time sync'ed flag, and one last time synced (timestamp after timesync) and check by that.. if it's never synced (try every 5min) and if synced try after last sync_time timeout.

That's what we're basicly doing here, but centralized through the timepool.

@geduxas
Copy link

geduxas commented May 19, 2024

Oh, sorry, i saw somewhere you mentioned exact dates.. so thought it tied to dates :) didn't check source.

@CurlyMoo
Copy link
Author

CurlyMoo commented May 19, 2024

Oh, sorry, i saw somewhere you mentioned exact dates.. so thought it tied to dates :) didn't check source.

It does, because by definition unix time starts on the 1st of januari 1970. So, checking for a year other than 1970 is similar to that sync flag but with more accuracy of success.

https://en.wikipedia.org/wiki/Unix_time

@CurlyMoo
Copy link
Author

@McMagellan Can you let it run again for a while and check for the Syncing with ntp servers loglines?

@McMagellan
Copy link

Feedback: Testing time synchronization. Alpha baddb94, Egyras#531
The test failed again.

How I do it:

  1. Close the console window in the PC browser.
  2. Disconnect the main Fritzbox1 from power
  3. (in the house basement) switch off the power to the Fritzbox2 repeater.
  4. Switch off Heishamon, wait 1 minute.
  5. Switch on Heishamon, switch on Fritzbox2, switch on Fritzbox1.

LED WLAN and DSL on Fritzbox1 flash for about 5 - 6 minutes until the PC browser is connected to the WLAN again.
During this time the Heishamon Setup AP is active and I connected to the console via it to test it.
Message every 15 seconds: Reconnecting to WIFI failed. Waiting a few seconds before trying again.

Screenshot 2024-05-19 at 11-24-56 Heisha monitor

For the next attempt, I activated "Debug log to MQTT topic from start" to have the messages recorded in the influxdb immediately after reconnection.
In the log file (see logfile) the first 4 lines are from the previous correct connection. The first data set is recorded after 3 minutes 11 seconds after power returns (start uptime new).
panasonic heat pump_log.csv

The timestamp at the beginning of the line comes from the Raspberry.
The Heishamon time is then at 1/1/1970 and 1:03:11 and runs synchronously with the uptime.
There was no further synchronization.

@McMagellan
Copy link

@CurlyMoo: How can i do this?
@McMagellan Can you let it run again for a while and check for the Syncing with ntp servers loglines?

@McMagellan
Copy link

I did another test on synchronizing the date and time.
After about 30 minutes, dated January 1st, 1970, I changed and saved the Heishamon settings in a second console window.
The time was then correctly synchronized and the uptime and rules continued.

This process was recorded in the first console window. You can see that the time has been updated.

Console log update Settings

Security notice:
While the settings is being updated, all parameters of the profile are displayed in the log file. This also includes the password for the currently connected WiFi network. Marked in blue here in the picture.
FYI.

@stumbaumr
Copy link

stumbaumr commented Jun 7, 2024

Somehow yesterday afternoon the HeishaMon rebooted, left rules active, but did not get the proper time. Only found out this morning when looking at the graphs and seeing a DHW production attempt at 4:55. Looks like I need to help to get this PR into main...
grafik

@CurlyMoo
Copy link
Author

CurlyMoo commented Jun 7, 2024

Looks like I need to help to get this PR into main...

Please do!

@stumbaumr
Copy link

stumbaumr commented Jun 7, 2024

Why do you need two timers? The code from timer -6 could look like this:

      case -6: {
          time_t now = time(NULL);
          struct tm *tm_struct = localtime(&now);
          if(tm_struct->tm_year == 1970) {
            logprintln_P(F("I am still in the year 1970, syncing with ntp servers, check again in one minute"));
            timerqueue_insert(60, 0, -6);
          } else {
            logprintln_P(F("Time probably correct, syncing with ntp servers, check again in a day"));
            timerqueue_insert(86400, 0, -6);
          }
          ntpReload(&heishamonSettings);
        } break;

and then in systemboot just call it after a minute.

      logprintln_P(F("Syncing with ntp servers, check again in one minute"));
      ntpReload(&heishamonSettings);
      timerqueue_insert(60, 0, -6);

And then delete timer -5...
...and rename -6 to -5...

@CurlyMoo
Copy link
Author

CurlyMoo commented Jun 7, 2024

Why do you need two timers?

Because in your case you're always doing a ntpReload even if it's not necessary.

@stumbaumr
Copy link

Why do you need two timers?

Because in your case you're always doing a ntpReload even if it's not necessary.

Only once, right after the systemboot.
But you could also remove that, doing things like this

      case -6: {
          ntpReload(&heishamonSettings);
          time_t now = time(NULL);
          struct tm *tm_struct = localtime(&now);
          if(tm_struct->tm_year == 1970) {
            logprintln_P(F("I am still in the year 1970, check again in one minute"));
            timerqueue_insert(60, 0, -6);
          } else {
            logprintln_P(F("Time probably correct, check again in a day"));
            timerqueue_insert(86400, 0, -6);
          }
        } break;

and

      logprintln_P(F("Syncing with ntp servers after boot"));
      timerqueue_insert(1, 0, -6);

@CurlyMoo
Copy link
Author

CurlyMoo commented Jun 7, 2024

Same thing. You are already doing a ntpReload while you haven't checked if the previous ntpReload succeeded.

@CurlyMoo
Copy link
Author

CurlyMoo commented Jun 7, 2024

And i don't see why using two timers is an issue? They're not active at the same time so aren't a performance issue at all.

@stumbaumr
Copy link

Same thing. You are already doing a ntpReload while you haven't checked if the previous ntpReload succeeded.

It is just doing it the other way around:

  • Do the ntpReload, if it failed, schedule another one and try then again.
  • If successful, schedule it for the next day anyway.

And i don't see why using two timers is an issue? They're not active at the same time so aren't a performance issue at all.

It is less code and easier to read. Less code is easier to maintain.

@CurlyMoo
Copy link
Author

CurlyMoo commented Jun 7, 2024 via email

@stumbaumr
Copy link

A reboot while Fritzbox being up just works...

image

/*
* Wait 300 sec less than a full day
*/
logprintln_P(F("Syncing with ntp servers, check again in a day"));
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This log should change to:
Succesfully synced with ntp servers, checking again in a day

@CurlyMoo
Copy link
Author

In the logs by @McMagellan it works fine everytime. @IgorYbema The only thing that confuses me is that in the 26dputty.log it started its first sync after 17 minutes. However, @McMagellan could connect to the webserver in the meanwhile. So @IgorYbema either i put the first sync in the wrong spot or in the check_wifi function it doesn't enter this line right:

if (firstConnectSinceBoot) { // this should start only when softap is down or else it will not work properly so run after the routine to disable softap

@McMagellan
Copy link

I repeated today the ntp test and this time it didn't work.

29bputty.log

Initial situation:

  1. Rules set deleted in memory to ensure a safe start.

  2. Firmware Add WH-MXC12J9E8 type info Egyras/HeishaMon#583 Signature Alpha-f2a171b loaded.

  3. Heishamon Reboot via Browser.

  4. Fritzbox 1 & 2 power off.

  5. Heishamon Power off.

Then Starts the Debug Logfile 29bputty.log.

  1. Heishamon power on.
  2. Fritzbox2 power on.
  3. Fritzbox1 Power on.

I then waited 40 minutes and the running log file showed that no synchronization had taken place.
The AP was still active. Therefore, no connection could be established via WiFi.

  1. Restart by power off / on (See Line 3360)

Normal start of Heishamon because the WLAN environment was ntp-sync again at this time.

What I also saw are several web server connections like this:
Thu Jan 1 01:04:01 1970 (241229): received TOP6 Main_Outlet_Temp: 25.25
Timeout web server client: 192.168.4.100:59215
Closing web server client: 192.168.4.100:59215
Thu Jan 1 01:04:07 1970 (247085): Pulses seen on S0 port 1: Good: 0 Bad: 0 Average good pulse width: 0

The IP address is that of your own access point??

@McMagellan
Copy link

What I also saw is a hole in the MQTT bulk transmission shortly after the correct time synchronization.

Thu Aug 29 13:15:59 2024 (11160): received TOP104 Solar_Frost_Protection: 5
Thu Aug 29 13:15:59 2024 (11165): received TOP105 Solar_High_Limit: 80
Thu Aug 29 13:15:59 2024 (11173): received TOP106 Pump_Flowrate_Mode: 1
Thu Aug 29 13:15:59 2024 (11178): received TOP107 Liquid_Type: 0
Thu Aug 29 13:15:59 2024 (11185): received TOP108 Alt_External_Sensor: 0
Thu Aug 29 13:15:59 2024 (11190): received TOP109 Anti_Freeze_Mode: 0
Thu Aug 29 13:15:59 2024 (11197): received TOP110 Optional_PCB: 0
Thu Augeived TOP115 Water_Pressure: -0.02
Thu Aug 29 13:15:59 2024 (11234): received TOP116 Second_Inlet_Temp: -128
Thu Aug 29 13:15:59 2024 (11240): received TOP117 Economizer_Outlet_Temp: -128
Thu Aug 29 13:15:59 2024 (11248): received TOP118 Second_Room_Thermostat_Temp: -128
Thu Aug 29 13:16:03 2024 (15002): Starting initial wifi scan ...
Thu Aug 29 13:16:08 2024 (20002): Requesting new 1wire temperatures

TOP111 to TOP115 are missing here. Never seen bevore.

@CurlyMoo
Copy link
Author

CurlyMoo commented Aug 29, 2024

Let's zoom out a bit. Until now we have been suspecting the NTP additions are the ones that are faulty, but i would like to focus first on correct functioning of the check_wifi function.

@McMagellan @stumbaumr can you guys monitor in the difference tests how long it takes for the wifi to connect, how long it takes for the first ntp sync message to appear in the log, and how long it takes before the time is set correctly?

@CurlyMoo
Copy link
Author

You can clearly see it in the last log 29bputty.log. Until the reboot/crash there is just no wifi connection. Wifi stays at -1 in the stats. As soon as it get a wifi connection after the reboot the time is synced immediatly and the wifi is at 100%.

So focussing on the ntp sync without a proper wifi connection is useless.

@McMagellan
Copy link

Here is the information from the system log of the two Fritz boxes after power on at 12:35 (time stamp from the start of the Putty file). Sorry it's German.

Systemlog Fritzbox1 DSL.pdf
Systemlog Fritzbox2 Mesh.pdf

The first message in the log is around 12:37, although it is unclear to me where the Fritzbox got this time because no ntp sync could take place yet. Maybe from the DSL Training Start Message.

The first WiFi device logs on to the network at 12:37:08.

At 12:40:10 DSL IP V4 was connected to Fritzbox1

At 12:40:11 the message about successful time synchronization comes after two unsuccessful attempts.

Heishamon only logged on to the network at 13:15:56 after the manual reboot with the IP address 192.168.178.36.

@IgorYbema
Copy link
Owner

The reason why it didn't connect to wifi is, I believe, due to that you have connected to the AP:

Thu Jan  1 01:00:37 1970 (37546): WiFi lost, but softAP station connecting, so stop trying to connect to configured ssid...
New webserver client: 192.168.4.100:59214

However, it should not report this message after that:
Thu Jan 1 01:00:45 1970 (45637): Reconnecting to WiFi failed. Waiting a few seconds before trying again.
I think that last one is just a bug. I'll check that. But remember that it will not retry to connect to the wifi if a client is connected (192.168.4.*) to the setup AP.

I'll check the NTP code

@IgorYbema
Copy link
Owner

IgorYbema commented Aug 29, 2024

In the logs by @McMagellan it works fine everytime. @IgorYbema The only thing that confuses me is that in the 26dputty.log it started its first sync after 17 minutes. However, @McMagellan could connect to the webserver in the meanwhile. So @IgorYbema either i put the first sync in the wrong spot or in the check_wifi function it doesn't enter this line right:

if (firstConnectSinceBoot) { // this should start only when softap is down or else it will not work properly so run after the routine to disable softap

This is all correct. It started syncing just after the first wifi reconnect (after 17 mins):

Thu Jan  1 01:17:19 1970 (1039076): WiFi (re)connected, shutting down hotspot...
1039187: Syncing with ntp servers, check again in 5 minutes
Mon Aug 26 18:55:33 2024 (1040090): Requesting new 1wire temperatures

"Could connect to the webserver" does that maybe mean using the AP 192.168.4.1? As said, this causes the heishamon not to retry connecting to the main wifi as it thinks the user is configuring the heishamon.

@McMagellan
Copy link

No, there was knowingly no connection from the laptop or desktop to the AP.
Is there perhaps a device that basically scans the WiFi environment and connects unnoticed?

Or does Heishamon connect with itself? See extraction from last Logfile. First line is the timestamp- info cause the webserver entries have no timestamp, whatever.

Thu Jan  1 01:00:31 1970 (31228): Checksum and header received ok!
Thu Jan  1 01:00:37 1970 (37546): WiFi lost, but softAP station connecting, so stop trying to connect to configured ssid...
New webserver client: 192.168.4.100:59214

Thu Jan  1 01:01:07 1970 (67121): Calculated Watt on S0 port 2: 0
Timeout webserver client: 192.168.4.100:59214
Closing webserver client: 192.168.4.100:59214

Thu Jan  1 01:03:31 1970 (211228): Checksum and header received ok!
New webserver client: 192.168.4.100:59215

Thu Jan  1 01:04:01 1970 (241229): received TOP6 Main_Outlet_Temp: 25.25
Timeout webserver client: 192.168.4.100:59215
Closing webserver client: 192.168.4.100:59215

Thu Jan  1 01:06:31 1970 (391227): Checksum and header received ok!
New webserver client: 192.168.4.100:59218
Closing webserver client: 192.168.4.100:59218
New webserver client: 192.168.4.100:59221
Closing webserver client: 192.168.4.100:59221

Thu Jan  1 01:06:40 1970 (400807): sent bytes: 111 including checksum value: 18 
New webserver client: 192.168.4.100:59224
Closing webserver client: 192.168.4.100:59224

Thu Jan  1 01:06:45 1970 (405773): Reconnecting to WiFi failed. Waiting a few seconds before trying again.
New webserver client: 192.168.4.100:59227
Closing webserver client: 192.168.4.100:59227

Thu Jan  1 01:18:31 1970 (1111323): Checksum and header received ok!
New webserver client: 192.168.4.100:59231
Closing webserver client: 192.168.4.100:59231
New webserver client: 192.168.4.100:59234
Closing webserver client: 192.168.4.100:59234

Thu Jan  1 01:18:41 1970 (1121323): Checksum and header received ok!
New webserver client: 192.168.4.100:59237
Closing webserver client: 192.168.4.100:59237
Thu Jan  1 01:18:45 1970 (1125837): Reconnecting to WiFi failed. Waiting a few seconds before trying again.
New webserver client: 192.168.4.100:59240
Closing webserver client: 192.168.4.100:59240

Thu Jan  1 01:27:45 1970 (1665883): Reconnecting to WiFi failed. Waiting a few seconds before trying again.
New webserver client: 192.168.4.100:59244
Closing webserver client: 192.168.4.100:59244
New webserver client: 192.168.4.100:59247
Closing webserver client: 192.168.4.100:59247

Thu Jan  1 01:27:51 1970 (1671420): Checksum and header received ok!
New webserver client: 192.168.4.100:59250
Closing webserver client: 192.168.4.100:59250
New webserver client: 192.168.4.100:59253
Closing webserver client: 192.168.4.100:59253

@IgorYbema
Copy link
Owner

Seems you do have a client connecting to it. Maybe a phone you used before and is now auto connecting?

@McMagellan
Copy link

In the past, I actually used the Heishamon AP to connect my cell phone to the normal WiFi in the settings in the event of a factory reset. I will repeat the test and first turn off my cell phone.

@McMagellan
Copy link

Feedback new Test: ntp sync.

The test was manually stopped after 20 minutes without the time being synchronized.
I turned off my phone to make sure no web server calls were coming from it.
Then I did an interim test and saw WS calls to the AP in the log again. (New webserver client: 192.168.4.100:59457)
I hadn't considered the old Lenovo on which the debug putty files are recorded and which is in the immediate vicinity.

However, all network connections were then disconnected.
Nevertheless, he made calls to the AP.
I then deactivated the WiFi adapter in the Lenovo and started the test again.

There is only one webserver call in the 20 minutes:
Thu Jan 1 01:08:31 1970 (511410): Checksum and header received ok! (only timestamp info)
514017: Syncing with ntp servers, check again in a day
New web server client: 192.168.178.97:43937
Closing web server client: 192.168.178.97:43937
Thu Jan 1 01:08:40 1970 (520051): Requesting new 1wire temperatures

This IP address is the Fritzbox2 which runs as a MESH- repeater.
For more details see the log file.
29cputty2.log

@IgorYbema
Copy link
Owner

It reconnected ok. The time sync failed. Was DSL trained already?

hu Jan 1 01:03:33 1970 (214019): WiFi (re)connected, shutting down hotspot... 214130: Syncing with ntp servers, check again in 5 minutes

@CurlyMoo
Copy link
Author

What i also find wierd is that the subsequent check is suddenly:
514017: Syncing with ntp servers, check again in a day

Whilst the year is still 1970.

@McMagellan
Copy link

McMagellan commented Aug 29, 2024

@IgorYbema:The timing in the Fritzbox is always the same. There are only minimal time differences between the network master and the repeater (MESH).
0:00 Power up
2:30 Fritzbox starts service
2:30 Training DSL start (first timestamp in Fritzbox log)
3:30 WiFi starts
5:45 DSL connected Internet
5:46 Time server received.

To me it looks like Heishamon checks whether the last network connection is available just a few seconds after starting.
If this is not the case, the AP is activated.
If devices now find the AP even without actually connecting, no attempt will be made to connect to the last network. Heishamon should determine that these connections are not real connections and then continue trying to reestablish the last connection. For me this would be possible around 3:30 after powering up the Fritzbox after the WLAN went into operation. However, the correct time is only available after 5:46.

@IgorYbema
Copy link
Owner

So Heishamon reconnected at 3:33 to the wifi. No not at that time. Retry at 8:33 failed to sync. Thanks. I'll try to find out why

@IgorYbema
Copy link
Owner

IgorYbema commented Aug 30, 2024

What i also find wierd is that the subsequent check is suddenly:
514017: Syncing with ntp servers, check again in a day

Whilst the year is still 1970.

So probably the year == 1970 check is wrong. That is why it flips over to next day check and doesnt run a ntpReload.

@IgorYbema
Copy link
Owner

What i also find wierd is that the subsequent check is suddenly:
514017: Syncing with ntp servers, check again in a day
Whilst the year is still 1970.

So probably the year == 1970 check is wrong. That is why it flips over to next day check and doesnt run a ntpReload.

tm_year = years since 1900...

Patching this stupid one

@IgorYbema
Copy link
Owner

@CurlyMoo
Copy link
Author

It again proves the need of reviews.

logprintln_P(F("Syncing with ntp servers, check again in 5 minutes"));
timerqueue_insert(300, 0, -6);
logprintln_P(F("Resynced with NTP servers. Next sync after 24 hours."));
timerqueue_insert(86400, 0, -5);
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this change, you're 100% sure that the time cannot become bogus again and reset to it's defaults.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see no reason outside a reboot that the time will get bogus indeed

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's see if this fixes this unnecessary sync from the logs from @McMagellan

Fri Aug 30 19:53:24 2024 (751365): Received 203 bytes data
Fri Aug 30 19:53:24 2024 (751366): Checksum and header received ok!
Fri Aug 30 19:53:24 2024 (751368): received TOP49 Main_Hex_Outlet_Temp: 23
754921: Syncing with ntp servers, check again in 5 minutes

@McMagellan
Copy link

Feedback Egyras#587 ntp sync. Signature Alpha b24fab5

I ran 2 tests of 20 minutes each and in both cases the time was synchronized correctly after about 7 minutes.
The second test was with a cell phone and the WiFi adapter was active again on the Lenovo. There were no more web server connections.
30bputty.log
30cputty.log

Here too, when I was looking through the log file 30bputty.log, I noticed a hole in the MQTT bulk message.
Line 179: TOP110 to TOP115 are missing.
Line 451: TOP109 to TOP113 are missing
We've had this happen in the last few days.

@IgorYbema
Copy link
Owner

Here too, when I was looking through the log file 30bputty.log, I noticed a hole in the MQTT bulk message. Line 179: TOP110 to TOP115 are missing. Line 451: TOP109 to TOP113 are missing We've had this happen in the last few days.

This is probably just caused by some overloading of the log lines, busy mqtt server or network drops. Nothing to worry about

@stumbaumr
Copy link

Lovely work!!!

After boot:
image

And then:
image

I believe this PR is done now.

@IgorYbema
Copy link
Owner

will be added to v3.7

@IgorYbema IgorYbema closed this Sep 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants