Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

check time synchronization before socket connection #195

Closed
wants to merge 1 commit into from

Conversation

joshk
Copy link
Contributor

@joshk joshk commented May 24, 2024

Similar to the recent addition of checking if the network is available before connecting the socket, this also checks if the time is synchronized.

Additionally, you can opt-out by using:

config :nerves_hub_link,
  wait_for_time_sync: false,

which is useful for tests or when running your code on your host machine (eg. Mac) where NTP isn't available and NervesTime will always report that time isn't synchronized.

fixes #191

Copy link
Contributor

@lawik lawik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it :)

@fhunleth
Copy link
Contributor

I feel like there should be a warning. If you were to put this into production and you don't have control over where the device is installed, it might prevent the device from connecting. NTP is sometimes blocked by routers. I don't know if it's a blanket UDP ban or whether some admin really doesn't like NTP, but we see it happen in our installs. NTP synchronization also takes minutes, so if something causes the device to reboot soon after boot, it might also make it trickier to recover the device.

@joshk
Copy link
Contributor Author

joshk commented May 24, 2024

That is a good point, your time can be in sync without NTP.

I do wonder if having a HTTP based NTP adapter for NervesTime might be a slight better way to go for the case you mention above. As a devices clock drifts, it will cause issues with SSL.

For this PR, would the warning be during socket connection?

@lawik
Copy link
Contributor

lawik commented May 24, 2024

Good call.

So the problem seems to have been that the client tries to connect with a clock that is out of whack and fails to do the crypto dance. I get this on every restart as noted in #191.

It isn't harmful to fail that way until it succeeds but it is kind of weird.

Would time NervesTime.synchronized? give false even if an RTC has kept it accurate? Then synchronization is the wrong thing to look at.

lib/nerves_hub_link/socket.ex Outdated Show resolved Hide resolved
lib/nerves_hub_link/socket.ex Outdated Show resolved Hide resolved
lib/nerves_hub_link/socket.ex Outdated Show resolved Hide resolved
@joshk joshk force-pushed the check-time-before-socket-connection branch from 71949fd to 262a1dc Compare May 26, 2024 06:54
@joshk joshk requested review from jjcarstens and lawik June 12, 2024 07:48
@lawik
Copy link
Contributor

lawik commented Jun 12, 2024

I wonder if it is better to just try and figure out if the time is out of whack based on the request rather than not making it before time sync. We can handle the outcome if we know that it is the problem.

@joshk
Copy link
Contributor Author

joshk commented Jun 12, 2024

what is the error you get when the time isn't synchronized yet?

@lawik
Copy link
Contributor

lawik commented Jun 13, 2024

403, see #191

@joshk
Copy link
Contributor Author

joshk commented Jun 13, 2024

Thats right. 403 doesn't really tell us anything, that is a generic error, or as Mozilla tells me...

The HTTP 403 Forbidden response status code indicates that the server understands the request but refuses to authorize it.

@lawik
Copy link
Contributor

lawik commented Jun 13, 2024

Issue has more details. It fails to establish the websocket and I bekieve the 403 is only triggered by an invalid signature. The signature is invalid because it signs a couple of hours off before NervesTime syncs up.

we might not want to wait for NervesTime sync. But if we don't we should have a warning that this could be due to time kerfuffery so people don't scratch their heads for two minutes and then change the config for their perfectly correct firmware.

@lawik
Copy link
Contributor

lawik commented Jun 13, 2024

The server could pass its perceived time as part of the 403 in a header or something. Would let us check for this case. Maybe do some backoff?

@fhunleth
Copy link
Contributor

fhunleth commented Jun 17, 2024

I'm really having trouble with this one. I don't think that I'd ever enable it or recommend anyone enable it. The reason is that it assumes failure without trying. Maybe the time is wrong, but maybe it will never be fixed because NTP is down or blocked. If the devices never try, then the only option is to replace them in the field or hope that there's another backend connection that doesn't do this. On the other hand, if the devices try, then this can be addressed on the backend by deploying a cert with a better validity range or tolerating a larger time window.

I agree with the sentiment of just fixing the error message. How about if the SSL connection fails, then check if the time has been synchronized. Same logic, but be optimistic and try rather than assume failure if unsynchronized.

Copy link
Contributor

@lawik lawik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See Frank's note. I don't think we should push this time stuff through.

I think we can make the dev aware of the 403 problem and why it might happen and they should have less of that when the ping endpoints exist.

For more serious cases this might be causing more trouble than worth.

@joshk
Copy link
Contributor Author

joshk commented Jun 17, 2024

Thanks for the feedback @fhunleth

I'll look into the changes you mentioned.

One thing to note is that these 403 errors due to time synchronization push the Socket retry backoff to hit its max limit. This is a similar issue we saw several months ago related to the Shared Secrets work, and likely means there is some work to be done there (like resetting the backoff after a successful connection)

@lawik
Copy link
Contributor

lawik commented Sep 3, 2024

We've done work to mitigate this in another issue where we were using System.system_time instead of System.os_time and the system time would take a long time to update while OS time was more reasonable.

I think we should change the warning but this thread can be closed.

@lawik lawik closed this Sep 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Link fails with 403 repeatedly until it eventually recovers
4 participants