Fixes for low performance HA servers #555
base: master
Conversation
Thanks for the PR as always <3.
```python
# Avoid empty state parsing!
if self._sub_devs_query_task is not None:
    self._sub_devs_query_task.cancel()
    self._sub_devs_query_task = None
```
What would this fix? It seems to be in the wrong place: `subdevices_query` is a request method, so it shouldn't interfere with tasks. And how are we sure whether the task is handling an empty list or not? If we want to avoid empty states, shouldn't we do that in the `_msg_subdevs_query` method instead, and avoid cancelling/creating new tasks when the states are empty?
In my version, it has one more line:

```python
# Avoid empty state parsing!
if self._sub_devs_query_task is not None:
    self._sub_devs_query_task.cancel()
    self._sub_devs_query_task = None
    self.info("Empty list parsing prevented")
```

and I saw this record in the logs.
The next statement resets `self.sub_devices_states`. `self._sub_devs_query_task` is the task running the `_action()` method, which sleeps 2 seconds at the beginning. Sometimes, especially when a gateway produces the offline/online lists as several replies, the `_action()` execution is delayed so much that it ends up parsing an already empty `self.sub_devices_states`, reporting all the sub-devices as absent.
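To make the race concrete, here is a minimal, self-contained asyncio sketch. The names mirror this thread, but the class is a hypothetical condensation, not the actual LocalTuya code:

```python
import asyncio

# Hypothetical condensed model of the race described above.
class Gateway:
    def __init__(self):
        self.sub_devices_states = {"online": [], "offline": []}
        self._sub_devs_query_task = None

    async def _action(self):
        await asyncio.sleep(2)  # same deliberate delay as the real _action()
        states = self.sub_devices_states
        if not states["online"] and not states["offline"]:
            print("parsed EMPTY states -> all sub-devices reported absent")
        else:
            print("parsed states:", states)

    def on_query_reply(self, online, offline):
        # Replies (possibly several per query) accumulate into the states
        # and schedule the delayed parser.
        self.sub_devices_states["online"] += online
        self.sub_devices_states["offline"] += offline
        self._sub_devs_query_task = asyncio.ensure_future(self._action())

    def subdevices_query(self):
        # The fix: cancel a still-sleeping parser before resetting the
        # states, so it can never wake up and parse the emptied lists.
        if self._sub_devs_query_task is not None:
            self._sub_devs_query_task.cancel()
            self._sub_devs_query_task = None
        self.sub_devices_states = {"online": [], "offline": []}
        # ... the subdev_online_stat_query request would be sent here ...

async def main():
    gw = Gateway()
    gw.subdevices_query()
    gw.on_query_reply(["bulb_1"], [])  # parser scheduled; wakes at t = 2.0
    await asyncio.sleep(1.5)
    gw.subdevices_query()              # t = 1.5: states reset for a new query
    await asyncio.sleep(1.0)           # (without the cancel, the old parser
                                       #  wakes at t = 2.0 and sees emptiness)
    gw.on_query_reply([], ["bulb_1"])  # t = 2.5: reply to the second query
    await asyncio.sleep(3)             # t = 5.5: parser prints the real states

asyncio.run(main())
```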
To optimize this, we should check the lists before creating the task. So, at line 897:

```python
on_devs, off_devs = data.get("online", []), data.get("offline", [])
```

before creating the action task:

```python
on_devs, off_devs = data.get("online", []), data.get("offline", [])
if not on_devs and not off_devs:
    return
```

And we already have a check for the lists in `_action()`:

```python
if listener is None or (on_devs is None and off_devs is None):
    return
```
Either way, I don't mind it, since I find it won't affect anything given the check in `_action()`. However, if we change the logic of parsing the query message, we should also keep this in mind.
In `_action()` we need to change it to a `not` check instead of a `None` check to detect empty lists. This will avoid parsing empty lists, and the action will stop immediately:

```python
# if listener is None or (on_devs is None and off_devs is None):
if listener is None or (not on_devs and not off_devs):
    return
```
You see, empty `on_devs` and `off_devs` lists are a valid case: when the gateway itself is connected to LocalTuya (all my GWs are) and all sub-devices have been detached (just a couple of days ago I moved all BLE bulbs from one GW to another). So your suggestion would break the detection of absent sub-devices, whereas in that recent case of mine it worked just fine!

By the way, the check `(on_devs is None and off_devs is None)` is redundant, because by the code logic it never happens. It was possible long ago, when it was presumed that one `subdev_online_stat_query` request results in one reply. But it does no harm.
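A hedged sketch of why the parser must still run on two empty lists (the `listener` interface and `known_sub_devices` are hypothetical stand-ins for the real code):

```python
# Hypothetical sketch: known_sub_devices is every sub-device the listener
# has previously tracked.
def parse_states(listener, on_devs, off_devs, known_sub_devices):
    if listener is None:
        return
    # An early `if not on_devs and not off_devs: return` would skip this
    # loop, so detached sub-devices would keep a stale "online" status.
    for dev_id in known_sub_devices:
        if dev_id in on_devs:
            listener.mark_online(dev_id)
        else:
            # Covers both "reported offline" and "absent from both lists",
            # i.e. the all-sub-devices-detached case with two empty lists.
            listener.mark_offline(dev_id)
```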
Would it make more sense, in the future, to implement a better way to handle the stat_query messages instead of the current 2-second sleep? The current logic actually wasn't made for the stat_query request to be used as a heartbeat.
> Would it make more sense in the future to implement a better way to handle the stat_query messages instead of the current 2-second sleep?

I've been running such an implementation since I opened this PR. So far so good. This will be the next PR, but it will require some time because of the many changes in the current `coordinator.py` compared to my outdated version. I also have an idea to change the logic of the heartbeat, to eliminate waiting for replies with a timeout (= fewer async tasks), but most probably I'll do that even later, step by step.
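One possible shape for that timeout-free heartbeat, purely a sketch of the idea rather than the planned implementation (the `transport` methods are assumptions): record the time of the last inbound message and let the next tick judge liveness, so no extra task waits on each reply.

```python
import asyncio
import time

HEARTBEAT_INTERVAL = 8.3  # value introduced by this PR

class HeartbeatSketch:
    """Hypothetical sketch of a reply-timeout-free heartbeat: any inbound
    traffic refreshes a timestamp, and liveness is judged at the next tick
    instead of by a per-reply wait task."""

    def __init__(self, transport):
        self.transport = transport  # assumed to expose send_heartbeat/close
        self.last_rx = time.monotonic()

    def on_any_message(self, msg):
        # Every inbound message (heartbeat reply or not) proves liveness.
        self.last_rx = time.monotonic()

    async def heartbeat_loop(self):
        while True:
            self.transport.send_heartbeat()
            await asyncio.sleep(HEARTBEAT_INTERVAL)
            # Nothing received for a whole interval -> consider the link dead.
            if time.monotonic() - self.last_rx > HEARTBEAT_INTERVAL:
                self.transport.close()
                return
```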
Co-authored-by: Bander <[email protected]>
Just FYI, yesterday a source of 2.4 GHz interference appeared in my home, leading to high packet loss over WiFi connections. The interference is not continuous: there are pulses of a few seconds every few minutes. I'm logging the calculated delays, if they are big:

```python
delta = (time.monotonic() - start) - HEARTBEAT_INTERVAL
if delta > (11 - HEARTBEAT_INTERVAL):
    self.info(f"Delta {delta} fails {fail_attempt}")
delta = max(0, min(delta, HEARTBEAT_INTERVAL))
```

Here is an extraction from my logs:
"fails 0" means no timeout waiting for a response, even when the delay > 5 seconds. But, if it was not the introduced sleep time correction, some devices might close connections as not active. Now all work just fine! |
I've yet to find time to update to HA Core 2025.x and to incorporate your many changes to LocalTuya :(

Meanwhile, I ran many tests with a low performance HA server, and these are the resulting fixes. I suggest reading commit by commit: they should all be understandable. Yes, I saw all of the fixed misbehaviors in real life!
Yes, I saw successful connects after 4.5 seconds. But regardless, making `TIMEOUT_CONNECT` (was 3) less than `TIMEOUT_REPLY` doesn't make much sense: connecting is usually an even longer process.

I've changed `HEARTBEAT_INTERVAL` to 8.3 because 83 is a big enough prime number. Giving periodic tasks periods that are prime numbers decreases the chance of "crowded" periodic events across the whole system.
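A quick illustration of the prime-period argument (a throwaway simulation, not from the PR; the 10-second companion task is an arbitrary example):

```python
# How often does a heartbeat land within 0.1 s of a hypothetical 10-second
# periodic task during one hour?
def coincidences(period, other=10.0, horizon=3600.0, window=0.1):
    hits, t = 0, 0.0
    while t < horizon:
        # Distance from t to the nearest multiple of `other`.
        phase = t % other
        if min(phase, other - phase) <= window:
            hits += 1
        t += period
    return hits

print(coincidences(8.0))  # shares a factor with 10 s -> ~90 hits per hour
print(coincidences(8.3))  # 83 is prime -> only ~13 hits per hour
```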