Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes for low performance HA servers #555

Open
wants to merge 13 commits into
base: master
Choose a base branch
from

Conversation

Lurker00
Copy link

I'm yet to find time to update to HA Core 2025.x, and to incorporate your many changes to LocalTuya :(

Meanwhile, I ran many tests with low performance HA server, and, as the result, these are related fixes. I suggest to read commit by commit: they all should be understandable. Yes, I saw all the fixed misbehaviors in real life!

Yes, I saw successful connects after 4.5 seconds. But, regardless, making TIMEOUT_CONNECT (was 3) less than TIMEOUT_REPLY has not much sense: usually connect is even longer process.

I've changed HEARTBEAT_INTERVAL to 8.3 because 83 is a big enough prime number. Making periodic tasks with periods of prime numbers decrease the chance of "crowded" periodic events across the whole system.

Copy link
Owner

@xZetsubou xZetsubou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR as always <3.

custom_components/localtuya/core/pytuya/__init__.py Outdated Show resolved Hide resolved
custom_components/localtuya/core/pytuya/__init__.py Outdated Show resolved Hide resolved
custom_components/localtuya/core/pytuya/__init__.py Outdated Show resolved Hide resolved
custom_components/localtuya/core/pytuya/__init__.py Outdated Show resolved Hide resolved
Comment on lines +1214 to +1217
# Avoid empty state parsing!
if self._sub_devs_query_task is not None:
self._sub_devs_query_task.cancel()
self._sub_devs_query_task = None
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would this fix? it seems in wrong place subdevices_query is request method, it shouldn't interfere with tasks.

How are we sure that the task is handling empty list or not, If we want avoid empty state shouldn't we do that in def _msg_subdevs_query method instead and avoid cancel/create new tasks if states are empty.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my version, it has one more line:

        # Avoid empty state parsing!
        if self._sub_devs_query_task is not None:
            self._sub_devs_query_task.cancel()
            self._sub_devs_query_task = None
            self.info("Empty list parsing prevented")

and I saw this record in the logs.

The next statement resets self.sub_devices_states. self._sub_devs_query_task is the task with _action() method that sleeps 2 seconds at the beginning. Sometimes, especially when a gateway produces the offline/online lists as several replies, it happens that _action() execution is delayed so that it finally parses already empty self.sub_devices_states, reporting all the sub-devices as absent.

Copy link
Owner

@xZetsubou xZetsubou Feb 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To optimize this we should check the lists before creating the task. so in line 897

on_devs, off_devs = data.get("online", []), data.get("offline", [])

before create the action task:

            on_devs, off_devs = data.get("online", []), data.get("offline", [])
            if not on_devs and not off_devs:
                return

and we already have check for lists in _action()

                if listener is None or (on_devs is None and off_devs is None):
                    return

Either way, I don't mind it since I find it won't affect anything since we have check on action, however If we change the logic of parsing query msg, we should also remember this.

Copy link
Owner

@xZetsubou xZetsubou Feb 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In _action() need to change it to not instead of none check to detect empty lists. this will avoid empty parsing empty lists and action will directly stopped.

                # if listener is None or (on_devs is None and off_devs is None):
                if listener is None or (not on_devs and not off_devs):
                    return

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You see, empty on_devs and off_devs lists is a valid case, when the gateway itself is connected to LocalTuya (all my GWs are), and all sub-devices were detached (just a couple days ago I moved all BLE bulbs from one GW to another). So, your suggestion would break detection of absent sub-devices. Instead, in that my recent case it worked just fine!

By the way, the check (on_devs is None and off_devs is None) is redundant because it never happens by the code logic. It was possible a long ago, when it was presumed that one subdev_online_stat_query request results in one reply. But it does not harm.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it makes more sense in the future that we can implement better way to handle the stat_query messages instead of the current way it 2sec sleep, the current logic actually weren't made for stat_query request to be used as heart_beat

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it makes more sense in the future that we can implement better way to handle the stat_query messages instead of the current way it 2sec sleep

I'm running such implementation since I've opened this PR. So far so good. This will be the next PR, but it would require some time because a lot of changes in the current coordinator.py compared to my outdated version. Also I have an idea to change the logic of the heartbeat, to eliminate waiting for replies with a timeout (=less async tasks), but, most probably, I'll do it even later, step by step.

custom_components/localtuya/core/pytuya/__init__.py Outdated Show resolved Hide resolved
@Lurker00
Copy link
Author

Lurker00 commented Feb 19, 2025

Just FYI, yesterday in my home a source of 2.4GHz interference appeared, leading to high packet loss via WiFi connections. The interference is not continuous: there are some seconds pulses every some minutes. I'm logging calculated delays, if they are big:

                delta = (time.monotonic() - start) - HEARTBEAT_INTERVAL
                if delta > (11-HEARTBEAT_INTERVAL):
                    self.info(f"Delta {delta} fails {fail_attempt}")
                delta = max(0, min(delta, HEARTBEAT_INTERVAL))

Here is extraction from my logs:

2025-02-19 11:52:08.701 INFO (MainThread) [custom_components.localtuya.core.pytuya] [bf5...wrf - ВС: Wired Zigbee gateway] Delta 5.743602296034805 fails 0
2025-02-19 11:52:08.721 INFO (MainThread) [custom_components.localtuya.core.pytuya] [bfa...hkb - Iot Fan Tower T1] Delta 8.280447005107998 fails 0
2025-02-19 11:52:08.723 INFO (MainThread) [custom_components.localtuya.core.pytuya] [bf1...una - К: BLE Шлюз GW003] Delta 6.5229466349584975 fails 0
2025-02-19 11:52:08.723 INFO (MainThread) [custom_components.localtuya.core.pytuya] [bf8...tpj - ГЛ4: LED BULB W5K] Delta 8.24631225997582 fails 0
2025-02-19 11:52:08.791 INFO (MainThread) [custom_components.localtuya.core.pytuya] [bf7...dvp - ГЛ0: умная лампочка] Delta 6.28164764009416 fails 0
2025-02-19 11:52:08.792 INFO (MainThread) [custom_components.localtuya.core.pytuya] [bfc...hfr - ГЛ3: LED BULB W5K] Delta 6.590917618991806 fails 0
2025-02-19 11:52:08.792 INFO (MainThread) [custom_components.localtuya.core.pytuya] [bf0...sti - ГЛ2: LED BULB W5K] Delta 6.475475615053437 fails 0
2025-02-19 11:52:08.793 INFO (MainThread) [custom_components.localtuya.core.pytuya] [bfc...9vr - ГЛ1: LED BULB W5K] Delta 3.2039450939511873 fails 0
2025-02-19 11:52:09.385 INFO (MainThread) [custom_components.localtuya.core.pytuya] [bf1...sa9 - ВС: Питание HA] Delta 4.636976286931894 fails 0
2025-02-19 11:52:09.386 INFO (MainThread) [custom_components.localtuya.core.pytuya] [bf0...vcr - ВС: Питание шлюза Zigbee] Delta 4.948057062970474 fails 0
2025-02-19 11:52:09.387 INFO (MainThread) [custom_components.localtuya.core.pytuya] [bfd...ce8 - ВС: Питание HMG-01] Delta 5.449782188097014 fails 0

"fails 0" means no timeout waiting for a response, even when the delay > 5 seconds. But, if it was not the introduced sleep time correction, some devices might close connections as not active. Now all work just fine!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants