Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix reconnection #19

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Conversation

bammab
Copy link

@bammab bammab commented Oct 26, 2024

Hi,

as some other people I had a problem with the EDL21 component of homeassistant (home-assistant/core#98409). After some time (random) no data are available anymore. The (info) log will show a lot of reconnection, but nothing happen. One way to fix it, is to restart the hole homeassistant.
I spent some time to find out the reason. One indicator was, that the led on my serial adapter was still active. So the serial port must be opened. Then I had created a custom component with a local copy of all libraries and add some debug messages to be able to follow the code.
Now I have simple removed one statement, which remove the reference to the transport before it is closed. Another idea was to close the connection before this line, if the exc parameter was a TimeoutError. But then the _disconnect function will never abort the transport and I used this way, now. Please feel free to comment and will try to adjust the path as fast as possible. Hope to get rid of this neasty problem ;-)

If the watchdog trigger connection_lost and the serial connection if already open, it was not possible to close/abort the connection.
Old:

  • Watchdog timout
  • connection_lost (transport set to None)
  • _reconnect
    • _disconnect (which could not abort, because transport not set anymore)
    • _create_connection Call pyserial-asyncio: create_serial_connection -> _ensure_reader -> _read_ready -> _close -> connection_lost (end here, because of lock)

New workflow could abort the transport and the new connection will work. Anyway connection_lost will be called twice:

  1. by the watchdog (with TimeoutError)
  2. by _reconnect -> _disconnect -> abort

The last one will do nothing more, because of the lock of _reconnect.

If the watchdog trigger connection_lost and the serial connection if already
open, it is not possible to close/abort the connection.
Old:
- Watchdog timout
- connection_lost (transport set to None)
- _reconnect
  - _disconnect (which could not abort, because transport not set anymore)
  - _create_connection
    Call pyserial-asyncio: create_serial_connection -> _ensure_reader -> _read_ready -> _close -> connection_lost (end here, because of lock)

New workflow could abort the transport and the new connection will work.
Anyway connection_lost will be called twice:
1. by the watchdog (with TimeoutError)
2. by _reconnect -> _disconnect -> abort
The last one will do nothing more, because of the lock of _reconnect
@mtdcr
Copy link
Owner

mtdcr commented Oct 27, 2024

Hi @bammab,

thank you very much for your investigation! I don't use the serial interface anymore, so I can't test any related changes myself. It's also a piece of code I'm not very proud of. However, the line you're removing was introduced to fix another problem (see commit a99aec1 and home-assistant/core#84153). Could you please try to find a way that covers both problems?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants