-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle PONG messages before closing connection due to heartbeat timeout #10544
base: master
Are you sure you want to change the base?
Conversation
Currently, if receive is never called, PONG messages aren't handled and aiohttp closes the connection for missing PONG after half of the heartbeat time has passed. This commit changes the code that closes the connection for missing PONG so that instead of immediately closing the connection, it checks whether a PONG has been sent. If a PONG has been sent, the connection doesn't get closed.
Handle exceptions in pong_not_received_coro() analogous to receive()
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
Added missing contributor name
to make mypy happy
CodSpeed Performance ReportMerging #10544 will not alter performanceComparing Summary
|
to make mypy happy
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
Codecov ReportAttention: Patch coverage is
✅ All tests successful. No failed tests found.
Additional details and impacted files@@ Coverage Diff @@
## master #10544 +/- ##
==========================================
- Coverage 98.70% 98.65% -0.05%
==========================================
Files 122 122
Lines 37230 37262 +32
Branches 2064 2068 +4
==========================================
+ Hits 36748 36762 +14
- Misses 335 350 +15
- Partials 147 150 +3
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
async with async_timeout.timeout(self._pong_heartbeat / 10.0): | ||
msg = await reader.read() | ||
self._reset_heartbeat() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is adding a race condition where we would violate the concurrent calls to reading requirement:
Line 520 in 45b861f
raise RuntimeError("Concurrent call to receive() is not allowed") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be solved by the latest commit.
If self._waiting is set, we already are in the receive loop reading from the reader and would have read a PONG if one was already there. This avoids concurrent reads on the reader.
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is still the possibility of concurrent reads with this implementation.
However there is also another problem as reading here can consume any message type which means a call to receive after this may find their the message they are looking for missing because it was read and discarded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not clear this would have a noticeable benefit to the user. They are likely to still get the same result regardless, this just seems to catch a (probably rare) edge case that might reduce the frequency of connection closes very slightly.
assert reader is not None | ||
try: | ||
async with async_timeout.timeout(self._pong_heartbeat / 10.0): | ||
msg = await reader.read() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible for a user to catch the exception or anything and avoid the connection close? If it is, then this would break their code, as we've now read a message that they should be processing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is the same point as #10544 (review)
self._reset_heartbeat() | ||
if msg.type is not WSMsgType.PONG: | ||
ws_logger.warning( | ||
f"Received {msg} while waiting for PONG. It seems like you haven't called `receive` within {self._pong_heartbeat} seconds." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't exactly accurate. The user's code could be reading a message with receive() every second, but if there are more than pong_heartbeat messages between PONGs, then it would still get tripped up here.
I suspect the complexity of this change and the likelihood of it breaking something, combined with it likely only making a very slight (if any) reduction in closes will probably mean this won't get merged. As mentioned in the linked discussion, a doc PR better explaining these limitations and a code sample to avoid them if memory issues are not a concern would be appreciated instead. |
Maybe there's some confusion here, but this description is not correct. I would describe the change as: If code has failed to call receive() in a timely manner and a PONG message is at the top of the buffer, the connection will no longer be closed. If code is calling receive() normally, then this change likely has zero impact. Either the receive() call is happening frequently enough that it processes the PONG messages and this code is never run, or the receive() call is too slow in processing lots of messages in which case this code will likely receive another non-PONG message when running and still close the connection (the chances that the next message just happens to be a PONG when overloaded seems extremely slim unless the user is doing something odd). |
What do these changes do?
Currently, if receive is never called (or not in time), PONG messages aren't handled and aiohttp closes the connection for missing PONG after half of the heartbeat time has passed.
This commit changes the code that closes the connection for missing PONG so that instead of immediately closing the connection, it checks whether a PONG has been sent. If a PONG has been sent, the connection doesn't get closed.
Are there changes in behavior for the user?
The heartbeat mechanism is supposed to help identify if a websocket connection is still alive. When a PING message isn't responded to by a PONG message, the connection is closed. Websocket clients such as Web Browsers implement the heartbeat mechanism as well and for that reason, enabling heartbeat when using aiohttp can also be used to keep a websocket connection alive.
The current implementation of the heartbeat mechanism in aiohttp relies on the user to regularly call the
receive()
function in order to handle incoming PONG messages. If the user fails to callreceive()
within the given time frame, the connection gets closed by aiohttp. So when heartbeat in aiohttp is enabled in order to keep a websocket alive, it can actually cause the opposite, i.e. closing the websocket. Specifically, this happens for unidirectional communication via websockets (in our case a video stream).The user will not notice any different behavior for "regular" use of websockets which used to work without issues. The user will only notice a change in behavior if originally their websocket connection was closed by aiohttp prematurely due to missing PONG even though as PONG has been sent by the other side.
If the user is calling
receive()
too late, e.g., a long running task preventsreceive()
from being called in time, the original implementation closed the websocket. The new implementation from this PR won't close the websocket immediately but will log a warning that a non-PONG message was received.Is it a substantial burden for the maintainers to support this?
It's not a very big change so the burden should be fairly small.
Related issue number
There's no specific issue for this PR but the following issues are somewhat related:
#7508
#7320
Checklist
CONTRIBUTORS.txt
CHANGES/
foldername it
<issue_or_pr_num>.<type>.rst
(e.g.588.bugfix.rst
)if you don't have an issue number, change it to the pull request
number after creating the PR
.bugfix
: A bug fix for something the maintainers deemed animproper undesired behavior that got corrected to match
pre-agreed expectations.
.feature
: A new behavior, public APIs. That sort of stuff..deprecation
: A declaration of future API removals and breakingchanges in behavior.
.breaking
: When something public is removed in a breaking way.Could be deprecated in an earlier release.
.doc
: Notable updates to the documentation structure or buildprocess.
.packaging
: Notes for downstreams about unobvious side effectsand tooling. Changes in the test invocation considerations and
runtime assumptions.
.contrib
: Stuff that affects the contributor experience. e.g.Running tests, building the docs, setting up the development
environment.
.misc
: Changes that are hard to assign to any of the abovecategories.
Make sure to use full sentences with correct case and punctuation,
for example:
Use the past tense or the present tense a non-imperative mood,
referring to what's changed compared to the last released version
of this project.