Handle PONG messages before closing connection due to heartbeat timeout #10544

mstegmaier · 2025-03-13T09:28:33Z

What do these changes do?

Currently, if receive is never called (or not in time), PONG messages aren't handled and aiohttp closes the connection for missing PONG after half of the heartbeat time has passed.
This commit changes the code that closes the connection for missing PONG so that instead of immediately closing the connection, it checks whether a PONG has been sent. If a PONG has been sent, the connection doesn't get closed.

Are there changes in behavior for the user?

The heartbeat mechanism is supposed to help identify if a websocket connection is still alive. When a PING message isn't responded to by a PONG message, the connection is closed. Websocket clients such as Web Browsers implement the heartbeat mechanism as well and for that reason, enabling heartbeat when using aiohttp can also be used to keep a websocket connection alive.

The current implementation of the heartbeat mechanism in aiohttp relies on the user to regularly call the receive() function in order to handle incoming PONG messages. If the user fails to call receive() within the given time frame, the connection gets closed by aiohttp. So when heartbeat in aiohttp is enabled in order to keep a websocket alive, it can actually cause the opposite, i.e. closing the websocket. Specifically, this happens for unidirectional communication via websockets (in our case a video stream).

The user will not notice any different behavior for "regular" use of websockets which used to work without issues. The user will only notice a change in behavior if originally their websocket connection was closed by aiohttp prematurely due to missing PONG even though as PONG has been sent by the other side.

If the user is calling receive() too late, e.g., a long running task prevents receive() from being called in time, the original implementation closed the websocket. The new implementation from this PR won't close the websocket immediately but will log a warning that a non-PONG message was received.

Is it a substantial burden for the maintainers to support this?

It's not a very big change so the burden should be fairly small.

Related issue number

There's no specific issue for this PR but the following issues are somewhat related:
#7508
#7320

Checklist

I think the code is well written
Unit tests for the changes exist
Documentation reflects the changes
If you provide code modification, please add yourself to CONTRIBUTORS.txt
- The format is <Name> <Surname>.
- Please keep alphabetical order, the file is sorted by names.
Add a new news fragment into the CHANGES/ folder
- name it <issue_or_pr_num>.<type>.rst (e.g. 588.bugfix.rst)
- if you don't have an issue number, change it to the pull request
  number after creating the PR
  - .bugfix: A bug fix for something the maintainers deemed an
    improper undesired behavior that got corrected to match
    pre-agreed expectations.
  - .feature: A new behavior, public APIs. That sort of stuff.
  - .deprecation: A declaration of future API removals and breaking
    changes in behavior.
  - .breaking: When something public is removed in a breaking way.
    Could be deprecated in an earlier release.
  - .doc: Notable updates to the documentation structure or build
    process.
  - .packaging: Notes for downstreams about unobvious side effects
    and tooling. Changes in the test invocation considerations and
    runtime assumptions.
  - .contrib: Stuff that affects the contributor experience. e.g.
    Running tests, building the docs, setting up the development
    environment.
  - .misc: Changes that are hard to assign to any of the above
    categories.
- Make sure to use full sentences with correct case and punctuation,
  for example:
```
Fixed issue with non-ascii contents in doctest text files
-- by :user:`contributor-gh-handle`.
```
  Use the past tense or the present tense a non-imperative mood,
  referring to what's changed compared to the last released version
  of this project.

Currently, if receive is never called, PONG messages aren't handled and aiohttp closes the connection for missing PONG after half of the heartbeat time has passed. This commit changes the code that closes the connection for missing PONG so that instead of immediately closing the connection, it checks whether a PONG has been sent. If a PONG has been sent, the connection doesn't get closed.

Handle exceptions in pong_not_received_coro() analogous to receive()

for more information, see https://pre-commit.ci

Added missing contributor name

to make mypy happy

codspeed-hq · 2025-03-13T09:44:02Z

CodSpeed Performance Report

Merging #10544 will not alter performance

_{Comparing mstegmaier:master (29a439e) with master (45b861f)}

Summary

✅ 46 untouched benchmarks

to make mypy happy

aiohttp/web_ws.py

for more information, see https://pre-commit.ci

aiohttp/web_ws.py

for more information, see https://pre-commit.ci

codecov · 2025-03-13T15:45:59Z

Codecov Report

Attention: Patch coverage is 54.54545% with 15 lines in your changes missing coverage. Please review.

Project coverage is 98.65%. Comparing base (6d205da) to head (8f3de6f).
Report is 3 commits behind head on master.

✅ All tests successful. No failed tests found.

Files with missing lines	Patch %	Lines
aiohttp/web_ws.py	54.54%	12 Missing and 3 partials ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master   #10544      +/-   ##
==========================================
- Coverage   98.70%   98.65%   -0.05%     
==========================================
  Files         122      122              
  Lines       37230    37262      +32     
  Branches     2064     2068       +4     
==========================================
+ Hits        36748    36762      +14     
- Misses        335      350      +15     
- Partials      147      150       +3

Flag	Coverage Δ
CI-GHA	`98.54% <54.54%> (-0.05%)`	⬇️
OS-Linux	`98.20% <54.54%> (-0.05%)`	⬇️
OS-Windows	`96.14% <54.54%> (-0.05%)`	⬇️
OS-macOS	`97.33% <54.54%> (-0.03%)`	⬇️
Py-3.10.11	`97.22% <54.54%> (-0.05%)`	⬇️
Py-3.10.16	`97.77% <54.54%> (-0.04%)`	⬇️
Py-3.11.11	`97.85% <54.54%> (-0.05%)`	⬇️
Py-3.11.9	`97.31% <54.54%> (-0.04%)`	⬇️
Py-3.12.8	`?`
Py-3.12.9	`98.31% <51.51%> (-0.05%)`	⬇️
Py-3.13.1	`?`
Py-3.13.2	`98.29% <51.51%> (-0.05%)`	⬇️
Py-3.9.13	`97.11% <54.54%> (-0.05%)`	⬇️
Py-3.9.21	`97.64% <54.54%> (-0.04%)`	⬇️
Py-pypy7.3.16	`86.63% <51.51%> (-4.98%)`	⬇️
VM-macos	`97.33% <54.54%> (-0.03%)`	⬇️
VM-ubuntu	`98.20% <54.54%> (-0.05%)`	⬇️
VM-windows	`96.14% <54.54%> (-0.05%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

bdraco · 2025-03-13T18:03:09Z

aiohttp/web_ws.py

+            async with async_timeout.timeout(self._pong_heartbeat / 10.0):
+                msg = await reader.read()
+            self._reset_heartbeat()


This change is adding a race condition where we would violate the concurrent calls to reading requirement:

aiohttp/aiohttp/web_ws.py

Line 520 in 45b861f

raise RuntimeError("Concurrent call to receive() is not allowed")

This should be solved by the latest commit.

If self._waiting is set, we already are in the receive loop reading from the reader and would have read a PONG if one was already there. This avoids concurrent reads on the reader.

for more information, see https://pre-commit.ci

bdraco

There is still the possibility of concurrent reads with this implementation.

However there is also another problem as reading here can consume any message type which means a call to receive after this may find their the message they are looking for missing because it was read and discarded.

Dreamsorcerer · 2025-03-15T16:13:41Z

aiohttp/web_ws.py

I'm not clear this would have a noticeable benefit to the user. They are likely to still get the same result regardless, this just seems to catch a (probably rare) edge case that might reduce the frequency of connection closes very slightly.

Dreamsorcerer · 2025-03-15T16:17:55Z

aiohttp/web_ws.py

+        assert reader is not None
+        try:
+            async with async_timeout.timeout(self._pong_heartbeat / 10.0):
+                msg = await reader.read()


Is it possible for a user to catch the exception or anything and avoid the connection close? If it is, then this would break their code, as we've now read a message that they should be processing.

I think this is the same point as #10544 (review)

Dreamsorcerer · 2025-03-15T16:19:11Z

aiohttp/web_ws.py

+            self._reset_heartbeat()
+            if msg.type is not WSMsgType.PONG:
+                ws_logger.warning(
+                    f"Received {msg} while waiting for PONG. It seems like you haven't called `receive` within {self._pong_heartbeat} seconds."


This isn't exactly accurate. The user's code could be reading a message with receive() every second, but if there are more than pong_heartbeat messages between PONGs, then it would still get tripped up here.

Dreamsorcerer · 2025-03-15T16:25:27Z

I suspect the complexity of this change and the likelihood of it breaking something, combined with it likely only making a very slight (if any) reduction in closes will probably mean this won't get merged.

As mentioned in the linked discussion, a doc PR better explaining these limitations and a code sample to avoid them if memory issues are not a concern would be appreciated instead.

Dreamsorcerer · 2025-03-15T16:33:19Z

it checks whether a PONG has been sent. If a PONG has been sent, the connection doesn't get closed.

Maybe there's some confusion here, but this description is not correct. I would describe the change as:

If code has failed to call receive() in a timely manner and a PONG message is at the top of the buffer, the connection will no longer be closed.

If code is calling receive() normally, then this change likely has zero impact. Either the receive() call is happening frequently enough that it processes the PONG messages and this code is never run, or the receive() call is too slow in processing lots of messages in which case this code will likely receive another non-PONG message when running and still close the connection (the chances that the next message just happens to be a PONG when overloaded seems extremely slim unless the user is doing something odd).

Michael Stegmaier and others added 6 commits March 7, 2025 10:16

Merge branch 'aio-libs:master' into master

1f0f1aa

Handle exceptions in pong_not_received_coro()

77d624d

Handle exceptions in pong_not_received_coro() analogous to receive()

Added myself to contributors

a89f86f

[pre-commit.ci] auto fixes from pre-commit.com hooks

365fdf7

for more information, see https://pre-commit.ci

Added rst file to CHANGES folder

605f600

psf-chronographer bot added the bot:chronographer:provided There is a change note present in this PR label Mar 13, 2025

[pre-commit.ci] auto fixes from pre-commit.com hooks

48f77af

for more information, see https://pre-commit.ci

mstegmaier marked this pull request as ready for review March 13, 2025 09:35

mstegmaier requested review from webknjaz and asvetlov as code owners March 13, 2025 09:35

mstegmaier added 2 commits March 13, 2025 10:38

Update 10544.bugfix.rst

73ecb05

Added missing contributor name

Added missing return type for _pong_not_received_coro()

7768ff5

to make mypy happy

Save pong_not_received_task in a variable

8a591d0

to make mypy happy

github-advanced-security bot found potential problems Mar 13, 2025

View reviewed changes

aiohttp/web_ws.py Fixed Show fixed Hide fixed

[pre-commit.ci] auto fixes from pre-commit.com hooks

5d7c76e

for more information, see https://pre-commit.ci

github-advanced-security bot found potential problems Mar 13, 2025

View reviewed changes

aiohttp/web_ws.py Fixed Show fixed Hide fixed

mstegmaier and others added 5 commits March 13, 2025 13:48

Added callback for pong not received task

290fa31

[pre-commit.ci] auto fixes from pre-commit.com hooks

3439d1c

for more information, see https://pre-commit.ci

Fixed typo wehn -> when

85696f3

Fixed test

4d9ebf0

[pre-commit.ci] auto fixes from pre-commit.com hooks

8f3de6f

for more information, see https://pre-commit.ci

bdraco reviewed Mar 13, 2025

View reviewed changes

mstegmaier and others added 2 commits March 14, 2025 17:01

Avoid concurrent reads from reader

6e9e13f

If self._waiting is set, we already are in the receive loop reading from the reader and would have read a PONG if one was already there. This avoids concurrent reads on the reader.

[pre-commit.ci] auto fixes from pre-commit.com hooks

29a439e

for more information, see https://pre-commit.ci

bdraco requested changes Mar 14, 2025

View reviewed changes

Dreamsorcerer reviewed Mar 15, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Handle PONG messages before closing connection due to heartbeat timeout #10544

Handle PONG messages before closing connection due to heartbeat timeout #10544

Uh oh!

mstegmaier commented Mar 13, 2025 •

edited

Loading

Uh oh!

codspeed-hq bot commented Mar 13, 2025 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

codecov bot commented Mar 13, 2025 •

edited

Loading

Uh oh!

bdraco Mar 13, 2025

Uh oh!

mstegmaier Mar 14, 2025

Uh oh!

bdraco left a comment •

edited

Loading

Uh oh!

Dreamsorcerer Mar 15, 2025

Uh oh!

Dreamsorcerer Mar 15, 2025

Uh oh!

Dreamsorcerer Mar 15, 2025

Uh oh!

Dreamsorcerer Mar 15, 2025

Uh oh!

Dreamsorcerer commented Mar 15, 2025

Uh oh!

Dreamsorcerer commented Mar 15, 2025

Uh oh!

Uh oh!

Uh oh!

Handle PONG messages before closing connection due to heartbeat timeout #10544

Are you sure you want to change the base?

Handle PONG messages before closing connection due to heartbeat timeout #10544

Uh oh!

Conversation

mstegmaier commented Mar 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What do these changes do?

Are there changes in behavior for the user?

Is it a substantial burden for the maintainers to support this?

Related issue number

Checklist

Uh oh!

codspeed-hq bot commented Mar 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

CodSpeed Performance Report

Merging #10544 will not alter performance

Summary

Uh oh!

Uh oh!

Uh oh!

codecov bot commented Mar 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

bdraco Mar 13, 2025

Choose a reason for hiding this comment

Uh oh!

mstegmaier Mar 14, 2025

Choose a reason for hiding this comment

Uh oh!

bdraco left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Dreamsorcerer Mar 15, 2025

Choose a reason for hiding this comment

Uh oh!

Dreamsorcerer Mar 15, 2025

Choose a reason for hiding this comment

Uh oh!

Dreamsorcerer Mar 15, 2025

Choose a reason for hiding this comment

Uh oh!

Dreamsorcerer Mar 15, 2025

Choose a reason for hiding this comment

Uh oh!

Dreamsorcerer commented Mar 15, 2025

Uh oh!

Dreamsorcerer commented Mar 15, 2025

Uh oh!

Uh oh!

mstegmaier commented Mar 13, 2025 •

edited

Loading

codspeed-hq bot commented Mar 13, 2025 •

edited

Loading

codecov bot commented Mar 13, 2025 •

edited

Loading

bdraco left a comment •

edited

Loading