Fix network.erl race conditions #1487

Merged 3 commits on Feb 11, 2025

petermm (Contributor) commented Jan 22, 2025

Fixes race conditions in network.erl, as evidenced by the failing simtest CI runs on release-0.6, among others.

  1. Startup should use continue/handle_continue to avoid races; this also gives cleaner, DRY code.

  2. network:stop was using a non-blocking call to stop network_port, so a rapid stop/start would give the last network:start an old whereis(network_port) that was still shutting down, leading to errors.

  3. Somewhat unrelated: sta_rssi was using get_port() and would crash when no network had been started. Guarded it on the Erlang side and added a test.
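The startup pattern from item 1 can be sketched roughly as follows. This is a minimal illustration of `{continue, Term}` and `handle_continue/2` in a plain OTP `gen_server`, not the code from network.erl; the module name `continue_sketch`, the `start_port` term, and the state map are all hypothetical.

```erlang
-module(continue_sketch).
-behaviour(gen_server).

-export([start/1, get_state/0]).
-export([init/1, handle_continue/2, handle_call/3, handle_cast/2]).

%% Hypothetical registered name; not the name used in network.erl.
-define(SERVER, continue_sketch).

start(Config) ->
    gen_server:start({local, ?SERVER}, ?MODULE, Config, []).

get_state() ->
    gen_server:call(?SERVER, get_state).

%% init/1 returns quickly and defers the real setup to handle_continue/2.
%% OTP runs the continue callback before taking any other message from the
%% mailbox, so no caller can observe a half-initialized server.
init(Config) ->
    {ok, #{config => Config, ready => false}, {continue, start_port}}.

handle_continue(start_port, State) ->
    %% Placeholder for opening the port or starting the driver.
    {noreply, State#{ready := true}}.

handle_call(get_state, _From, State) ->
    {reply, State, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.
```

Because the continue step is guaranteed to run first, a `get_state/0` call issued right after `start/1` returns already sees `ready => true`.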

These changes are made under both the "Apache 2.0" and the "GNU Lesser General
Public License 2.1 or later" license terms (dual license).

SPDX-License-Identifier: Apache-2.0 OR LGPL-2.1-or-later

arpunk (Contributor) left a comment

Minor nitpick

UncleGrumpy (Collaborator) left a comment

This looks good to me, thanks for spotting my error in sta_rssi/0! Of course I should not have started a new network_port if it was not yet open.

petermm force-pushed the fix-network-races branch 2 times, most recently from b51790d to 03dab59, on January 23, 2025 13:19
petermm (Contributor, Author) commented Jan 24, 2025

This should be good to go.

Of note, there was an additional crash: sta_rssi() is unsafe to call before a connection is made, and calling it immediately after network:start would crash the driver (the port is up but the network is not connected). Hence the extra guard in the driver.
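The Erlang-side half of that guard might look roughly like the sketch below: look the port up with `whereis/1` and return an error tuple instead of crashing when it is not registered. The module name, the `get_sta_rssi` request term, the message protocol, and the `network_down` reason are assumptions made for illustration; the actual network.erl and driver code may differ.

```erlang
-module(rssi_guard_sketch).
-export([sta_rssi/0]).

%% Registered name taken from the discussion; everything else is hypothetical.
-define(PORT_NAME, network_port).

%% Returns an error tuple instead of crashing when the network is not started.
sta_rssi() ->
    case whereis(?PORT_NAME) of
        undefined ->
            {error, network_down};
        Pid ->
            ask(Pid, get_sta_rssi)
    end.

%% Hypothetical request/reply protocol standing in for the real port call.
ask(Pid, Request) ->
    Ref = make_ref(),
    Pid ! {self(), Ref, Request},
    receive
        {Ref, Reply} -> Reply
    after 5000 ->
        {error, timeout}
    end.
```

In this sketch the "started but not yet connected" case is left to whatever answers the request, which is expected to reply with its own error tuple rather than crash.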

I fully understand that a supervisor should be introduced and some contract decided upon, e.g. if the network port ever disappears unexpectedly or doesn't shut down, should an exit and restart happen?

But this is a bugfix PR against release-0.6. After this one we can look at making 0.7 beautiful ;-)

bettio (Collaborator) left a comment

All is good. Let's add a line to the Fixed section of our changelog.

Cleaner, DRY code that avoids any potential races.

Signed-off-by: Peter M <[email protected]>
Calling network:sta_rssi() would crash if the network was not started, and likewise if network_port was started but not yet connected.

Handle both cases by returning an error tuple, and add tests for both scenarios.

Signed-off-by: Peter M <[email protected]>
network:stop, through terminate, was using a non-blocking call to stop network_port, so a rapid stop/start would give the second network:start an old whereis(network_port) that was still shutting down, leading to errors.

It now waits for the port's DOWN message.

Signed-off-by: Peter M <[email protected]>
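The blocking stop described in the last commit message can be sketched as follows: monitor the registered process, ask it to stop, and return only once the `'DOWN'` message arrives. This is an illustration under stated assumptions, not the merged code; the module name, the bare `stop` message, and the timeout handling are hypothetical, and the port is assumed to behave like a process that can be monitored.

```erlang
-module(stop_wait_sketch).
-export([stop_and_wait/2]).

%% Monitors the process registered under Name, asks it to stop, and returns
%% only after its 'DOWN' message arrives or the timeout expires.
stop_and_wait(Name, Timeout) ->
    case whereis(Name) of
        undefined ->
            %% Nothing registered, so there is nothing to wait for.
            ok;
        Pid ->
            MRef = erlang:monitor(process, Pid),
            %% Hypothetical stop request; the real port protocol may differ.
            Pid ! stop,
            receive
                {'DOWN', MRef, process, _Pid, _Reason} ->
                    ok
            after Timeout ->
                %% Drop the monitor and any late 'DOWN' message.
                erlang:demonitor(MRef, [flush]),
                {error, timeout}
            end
    end.
```

Setting up the monitor before sending the stop request means that a process which exits in between still produces a `'DOWN'` message, so the caller cannot miss the shutdown.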
bettio merged commit 92d10b4 into atomvm:release-0.6 on Feb 11, 2025
96 of 99 checks passed
petermm deleted the fix-network-races branch on February 14, 2025 14:52
5 participants