Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bindings: clean up blinding tests #4356

Merged
merged 3 commits into from
Jan 17, 2024
Merged

Conversation

lrstewart
Copy link
Contributor

@lrstewart lrstewart commented Jan 10, 2024

Resolved issues:

Resolves #3976

Description of changes:

handshake_error_with_blinding sometimes failed. The sequence of events was:

  • s2n_recv failed as intended, triggering blinding.
  • shutdown waited 10-30 seconds for the blinding to expire.
  • shutdown tried to read the close_notify message but failed because of Fix s2n_shutdown + failed recv bug #4350. That failure triggered blinding again.
  • shutdown waited 10-30 seconds for the blinding to expire.

handshake_error_with_blinding calls shutdown twice, once with a timeout of 10s and once with a timeout of 30s. That gives shutdown a total of 40s to complete both blindings. I'm honestly surprised it doesn't fail more often :)

I've rewritten the tests to be clearer and less reliant on timeouts. Each test now specifically times the execution of s2n_shutdown and makes assertions about its execution time and result. I'm also specifically checking for the correct errors to ensure we're not overlooking more issues like #4350.

I could also check that the time_elapsed doesn't exceed some upper bound (40s maybe? max blinding + safety margin for actual work), but I worry that has the potential to continue to be flaky if the environment introduces some delay. I think the combo of checking error types + testing both sources of blinding separately makes me pretty confident.

Callouts:

This PR cleans up the tests, but does not fix #4350. They're really two different problems, so I'd like to fix them separately and not split reviewers' attention.

Testing:

This change is nothing but tests.
I'm having trouble consistently reproing the flaky test failure in CI, but it failed frequently on my mac. After this change, I have seen zero failures on my mac, even after leaving it running in a loop.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@github-actions github-actions bot added the s2n-core team label Jan 10, 2024
@lrstewart lrstewart marked this pull request as ready for review January 10, 2024 22:42
// write the close_notify message that the server needs.
// Because time is mocked for testing, this does not actually take 10 minutes.
// TODO: replace this with a half-close once the bindings support half-close.
_ = time::timeout(time::Duration::from_secs(600), client.shutdown()).await;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just so that things are in a known state, could we assert on the expected result here? Also why 600 seconds?

Copy link
Contributor Author

@lrstewart lrstewart Jan 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also why 600 seconds?

Just arbitrary wild overkill. If sending one messages takes more than 10min, you either have deeper problems or your environment is so broken as to be useless :) The mocked time makes wild overkill cheap.

@lrstewart lrstewart enabled auto-merge (squash) January 17, 2024 22:18
@lrstewart lrstewart merged commit d573ce2 into aws:main Jan 17, 2024
29 checks passed
@lrstewart lrstewart deleted the blinding_tests branch January 17, 2024 23:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

flaky test: handshake_error_with_blinding, rust bindings
3 participants