Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(testing): check if iperf server is up/reachable #482

Merged
merged 1 commit into from
Mar 24, 2025
Merged

Conversation

edipascale
Copy link
Contributor

and retry up to 10 times if not.

Fixes: #310

@edipascale edipascale force-pushed the ema/fix-310 branch 2 times, most recently from 2acb6fd to 496b452 Compare March 20, 2025 10:05
@edipascale edipascale marked this pull request as ready for review March 20, 2025 12:32
@edipascale edipascale requested a review from Frostman as a code owner March 20, 2025 12:32
@edipascale edipascale requested a review from pau-hedgehog March 20, 2025 12:33
out, err := fromSSH.RunContext(ctx, cmd)
if err != nil {
return fmt.Errorf("running iperf client: %w: %s", err, string(out))
maxRetries := 10
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

10 retries feel like really a lot... Aren't we going to hide some other issues this way?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not too concerned about hiding other issues as we will only retry on an ssh error (or, to be more precise, an error with the "ssh:" string in it), but I can lower it to something more reasonable. Does 3 sound better?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct me if I'm wrong but the retry logic wouldn't be masking Fabric issues as SSH goes through a separate network. So the retries are addressing the reliability in the test infrastructure itself

We should investigate anyhow why the server isn't ready when the client connects

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the retries work, I don't think there's anything to investigate - we are giving the server 1 second of head start, but under load in a virtualized environment there is no guarantee that it will be enough, that's why I thought retries would be sensible

Fixes: #310
Signed-off-by: Emanuele Di Pascale <emanuele@githedgehog.com>
@Frostman Frostman merged commit f92da23 into master Mar 24, 2025
24 of 26 checks passed
@Frostman Frostman deleted the ema/fix-310 branch March 24, 2025 16:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

VLAB fails to SSH to server or client test endpoint
3 participants