-
Notifications
You must be signed in to change notification settings - Fork 22
Fix flaky e2e test #370
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Fix flaky e2e test #370
Conversation
WalkthroughThe cleanup logic in the end-to-end test suite was refactored. Instead of a single deferred cleanup function deleting all objects sequentially with fixed delays, a single deferred cleanup still exists but now deletes objects in reverse order using asynchronous retries and verification with timeouts, replacing synchronous deletion with sleeps. Object creation also now waits for successful retrieval using retries instead of fixed sleeps. Changes
Poem
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms (2)
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: slintes The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/cherry-pick release-0.9 |
@slintes: once the present PR merges, I will cherry-pick it on top of In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
e2e/e2e_suite_test.go (1)
169-177
: Excellent approach to fix flaky cleanup, but consider handling already-deleted objects.The refactored cleanup logic properly addresses the flaky test issue by using individual deferred cleanup functions with retry logic. The two-phase approach (delete + verify) ensures reliable cleanup.
However, consider handling the case where objects might already be deleted:
DeferCleanup(func() { Eventually(func(g Gomega) { - g.Expect(k8sClient.Delete(context.Background(), obj)).To(Succeed()) + err := k8sClient.Delete(context.Background(), obj) + g.Expect(err).To(Or(Succeed(), MatchError(ContainSubstring("not found")))) }, 1*time.Minute, 5*time.Second).Should(Succeed()) Eventually(func(g Gomega) { err := k8sClient.Get(context.Background(), ctrl.ObjectKeyFromObject(obj), obj) g.Expect(errors.IsNotFound(err)).To(BeTrue()) }, 1*time.Minute, 5*time.Second).Should(Succeed()) })This prevents cleanup failures when objects are already deleted, making the tests more robust.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
e2e/e2e_suite_test.go
(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: e2e-k8s
- GitHub Check: build-and-unit-test
/retest |
Sometimes cleanup fails due to API errors, which fails the next test as well. So retry the delete command, and wait until the resource is gone. Signed-off-by: Marc Sluiter <[email protected]>
@slintes: The following test failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a minor comment
} | ||
DeferCleanup(func() { | ||
// cleanup in reverse order! | ||
for _, obj := range []ctrl.Object{clusterRole, dummyTemplate, dummyCRD, dummyTemplateCRD} { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we reverse the original list programmatically? This way there will only be one copy, and we should avoid bugs if the list changes in the future.
Expect(k8sClient.Delete(context.Background(), obj)).To(Succeed()) | ||
time.Sleep(1 * time.Second) | ||
} | ||
}) | ||
for _, obj := range []ctrl.Object{dummyTemplateCRD, dummyCRD, dummyTemplate, clusterRole} { | ||
Expect(k8sClient.Create(context.Background(), obj)).To(Succeed()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I think it would be nicer to add the cleanup here as a function.
i.e something like:
Expect(k8sClient.Create(context.Background(), obj)).To(Succeed())
DeferCleanup(cleanUpObj, obj)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, because "// cleanup in reverse order!"
the CR must be cleaned up before the corresponding CRD
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will be executed in a reverse order, because Defer are stacked and not queued.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm, I got an error like "unknown kind" before I switched to how it's done now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my best guess is that wrong obj was passed to the cleanup (i.e all cleanups got the same ref).
I vaguely remember go being tricky that way, if that's the case it's an easy fix.
Feel free to explore further or leave as is.
Why we need this PR
Sometimes cleanup in e2e test fails due to API errors, which fails the next test as well.
Changes made
Retry the delete command, and wait until the resource is gone.
Summary by CodeRabbit