MHC does not watch remediation #364
Signed-off-by: Michael Shitrit <[email protected]>
Skipping CI for Draft Pull Request.
a8d9b74 to b18cffa
/test ?
@mshitrit: No presubmit jobs available for medik8s/node-healthcheck-operator@main
/test 4.17-openshift-e2e
Export Watcher to interface to support unit tests

Signed-off-by: Michael Shitrit <[email protected]>
b18cffa to 94f2c21
/test 4.17-openshift-e2e
/test 4.17-openshift-e2e
/test 4.16-openshift-e2e
/retest
/test 4.16-openshift-e2e
/test 4.16-openshift-e2e
/test 4.16-openshift-e2e
/test 4.16-openshift-e2e
/test 4.16-openshift-e2e
- Added a wait period for the CR to be deleted: MHC wasn't waiting for the CR deletion, which failed the test because the lease wasn't removed
- Increased the wait for lease removal to 2 minutes: MHC removes the lease only after the CR is removed, and it requeues Reconciles until then. That requeuing causes an exponential backoff, which makes the current 1 minute wait too short (the remediation that is triggered by the CR deletion will not trigger the lease removal, because it short-circuits: the MHC will not be found, as the lookup uses the CR's name).

Signed-off-by: Michael Shitrit <[email protected]>
aec2c01 to 2ac34fd
/test 4.16-openshift-e2e
almost lgtm, 2 comments inline
- add comments that were omitted in refactoring
- add MHC Template Mapper function

Signed-off-by: Michael Shitrit <[email protected]>
/test 4.16-openshift-e2e
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: mshitrit, slintes
/test 4.16-openshift-e2e
/retest
/test 4.16-openshift-e2e
/retest
/retest
/retest
@mshitrit it looks like the MHC test is still pretty flaky, did you investigate?
https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/medik8s_node-healthcheck-operator/364/pull-ci-medik8s-node-healthcheck-operator-main-4.16-openshift-e2e/1913843347529666560
Yeah, from what I can tell the following happens:
IMO the simplest way to stabilize this (assuming we just want to modify the test) is to increase the time tolerance for leases to be deleted.
if watchType == NHC {
	return owner.Kind == "NodeHealthCheck" && owner.APIVersion == remediationv1alpha1.GroupVersion.String()
} else {
	return owner.Kind == "Machine" && owner.APIVersion == machinev1beta1.GroupVersion.String()
This should be "MachineHealthCheck" and not "Machine". We are currently trying to reconcile MHCs with Machine names, which is why the tests are still flaky.
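A minimal, self-contained sketch of the check with the suggested fix applied. It is not the PR's code: `OwnerReference` is a local stand-in for the Kubernetes owner reference type, the `MHC` constant and the `isOwnedBy` name are hypothetical, and the two API version strings are assumed values standing in for `remediationv1alpha1.GroupVersion.String()` and `machinev1beta1.GroupVersion.String()`.

```go
package main

import "fmt"

// OwnerReference is a local stand-in for the Kubernetes owner reference type,
// reduced to the fields the check needs.
type OwnerReference struct {
	APIVersion string
	Kind       string
	Name       string
}

// WatchType selects which controller the watch belongs to.
type WatchType int

const (
	NHC WatchType = iota
	MHC // hypothetical name; only NHC appears in the reviewed diff
)

// Assumed group/version strings, used here instead of the real
// GroupVersion.String() calls so the example has no dependencies.
const (
	nhcAPIVersion = "remediation.medik8s.io/v1alpha1"
	mhcAPIVersion = "machine.openshift.io/v1beta1"
)

// isOwnedBy reports whether the owner is the health check kind that the given
// controller reconciles. For MHC the owner must be a MachineHealthCheck, not a
// Machine, otherwise reconcile requests carry Machine names.
func isOwnedBy(owner OwnerReference, watchType WatchType) bool {
	if watchType == NHC {
		return owner.Kind == "NodeHealthCheck" && owner.APIVersion == nhcAPIVersion
	}
	return owner.Kind == "MachineHealthCheck" && owner.APIVersion == mhcAPIVersion
}

func main() {
	owners := []OwnerReference{
		{APIVersion: mhcAPIVersion, Kind: "Machine", Name: "worker-0"},
		{APIVersion: mhcAPIVersion, Kind: "MachineHealthCheck", Name: "mhc-example"},
	}
	for _, o := range owners {
		if isOwnedBy(o, MHC) {
			fmt.Println("enqueue reconcile for", o.Kind, o.Name)
		}
	}
}
```

With the fix, only the MachineHealthCheck owner produces a reconcile request, so the request name matches an object the MHC controller can actually find.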
Why we need this PR
MHC does not watch the remediation CR. As a result it can miss changes to the CR, for example the remediator removing the finalizer, and then neither the lease removal nor the deletion of the CR is triggered.
This issue also causes the CI e2e tests to fail (or at least to be extremely flaky).
Changes made
Extracted the NHC logic that watches remediation CRs and templates into a new component
Use this component in both NHC and MHC to avoid code duplication
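One way to picture the shared component is as a small interface that both controllers depend on, which also makes it easy to substitute a fake in unit tests. The sketch below is hypothetical throughout: the names `Watcher`, `WatchKind`, `NewWatcher`, `fakeWatcher`, and `ensureWatches` are invented for illustration and do not come from the PR, and the real component would register watches with the controller rather than call a plain function.

```go
package main

import (
	"fmt"
	"sync"
)

// Watcher is a hypothetical interface for the shared watch component.
type Watcher interface {
	// WatchKind starts watching a remediation kind, at most once per kind.
	WatchKind(kind string) error
}

// sharedWatcher remembers which kinds are already watched so that the NHC and
// MHC controllers can both call it without registering duplicate watches.
type sharedWatcher struct {
	mu      sync.Mutex
	watched map[string]bool
	start   func(kind string) error // stand-in for the real watch registration
}

// NewWatcher builds a Watcher around the given registration function.
func NewWatcher(start func(kind string) error) Watcher {
	return &sharedWatcher{watched: map[string]bool{}, start: start}
}

func (w *sharedWatcher) WatchKind(kind string) error {
	w.mu.Lock()
	defer w.mu.Unlock()
	if w.watched[kind] {
		return nil // already watching this kind
	}
	if err := w.start(kind); err != nil {
		return err
	}
	w.watched[kind] = true
	return nil
}

// fakeWatcher is a test double that only records the requested kinds.
type fakeWatcher struct{ kinds []string }

func (f *fakeWatcher) WatchKind(kind string) error {
	f.kinds = append(f.kinds, kind)
	return nil
}

// ensureWatches is what either controller could call during reconcile.
func ensureWatches(w Watcher, kinds ...string) error {
	for _, k := range kinds {
		if err := w.WatchKind(k); err != nil {
			return fmt.Errorf("watching %s: %w", k, err)
		}
	}
	return nil
}

func main() {
	w := NewWatcher(func(kind string) error {
		fmt.Println("started watch for", kind)
		return nil
	})
	// The kind names are only example strings; the second request for the
	// same kind is a no-op.
	_ = ensureWatches(w, "SelfNodeRemediation", "FenceAgentsRemediation", "SelfNodeRemediation")
}
```

Because the controllers would only see the interface, a unit test can pass `&fakeWatcher{}` and assert on the recorded kinds without a running cluster.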
Which issue(s) this PR fixes
ECOPROJECT-2187
Test plan