feat: add nodeclaim disruption status for forceful disruptions #2151
base: main
Conversation
Welcome @liafizan!
Hi @liafizan. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Pull Request Test Coverage Report for Build 15888112882
💛 - Coveralls
lgtm
Thanks for making this update 🎉 This is awesome! A few comments about messaging and ensuring the patches don't accidentally overwrite data.
```diff
@@ -130,6 +130,11 @@ func (c *Controller) deleteNodeClaim(ctx context.Context, nodeClaim *v1.NodeClai
 	if !nodeClaim.DeletionTimestamp.IsZero() {
 		return reconcile.Result{}, nil
 	}
+	stored := nodeClaim.DeepCopy()
+	nodeClaim.StatusConditions().SetTrueWithReason(v1.ConditionTypeDisruptionReason, v1.DisruptionReasonUnhealthy, "node unhealthy")
```
Does it make sense for us to have a more verbose message here?
I added some more information to the status message. Please let me know if this suffices.
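For illustration, a more descriptive message could name the affected node, e.g. by interpolating `nodeClaim.Status.NodeName`. A minimal sketch — the helper and exact wording are hypothetical, not the text merged in this PR:

```go
package health

import (
	"fmt"

	v1 "sigs.k8s.io/karpenter/pkg/apis/v1"
)

// setUnhealthyReason is a hypothetical helper showing one way to make the
// condition message more verbose: say which node is unhealthy and what will
// happen to it, rather than the bare "node unhealthy".
func setUnhealthyReason(nodeClaim *v1.NodeClaim) {
	nodeClaim.StatusConditions().SetTrueWithReason(
		v1.ConditionTypeDisruptionReason,
		v1.DisruptionReasonUnhealthy,
		fmt.Sprintf("Node %s is unhealthy and will be forcefully terminated", nodeClaim.Status.NodeName),
	)
}
```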
```diff
@@ -130,6 +130,11 @@ func (c *Controller) deleteNodeClaim(ctx context.Context, nodeClaim *v1.NodeClai
 	if !nodeClaim.DeletionTimestamp.IsZero() {
 		return reconcile.Result{}, nil
 	}
+	stored := nodeClaim.DeepCopy()
+	nodeClaim.StatusConditions().SetTrueWithReason(v1.ConditionTypeDisruptionReason, v1.DisruptionReasonUnhealthy, "node unhealthy")
+	if err := c.kubeClient.Status().Patch(ctx, nodeClaim, client.MergeFrom(stored)); err != nil {
```
We need an optimistic lock here, or it's possible that some other client can overwrite it.
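In controller-runtime, `client.MergeFromWithOptions` with the `client.MergeFromWithOptimisticLock{}` option embeds the stored object's resourceVersion in the patch, so the API server rejects it with a Conflict if another client wrote in between. A sketch of what the suggested change could look like (the helper name is hypothetical):

```go
package health

import (
	"context"

	"sigs.k8s.io/controller-runtime/pkg/client"
	v1 "sigs.k8s.io/karpenter/pkg/apis/v1"
)

// patchDisruptionReason sketches the reviewer's suggestion: the optimistic-lock
// option ties the patch to stored's resourceVersion, so a concurrent write by
// another client surfaces as a Conflict error instead of being silently overwritten.
func patchDisruptionReason(ctx context.Context, kubeClient client.Client, nodeClaim *v1.NodeClaim) error {
	stored := nodeClaim.DeepCopy()
	nodeClaim.StatusConditions().SetTrueWithReason(v1.ConditionTypeDisruptionReason, v1.DisruptionReasonUnhealthy, "node unhealthy")
	return kubeClient.Status().Patch(ctx, nodeClaim,
		client.MergeFromWithOptions(stored, client.MergeFromWithOptimisticLock{}))
}
```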
```diff
@@ -78,6 +78,11 @@ func (c *Controller) Reconcile(ctx context.Context, nodeClaim *v1.NodeClaim) (re
 		return reconcile.Result{RequeueAfter: expirationTime.Sub(c.clock.Now())}, nil
 	}
 	// 3. Otherwise, if the NodeClaim is expired we can forcefully expire the nodeclaim (by deleting it)
+	stored := nodeClaim.DeepCopy()
+	nodeClaim.StatusConditions().SetTrueWithReason(v1.ConditionTypeDisruptionReason, v1.DisruptionReasonExpired, "nodeClaim expired")
+	if err := c.kubeClient.Status().Patch(ctx, nodeClaim, client.MergeFrom(stored)); err != nil {
```
Same comment here: we should use an optimistic lock and handle the conflict.
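Handling the resulting conflict could mean requeueing so the next reconcile re-reads the NodeClaim instead of propagating the error. A sketch under the same optimistic-lock assumption (the helper name is hypothetical):

```go
package expiration

import (
	"context"

	"k8s.io/apimachinery/pkg/api/errors"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"
	v1 "sigs.k8s.io/karpenter/pkg/apis/v1"
)

// markExpired patches the DisruptionReason condition with an optimistic lock
// and translates a Conflict into a requeue: the cached NodeClaim is stale, so
// retrying against a fresh read is preferable to returning the error.
func markExpired(ctx context.Context, kubeClient client.Client, nodeClaim *v1.NodeClaim) (reconcile.Result, error) {
	stored := nodeClaim.DeepCopy()
	nodeClaim.StatusConditions().SetTrueWithReason(v1.ConditionTypeDisruptionReason, v1.DisruptionReasonExpired, "nodeClaim expired")
	if err := kubeClient.Status().Patch(ctx, nodeClaim,
		client.MergeFromWithOptions(stored, client.MergeFromWithOptimisticLock{})); err != nil {
		if errors.IsConflict(err) {
			return reconcile.Result{Requeue: true}, nil
		}
		return reconcile.Result{}, client.IgnoreNotFound(err)
	}
	return reconcile.Result{}, nil
}
```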
```diff
@@ -78,6 +78,11 @@ func (c *Controller) Reconcile(ctx context.Context, nodeClaim *v1.NodeClaim) (re
 		return reconcile.Result{RequeueAfter: expirationTime.Sub(c.clock.Now())}, nil
 	}
 	// 3. Otherwise, if the NodeClaim is expired we can forcefully expire the nodeclaim (by deleting it)
+	stored := nodeClaim.DeepCopy()
+	nodeClaim.StatusConditions().SetTrueWithReason(v1.ConditionTypeDisruptionReason, v1.DisruptionReasonExpired, "nodeClaim expired")
```
Same comment here: Can we add a more descriptive message to this status condition?
Added more info. Please let me know if you feel we need any additional information.
```diff
@@ -95,7 +95,7 @@ func (c *Controller) Reconcile(ctx context.Context) (reconcile.Result, error) {
 	if node != nil && nodeutils.GetCondition(node, corev1.NodeReady).Status == corev1.ConditionTrue {
 		return
 	}
-	if err := c.kubeClient.Delete(ctx, nodeClaims[i]); err != nil {
+	if err := c.updateStatusAndDelete(ctx, nodeClaims[i]); err != nil {
```
nit: Could you consider doing this inline, adding it above like you did the other ones? If it fires the gocyclo linter, it's fine to disable it for the function.
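If inlining the patch pushes `Reconcile` past gocyclo's complexity threshold, golangci-lint accepts a per-function suppression directive. A fragment sketching that (imports and the function body are elided):

```go
// Inlining the status patch may raise Reconcile's cyclomatic complexity past
// the gocyclo threshold; the nolint directive below disables that one check
// for this one function only.
//
//nolint:gocyclo
func (c *Controller) Reconcile(ctx context.Context) (reconcile.Result, error) {
	// ... existing garbage-collection logic, with the status patch and
	// delete performed inline rather than via a separate helper ...
}
```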
```go
func (c *Controller) updateStatusAndDelete(ctx context.Context, nodeClaim *v1.NodeClaim) error {
	stored := nodeClaim.DeepCopy()
	nodeClaim.StatusConditions().SetTrueWithReason(v1.ConditionTypeDisruptionReason, v1.DisruptionReasonGarbageCollected, "nodeClaim garbage collected")
	if err := c.kubeClient.Status().Patch(ctx, nodeClaim, client.MergeFrom(stored)); err != nil {
```
Optimistic locking
```diff
@@ -116,6 +116,18 @@ func (c *Controller) Reconcile(ctx context.Context) (reconcile.Result, error) {
 	return reconcile.Result{RequeueAfter: time.Minute * 2}, nil
 }
 
+func (c *Controller) updateStatusAndDelete(ctx context.Context, nodeClaim *v1.NodeClaim) error {
+	stored := nodeClaim.DeepCopy()
+	nodeClaim.StatusConditions().SetTrueWithReason(v1.ConditionTypeDisruptionReason, v1.DisruptionReasonGarbageCollected, "nodeClaim garbage collected")
```
More descriptive message?
/assign @jonathan-innis
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: cnmcavoy, liafizan
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
Fixes #2023
Description
Add disruption status to forceful disruptions so that we can improve tracking of things like how many pods were disrupted for a given reason.
How was this change tested?
Unit tests and the local kwok provider.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.