-
Notifications
You must be signed in to change notification settings - Fork 297
perf: Update the Node Repair Controller for requeue time #2286
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
perf: Update the Node Repair Controller for requeue time #2286
Conversation
Pull Request Test Coverage Report for Build 15435053326Details
💛 - Coveralls |
/assign @jonathan-innis |
For(&corev1.Node{}, builder.WithPredicates(nodeutils.IsManagedPredicateFuncs(c.cloudProvider))). | ||
For(&corev1.Node{}, builder.WithPredicates(nodeutils.IsManagedPredicateFuncs(c.cloudProvider), predicate.Funcs{ | ||
UpdateFunc: func(e event.UpdateEvent) bool { | ||
return !equality.Semantic.DeepEqual(e.ObjectOld.(*corev1.Node).Status.Conditions, e.ObjectNew.(*corev1.Node).Status.Conditions) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just sanity check this real quick for how this is handled with arrays, mainly just want to make sure it's looking at each of the object contents and not just looking at the items that are in the array
UpdateFunc: func(e event.UpdateEvent) bool { | ||
return !equality.Semantic.DeepEqual(e.ObjectOld.(*corev1.Node).Status.Conditions, e.ObjectNew.(*corev1.Node).Status.Conditions) | ||
}, | ||
DeleteFunc: func(e event.DeleteEvent) bool { return true }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not definining a func is true
by default. If you want to make it false
, then you should make that explicit
@@ -104,7 +111,7 @@ func (c *Controller) Reconcile(ctx context.Context, node *corev1.Node) (reconcil | |||
return reconcile.Result{}, client.IgnoreNotFound(err) | |||
} | |||
if !nodePoolHealthy { | |||
return reconcile.Result{}, c.publishNodePoolHealthEvent(ctx, node, nodeClaim, nodePoolName) | |||
return reconcile.Result{RequeueAfter: 5 * time.Minute}, c.publishNodePoolHealthEvent(ctx, node, nodeClaim, nodePoolName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This alone won't work, right? We are going to get a heartbeat event that comes in every 30s or so and that's going to requeue us more aggressively even with the event filter that you have
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We either need a cache rate limiter that ensures that we literally don't check as aggressively as we are today OR we need to make sure that our event filter is a bit better so that it checks something like: The status state has actually changed
@@ -104,7 +111,7 @@ func (c *Controller) Reconcile(ctx context.Context, node *corev1.Node) (reconcil | |||
return reconcile.Result{}, client.IgnoreNotFound(err) | |||
} | |||
if !nodePoolHealthy { | |||
return reconcile.Result{}, c.publishNodePoolHealthEvent(ctx, node, nodeClaim, nodePoolName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Separately from just generally making sure that we don't reconcile as much as we are doing today, I do think that we should generally rate limit the cluster check that we are doing for health -- we shouldn't list as much as we are when it comes to just checking whether the cluster is healthy
96f9118
to
8fb8d49
Compare
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: engedaam The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Fixes #N/A
Description
How was this change tested?
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.