Add option to skip similar nodegroup recomputation #6926
base: master
Conversation
Hi @rrangith. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
Force-pushed from 32774c7 to 450a753
Force-pushed from 450a753 to 35c514d
/remove-lifecycle stale
there had been some discussions about trying to limit the new flags we are adding to the autoscaler, but i'm not sure if there was ever any guidance about that. regardless, i think this new flag should also be mentioned in the FAQ, see https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#what-are-the-parameters-to-ca
Force-pushed from 35c514d to 2627751
just a question as i'm reviewing: is there any relationship or interaction between this flag and the balance-similar-node-groups flag? (e.g. does the latter need to be enabled, or anything special like that)
There is no strict relationship between the 2 flags for things to function properly; however, the new flag only has an effect when balance-similar-node-groups is enabled, since similar nodegroups are only used for balancing. I was debating mentioning that in the FAQ or CLI arg description, but wasn't sure if I should add even more words to it.
i think it's worth mentioning, if only so people know they need to have balance-similar-node-groups enabled for the new flag to do anything
Force-pushed from 2627751 to 0a2c91f
/lgtm
we will need a review from a core maintainer for the flag change.
Force-pushed from 0a2c91f to 3f9efa1
/test pull-cluster-autoscaler-e2e-azure-master
newNodes int,
nodeInfos map[string]*framework.NodeInfo,
schedulablePodGroups map[string][]estimator.PodEquivalenceGroup,
) ([]nodegroupset.ScaleUpInfo, errors.AutoscalerError) {
-	// Recompute similar node groups in case they need to be updated
-	similarNodeGroups := o.ComputeSimilarNodeGroups(nodeGroup, nodeInfos, schedulablePodGroups, now)
+	similarNodeGroups := bestOption.SimilarNodeGroups
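To make concrete what the diff above changes, here is a minimal, self-contained Go sketch. It is not the actual patch: the type names, the recompute stub, and the skipRecomputation parameter are simplified stand-ins for the real orchestrator code and the proposed flag.

package main

import "fmt"

// nodeGroup and option are hypothetical, stripped-down stand-ins for the real
// cloudprovider.NodeGroup and expander.Option types.
type nodeGroup struct{ id string }

type option struct {
	best    nodeGroup
	similar []nodeGroup // what the expander returned as similar node groups
}

// recompute stands in for the orchestrator's ComputeSimilarNodeGroups call.
func recompute(best nodeGroup) []nodeGroup {
	return []nodeGroup{{"B"}, {"C"}}
}

// similarNodeGroupsFor returns the node groups that balancing will spread the
// scale-up across, depending on whether recomputation is skipped.
func similarNodeGroupsFor(opt option, skipRecomputation bool) []nodeGroup {
	if skipRecomputation {
		// Trust whatever the expander left on the best option (the behaviour
		// this PR proposes behind an opt-in flag).
		return opt.similar
	}
	// Behaviour since #5802: always recompute, discarding any filtering the
	// expander may have done.
	return recompute(opt.best)
}

func main() {
	// The expander picked node group A and cleared its similar node groups.
	opt := option{best: nodeGroup{"A"}, similar: nil}
	fmt.Println(similarNodeGroupsFor(opt, true))  // [] -> only A gets scaled up
	fmt.Println(similarNodeGroupsFor(opt, false)) // [{B} {C}] -> balancing still spreads the scale-up
}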
It does seem like we're reliably populating bestOption.SimilarNodeGroups as part of the flow before invoking this balanceScaleUps method (L148, where we invoke o.ComputeExpansionOption).
That said, do we want to do some checking here, to make sure we have a good default value if "skip similar nodegroup recomputation" is enabled?
I don't think there is a good default value, since my thinking with this feature was to fully rely on what the expander deems as the similar nodegroups here. So for example, if the bestOption is nodegroup A and it has similar nodegroups B and C, but the expander removes both B and C as similar nodegroups, then CA should respect that and only scale up nodegroup A
makes sense, thx
I'm afraid I'm a bit lost here. Why should the Expander modify the options it gets? That is IMO very surprising and could lead to subtle bugs down the road. Expander's responsibility is to pick between pre-computed options. I also don't get the reasoning for moving the recalculation in #5802; the only case I can think of where we'd want to recalculate is if a node group needs to be created. @BigDarkClown Do you remember the context for this one?
@towca my thinking was that this option could be used to give more power to the expander, beyond simply picking between pre-computed options. With the gRPC expander for example, you can have any custom logic you want. So if for example I want to filter out any nodegroups that have the label "foo", I could have logic in my gRPC expander to filter that out as a best option. The problem occurs when balancing happens. Let's say I have nodegroups A and B, nodegroup A has the label "foo", and nodegroup B does not have this label. Let's say they are both similar nodegroups, so scaleups would be balanced across the 2 nodegroups. Currently (without changes from my PR), Expander can omit nodegroup A and return nodegroup B as the best option (with nodegroup A as a similar nodegroup). Then scaleups will be balanced and nodegroup A will get scaled up even though I didn't want scaleups on any nodegroup with label "foo". However if we give the power to Expander to not only control the best option, but also the best option's similar nodegroups, then we can completely block scaleups on all nodegroups with label "foo". That is what my PR will allow. And it is behind a flag so that it is completely opt-in, by default nothing will change.
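As a rough illustration of the kind of expander-side filtering described above — a hypothetical sketch, not the real gRPC expander API; the types, field names, and label handling are simplified stand-ins:

package main

import "fmt"

// Hypothetical, simplified stand-ins for the real expander.Option and
// cloudprovider.NodeGroup types seen by a custom gRPC expander.
type nodeGroup struct {
	id     string
	labels map[string]string
}

type option struct {
	nodeGroup nodeGroup
	similar   []nodeGroup
}

// filterOption drops any node group carrying the "foo" label: the option is
// rejected outright if its own node group has the label, and labelled groups
// are removed from its similar node groups list.
func filterOption(opt option) (option, bool) {
	if _, banned := opt.nodeGroup.labels["foo"]; banned {
		return option{}, false
	}
	kept := make([]nodeGroup, 0, len(opt.similar))
	for _, sng := range opt.similar {
		if _, banned := sng.labels["foo"]; !banned {
			kept = append(kept, sng)
		}
	}
	opt.similar = kept
	return opt, true
}

func main() {
	a := nodeGroup{id: "A", labels: map[string]string{"foo": ""}}
	b := nodeGroup{id: "B", labels: map[string]string{}}
	// The expander is offered B as the best option, with A as a similar group.
	// If CA later recomputes similar node groups, A could be re-added and
	// scaled up anyway; if recomputation is skipped, only B is scaled up.
	filtered, ok := filterOption(option{nodeGroup: b, similar: []nodeGroup{a}})
	fmt.Println(ok, filtered.similar) // true [] -> the filtered option keeps no similar groups
}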
Yea, if #5802 was never merged, then we wouldn't need this PR. The recalculation is what is preventing the expander from having full power over the similar nodegroups, but I don't think this was the intent of the original PR.
You certainly can have any custom logic behind the RPC, but if that logic breaks some assumptions that CA makes about Expanders, it'll lead to bugs. Balancing is one example, but I'm not convinced that there aren't more, or that we won't introduce more down the road. Especially because all of the custom logic is out-of-tree, so it's hard to validate if it breaks while doing changes in-tree.
I get the motivation, but why can't the gRPC provider just implement its own similar node groups computation?
I synced with @BigDarkClown offline, and it was actually the intent of that PR. We had a bug in our GKE-specific Expander that resulted in clearing the similar node groups on the returned option.
I am curious: what other types of bugs could eventually occur from this? The similar nodegroups are exclusively used for balancing, and by default balancing is not enabled and similar nodegroups are not used at all, see here. A user would have to opt in to have similar nodegroups be considered at all, and then they'd have to opt in again to skip similar nodegroup recomputation, so in my point of view the blast radius seems very small
We aren't using the gRPC provider, we use the regular cloud providers, for example the gce or aws providers. But is your general idea to have an additional processor that a user could opt in to, which would then make a gRPC call to get the similar nodegroups for a given nodegroup? This makes sense in theory, but leads to a lot of extra network requests when in reality all we need to know is the best option's similar nodegroups. If I have 1000 nodegroups in my cluster, for example, I'd now need to make 1000 gRPC calls (1 per nodegroup), as sketched below.
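For comparison, a rough sketch of that alternative design: a custom similar-node-groups processor that asks an external service once per node group. The interface and names here are illustrative only, not the real Cluster Autoscaler processor API, but they show why the call count scales with the number of node groups:

package main

import "fmt"

// externalClient is a hypothetical client for the out-of-process service that
// would decide which node groups are similar.
type externalClient interface {
	SimilarNodeGroups(nodeGroupID string) ([]string, error)
}

type stubClient struct{}

func (stubClient) SimilarNodeGroups(nodeGroupID string) ([]string, error) {
	return nil, nil // pretend the service returned no similar groups
}

// findAllSimilar issues one round-trip per node group: with 1000 node groups
// that is 1000 calls, even though balancing only ever needs the similar
// groups of the best option the expander picked.
func findAllSimilar(c externalClient, nodeGroupIDs []string) (map[string][]string, int, error) {
	calls := 0
	result := make(map[string][]string, len(nodeGroupIDs))
	for _, id := range nodeGroupIDs {
		similar, err := c.SimilarNodeGroups(id)
		calls++
		if err != nil {
			return nil, calls, err
		}
		result[id] = similar
	}
	return result, calls, nil
}

func main() {
	ids := make([]string, 1000)
	for i := range ids {
		ids[i] = fmt.Sprintf("nodegroup-%d", i)
	}
	_, calls, _ := findAllSimilar(stubClient{}, ids)
	fmt.Println("RPCs issued:", calls) // RPCs issued: 1000
}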
Yea, so my argument to that would be that #5802 was a breaking change and I see value in the original behaviour. I think CA should allow users who know what they are doing to remove the defensive programming and go back to the original CA behaviour pre #5802, allowing the expander to have control over the similar nodegroups, since ultimately this was a bug in the expander implementation and not in CA. Right now the purpose of the expander is to choose the best option, but my proposal is to allow the expander to have a say in what the best option's similar nodegroups are too, since the similar nodegroups could play a part in choosing the best option, and the best option might not actually be the best due to its similar nodegroups, so it may need some extra filtering to remove certain similar nodegroups.
Happy to discuss more over a call or at the next SIG Autoscaling meeting. We are using this change alongside #6941 in our fork and it has been working well for us, and we would like others to benefit from it too.
The discussion in #6940 seems to be going in a quite different direction, so closing this PR for now. Feel free to reopen if you want to revisit this approach.
/close
@x13n: Closed this PR.
/reopen
@x13n, as I said in #6940 (comment), I am still advocating for this PR as it is working for us. Can discuss more at the next SIG Autoscaling meeting.
@rrangith: Reopened this PR.
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: rrangith. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
Ack, apologies for the premature closing then! I can't join the SIG meeting on Monday, but will be following the discussion on the PR & issue.
I would really want to avoid adding a separate config option for this. It doesn't feel like something that should be configurable; the flag would essentially control a very low-level implementation detail. I'm less worried that this PR changes behavior for existing clusters; I'm mostly worried that future changes to the balancing and adjacent logic will break your use case at some point.
Ah, so it's just the Expander part that's behind an RPC? That makes sense, thanks for the clarification. But yeah, I'm mostly advocating for keeping component responsibilities tightly-scoped. I know it's more work, but we could extend the
Similar node groups definitely can play a role in choosing the best option; that's why they're passed to Expander at all. But I don't see a good case for changing the similar node groups while trying to choose the best one. Extending Expander semantics is certainly an option we could go with, but then:
Happy to discuss further during the meeting!
What type of PR is this?
/kind feature
What this PR does / why we need it:
Related to #6940
This recomputation used to only occur when the bestOption NodeGroup did not exist, but was changed in #5802. There are cases where an expander could modify the bestOption's similar nodegroups, such as custom logic in the gRPC expander.
In cases like this, we should have a CLI option to trust expander’s similar nodegroups and skip the recomputation.
If a user does not enable this option, the behaviour stays the same by default; similar nodegroup recomputation is only skipped for users who explicitly enable it.
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: