
Add option to skip similar nodegroup recomputation #6926

Open
rrangith wants to merge 2 commits into master from skip-similar-nodegroup-recomputation
Conversation

@rrangith (Contributor) commented Jun 14, 2024

What type of PR is this?

/kind feature

What this PR does / why we need it:

Related to #6940

This recomputation used to only occur when the bestOption NodeGroup did not exist, but was changed in #5802. There are cases where an expander could modify the bestOption's similar nodegroups, such as custom logic in the gRPC expander.

In cases like this, we should have a CLI option to trust the expander's similar nodegroups and skip the recomputation.

If a user does not enable this option, then by default the behaviour will stay the same. This will only skip similar nodegroup recomputation for users who enable this option.

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Added `skip-similar-node-group-recomputation` flag. If enabled, skips similar NodeGroup recomputation for the best option returned by the expander during scaleup.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jun 14, 2024
@k8s-ci-robot (Contributor):

Hi @rrangith. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added area/cluster-autoscaler size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jun 14, 2024
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 12, 2024
@k8s-triage-robot:

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 10, 2024
@rrangith force-pushed the skip-similar-nodegroup-recomputation branch from 32774c7 to 450a753 on November 6, 2024 21:11
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 6, 2024
@rrangith force-pushed the skip-similar-nodegroup-recomputation branch from 450a753 to 35c514d on November 6, 2024 21:14
@rrangith (Contributor Author) commented Nov 6, 2024

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 6, 2024
@elmiko (Contributor) left a comment:

there had been some discussions about trying to limit the new flags we are adding to the autoscaler, but i'm not sure if there was ever any guidance about that. regardless, i think this new flag should also be mentioned in the FAQ, see https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#what-are-the-parameters-to-ca

@rrangith force-pushed the skip-similar-nodegroup-recomputation branch from 35c514d to 2627751 on December 2, 2024 15:51
@rrangith rrangith requested a review from elmiko December 2, 2024 15:51
@elmiko (Contributor) commented Dec 2, 2024

just a question as i'm reviewing, is there any relationship or interaction between this flag and the balance-similar-node-groups flag? (eg does the latter need to be enabled or anything special like that)

@rrangith (Contributor Author) commented Dec 2, 2024

just a question as i'm reviewing, is there any relationship or interaction between this flag and the balance-similar-node-groups flag? (eg does the latter need to be enabled or anything special like that)

There is no strict relationship between the two flags for things to function properly. However, if you enable skip-similar-node-group-recomputation but have balance-similar-node-groups disabled, then it will just do nothing: https://github.com/rrangith/autoscaler/blob/26277519bfe0728a532076f22e4485f1e7ba4adb/cluster-autoscaler/core/scaleup/orchestrator/orchestrator.go#L722-L730
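For illustration, the gating described above can be sketched as follows. This is a hypothetical model, not the actual orchestrator code; the function and parameter names are invented:

```go
package main

import "fmt"

// similarGroupsForScaleUp models how the two flags interact: the skip flag
// only matters once balancing is enabled, because similar node groups are
// never consulted otherwise.
func similarGroupsForScaleUp(balanceEnabled, skipRecompute bool, fromExpander, recomputed []string) []string {
	if !balanceEnabled {
		// balance-similar-node-groups is off: similar node groups are
		// never used, so the skip flag has no effect.
		return nil
	}
	if skipRecompute {
		// Trust the similar node groups the expander returned.
		return fromExpander
	}
	// Default behaviour: use the freshly recomputed set.
	return recomputed
}

func main() {
	fromExpander := []string{"B"}
	recomputed := []string{"A", "B"}
	fmt.Println(similarGroupsForScaleUp(false, true, fromExpander, recomputed))  // []
	fmt.Println(similarGroupsForScaleUp(true, true, fromExpander, recomputed))   // [B]
	fmt.Println(similarGroupsForScaleUp(true, false, fromExpander, recomputed))  // [A B]
}
```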

I was debating mentioning that in the FAQ or cli arg description, but wasn't sure if I should add even more words to it

@elmiko (Contributor) commented Dec 2, 2024

I was debating mentioning that in the FAQ or cli arg description, but wasn't sure if I should add even more words to it

i think it's worth mentioning, if only so people know they need to have balance-similar-node-groups enabled to get this behavior.

@rrangith force-pushed the skip-similar-nodegroup-recomputation branch from 2627751 to 0a2c91f on December 2, 2024 22:58
@elmiko (Contributor) left a comment:

/lgtm

we will need a review from a core maintainer for the flag change.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 3, 2024
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 6, 2025
@rrangith force-pushed the skip-similar-nodegroup-recomputation branch from 0a2c91f to 3f9efa1 on January 9, 2025 16:36
@k8s-ci-robot k8s-ci-robot removed lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Jan 9, 2025
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jan 9, 2025
@jackfrancis (Contributor):

/test pull-cluster-autoscaler-e2e-azure-master

Reviewed diff:

	newNodes int,
	nodeInfos map[string]*framework.NodeInfo,
	schedulablePodGroups map[string][]estimator.PodEquivalenceGroup,
 ) ([]nodegroupset.ScaleUpInfo, errors.AutoscalerError) {
-	// Recompute similar node groups in case they need to be updated
-	similarNodeGroups := o.ComputeSimilarNodeGroups(nodeGroup, nodeInfos, schedulablePodGroups, now)
+	similarNodeGroups := bestOption.SimilarNodeGroups
Contributor review comment:
It does seem like we're reliably populating bestOption.SimilarNodeGroups as part of the flow before invoking this balanceScaleUps method (L148 where we invoke o.ComputeExpansionOption).

That said, do we want to do some checking here, to make sure we have a good default value if "skip similar nodegroup recomputation" is enabled?

@rrangith (Contributor Author) replied:

I don't think there is a good default value, since my thinking with this feature was to fully rely on what the expander deems the similar nodegroups here. So for example, if the bestOption is nodegroup A and it has similar nodegroups B and C, but the expander removes both B and C as similar nodegroups, then CA should respect that and only scale up nodegroup A.

Contributor replied:

makes sense, thx

@jackfrancis (Contributor):

/lgtm

/assign @x13n @towca

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 27, 2025
@towca (Collaborator) commented Jan 29, 2025

I'm afraid I'm a bit lost here. Why should Expander be the source of truth for computing similar node groups, don't we have NodeGroupSetProcessor for that?

Expander modifying the options it gets is IMO very surprising and could lead to subtle bugs down the road. Expander's responsibility is to pick between pre-computed options.

I also don't get the reasoning for moving the recalculation in #5802, the only cases I can think of where we'd want to recalculate is if a node group needs to be created. @BigDarkClown Do you remember the context for this one?

@rrangith (Contributor Author):

I'm afraid I'm a bit lost here. Why should Expander be the source of truth for computing similar node groups, don't we have NodeGroupSetProcessor for that?

Expander modifying the options it gets is IMO very surprising and could lead to subtle bugs down the road. Expander's responsibility is to pick between pre-computed options.

@towca my thinking was that this option could be used to give more power to the expander, beyond simply picking between pre-computed options.

With the gRPC expander for example, you can have any custom logic you want. So if for example I want to filter out any nodegroups that have the label "foo", I could have logic in my gRPC expander to filter that out as a best option.

The problem occurs when balancing happens. Let's say I have nodegroups A and B, nodegroup A has the label "foo", and nodegroup B does not have this label. Let's say they are both similar nodegroups, so scaleups would be balanced across the 2 nodegroups.

Currently (without changes from my PR), Expander can omit nodegroup A and return nodegroup B as the best option (with nodegroup A as a similar nodegroup). Then scaleups will be balanced and nodegroup A will get scaled up even though I didn't want scaleups on any nodegroup with label "foo".

However if we give the power to Expander to not only control the best option, but also the best option's similar nodegroups, then we can completely block scaleups on all nodegroups with label "foo". That is what my PR will allow. And it is behind a flag so that it is completely opt-in, by default nothing will change.
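The scenario above can be modeled with a small, self-contained sketch. The `Option` type, `expanderBestOption`, and `recompute` are illustrative stand-ins, not the real cluster-autoscaler API:

```go
package main

import "fmt"

// Option mirrors the idea of an expansion option: a node group plus the
// similar node groups that scale-ups would be balanced across.
type Option struct {
	NodeGroup string
	Similar   []string
}

// expanderBestOption models a gRPC expander that refuses to scale any node
// group carrying label "foo": it drops such groups from the similar list.
func expanderBestOption(opt Option, labelled map[string]bool) Option {
	var kept []string
	for _, ng := range opt.Similar {
		if !labelled[ng] {
			kept = append(kept, ng)
		}
	}
	return Option{NodeGroup: opt.NodeGroup, Similar: kept}
}

// recompute models the post-expander recomputation: it restores the
// similarity result, discarding whatever the expander filtered out.
func recompute(opt Option, allSimilar []string) Option {
	return Option{NodeGroup: opt.NodeGroup, Similar: allSimilar}
}

func main() {
	labelled := map[string]bool{"A": true} // node group A carries label "foo"
	best := expanderBestOption(Option{NodeGroup: "B", Similar: []string{"A"}}, labelled)
	fmt.Println("expander's similar groups:", best.Similar) // []

	// Without the proposed flag, recomputation re-adds A, and balancing
	// would scale it up anyway.
	best = recompute(best, []string{"A"})
	fmt.Println("after recomputation:", best.Similar) // [A]
}
```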

I also don't get the reasoning for moving the recalculation in #5802, the only cases I can think of where we'd want to recalculate is if a node group needs to be created. @BigDarkClown Do you remember the context for this one?

Yea if #5802 was never merged, then we wouldn't need this PR. The recalculation is what is preventing the expander from having full power over the similar nodegroups, but I don't think this was the intent of the original PR

@towca (Collaborator) commented Jan 31, 2025

With the gRPC expander for example, you can have any custom logic you want. So if for example I want to filter out any nodegroups that have the label "foo", I could have logic in my gRPC expander to filter that out as a best option.

You certainly can have any custom logic behind the RPC, but if that logic breaks some assumptions that CA makes about Expanders, it'll lead to bugs. Balancing is one example, but I'm not convinced that there aren't more, or that we don't introduce more down the road. Especially because all of the custom logic is out-of-tree, so it's hard to validate if it breaks while doing changes in-tree.

The problem occurs when balancing happens. Let's say I have nodegroups A and B, nodegroup A has the label "foo", and nodegroup B does not have this label. Let's say they are both similar nodegroups, so scaleups would be balanced across the 2 nodegroups.

Currently (without changes from my PR), Expander can omit nodegroup A and return nodegroup B as the best option (with nodegroup A as a similar nodegroup). Then scaleups will be balanced and nodegroup A will get scaled up even though I didn't want scaleups on any nodegroup with label "foo".

However if we give the power to Expander to not only control the best option, but also the best option's similar nodegroups, then we can completely block scaleups on all nodegroups with label "foo". That is what my PR will allow. And it is behind a flag so that it is completely opt-in, by default nothing will change.

I get the motivation, but why can't the gRPC provider just implement its own NodeGroupSetProcessor (or customize the current implementation), and let the user hook into its FindSimilarNodeGroups()? Wouldn't this achieve the same thing without changing component responsibilities?

Yea if #5802 was never merged, then we wouldn't need this PR. The recalculation is what is preventing the expander from having full power over the similar nodegroups, but I don't think this was the intent of the original PR

I synced with @BigDarkClown offline, and it was actually the intent of that PR. We had a bug in our GKE-specific Expander that resulted in clearing the SimilarNodeGroups field. Since Expanders aren't supposed to do that, the recalculation was added as a bit of defensive programming to guard against similar bugs in Expander implementations (in addition to fixing the Expander implementation itself of course).

@rrangith (Contributor Author) commented Feb 3, 2025

With the gRPC expander for example, you can have any custom logic you want. So if for example I want to filter out any nodegroups that have the label "foo", I could have logic in my gRPC expander to filter that out as a best option.

You certainly can have any custom logic behind the RPC, but if that logic breaks some assumptions that CA makes about Expanders, it'll lead to bugs. Balancing is one example, but I'm not convinced that there aren't more, or that we don't introduce more down the road. Especially because all of the custom logic is out-of-tree, so it's hard to validate if it breaks while doing changes in-tree.

I am curious about other types of bugs that could eventually occur from this? Similar nodegroups are exclusively used for balancing, and by default balancing is not enabled and similar nodegroups are not used at all, see here. A user would have to opt in to have similar nodegroups be considered at all, and then opt in again to skip similar nodegroup recomputation, so from my point of view the blast radius seems very small.

I get the motivation, but why can't the gRPC provider just implement its own NodeGroupSetProcessor (or customize the current implementation), and let the user hook into its FindSimilarNodeGroups()? Wouldn't this achieve the same thing without changing component responsibilities?

We aren't using the gRPC provider, we use the regular cloud providers, for example the gce or aws providers. But is your general idea to have an additional processor that a user could opt in to, which would then make a gRPC call to get the similar nodegroups for a given nodegroup? This makes sense in theory, but leads to a lot of extra network requests when in reality all we need to know is the best option's similar nodegroups. If I have 1000 nodegroups in my cluster, for example, I'd now need to make 1000 gRPC calls (one FindSimilarNodeGroups call per nodegroup) for each scaleup request.

Yea if #5802 was never merged, then we wouldn't need this PR. The recalculation is what is preventing the expander from having full power over the similar nodegroups, but I don't think this was the intent of the original PR

I synced with @BigDarkClown offline, and it was actually the intent of that PR. We had a bug in our GKE-specific Expander that resulted in clearing the SimilarNodeGroups field. Since Expanders aren't supposed to do that, the recalculation was added as a bit of defensive programming to guard against similar bugs in Expander implementations (in addition to fixing the Expander implementation itself of course).

Yea so my argument to that would be that #5802 was a breaking change and I see value in the original behaviour. I think CA should allow users who know what they are doing to remove the defensive programming and go back to the original CA behaviour pre #5802 and allow the expander to have control over the similar nodegroups. Since ultimately this was a bug in the expander implementation and not CA.

Right now the purpose of the expander is to choose the best option, but my proposal is to allow the expander to have a say on what the best option's similar nodegroups are too, since the similar nodegroups could play a part in choosing the best option. And the best option might not actually be the best due to its similar nodegroups, so it may need some extra filtering to remove certain similar nodegroups.

Happy to discuss more over a call or at the next SIG Autoscaling meeting. We are using this change alongside #6941 in our fork and it has been working well for us and would like others to benefit from it too

@x13n (Member) commented Feb 7, 2025

The discussion in #6940 seems to be going in a quite different direction, so closing this PR for now. Feel free to reopen if you want to revisit this approach.

/close

@k8s-ci-robot (Contributor):

@x13n: Closed this PR.

In response to this:

The discussion in #6940 seems to be going in a quite different direction, so closing this PR for now. Feel free to reopen if you want to revisit this approach.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@rrangith (Contributor Author) commented Feb 7, 2025

/reopen

@x13n as I said in #6940 (comment)

We are currently using #6926 and #6941 in our fork and it is working perfectly fine, but I'd be interested in seeing the alternative way

I am still advocating for this PR as it is working for us. Can discuss more at the next SIG autoscaling.

@k8s-ci-robot k8s-ci-robot reopened this Feb 7, 2025
@k8s-ci-robot (Contributor):

@rrangith: Reopened this PR.

In response to this:

/reopen

@x13n as I said in #6940 (comment)

We are currently using #6926 and #6941 in our fork and it is working perfectly fine, but I'd be interested in seeing the alternative way

I am still advocating for this PR as it is working for us. Can discuss more at the next SIG autoscaling.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot (Contributor):

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: rrangith
Once this PR has been reviewed and has the lgtm label, please ask for approval from towca. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@x13n (Member) commented Feb 7, 2025

Ack, apologies for premature closing then! I can't join SIG meeting on Monday, will be following the discussion on the PR & issue though.

@towca (Collaborator) commented Feb 10, 2025

You certainly can have any custom logic behind the RPC, but if that logic breaks some assumptions that CA makes about Expanders, it'll lead to bugs. Balancing is one example, but I'm not convinced that there aren't more, or that we don't introduce more down the road. Especially because all of the custom logic is out-of-tree, so it's hard to validate if it breaks while doing changes in-tree.

I am curious about other types of bugs that could eventually occur from this? Similar nodegroups are exclusively used for balancing, and by default balancing is not enabled and similar nodegroups are not used at all, see here. A user would have to opt in to have similar nodegroups be considered at all, and then opt in again to skip similar nodegroup recomputation, so from my point of view the blast radius seems very small.

I would really want to avoid adding a separate config option for this. It doesn't feel like something that should be configurable, the flag would essentially control a very low-level implementation detail.

I'm less worried that this PR changes behavior for existing clusters. I'm mostly worried that future changes to the balancing and adjacent logic will break your use case at some point.

I get the motivation, but why can't the gRPC provider just implement its own NodeGroupSetProcessor (or customize the current implementation), and let the user hook into its FindSimilarNodeGroups()? Wouldn't this achieve the same thing without changing component responsibilities?

We aren't using the gRPC provider, we use the regular cloud providers, for example the gce or aws providers. But is your general idea to have an additional processor that a user could opt in to, which would then make a gRPC call to get the similar nodegroups for a given nodegroup? This makes sense in theory, but leads to a lot of extra network requests when in reality all we need to know is the best option's similar nodegroups. If I have 1000 nodegroups in my cluster, for example, I'd now need to make 1000 gRPC calls (one FindSimilarNodeGroups call per nodegroup) for each scaleup request.

Ah, so it's just the Expander part that's behind an RPC? That makes sense, thanks for the clarification. But yeah, I'm mostly advocating for keeping component responsibilities tightly-scoped.

I know it's more work, but we could extend the NodeGroupSetProcessor to have a separate call for the best option. Or maybe we could implement caching on the Expander side?
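The caching idea could look roughly like this. `cachedSimilarFinder` and its fields are hypothetical, intended only to show how memoization would bound the number of remote calls within one scale-up loop:

```go
package main

import "fmt"

// cachedSimilarFinder memoizes similar-node-group lookups so that repeated
// queries for the same node group do not each cost a gRPC round trip.
type cachedSimilarFinder struct {
	calls int                 // counts simulated remote calls
	cache map[string][]string // node group -> similar node groups
	fetch func(nodeGroup string) []string
}

// FindSimilarNodeGroups returns the cached result if present, otherwise
// performs (and counts) one simulated remote fetch.
func (c *cachedSimilarFinder) FindSimilarNodeGroups(ng string) []string {
	if similar, ok := c.cache[ng]; ok {
		return similar
	}
	c.calls++
	similar := c.fetch(ng)
	c.cache[ng] = similar
	return similar
}

func main() {
	finder := &cachedSimilarFinder{
		cache: map[string][]string{},
		fetch: func(ng string) []string { return []string{ng + "-sibling"} },
	}
	// 1000 lookups for the same best-option node group hit the remote
	// endpoint only once.
	for i := 0; i < 1000; i++ {
		finder.FindSimilarNodeGroups("best-option")
	}
	fmt.Println("remote calls made:", finder.calls) // 1
}
```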

Yea if #5802 was never merged, then we wouldn't need this PR. The recalculation is what is preventing the expander from having full power over the similar nodegroups, but I don't think this was the intent of the original PR

I synced with @BigDarkClown offline, and it was actually the intent of that PR. We had a bug in our GKE-specific Expander that resulted in clearing the SimilarNodeGroups field. Since Expanders aren't supposed to do that, the recalculation was added as a bit of defensive programming to guard against similar bugs in Expander implementations (in addition to fixing the Expander implementation itself of course).

Yea so my argument to that would be that #5802 was a breaking change and I see value in the original behaviour. I think CA should allow users who know what they are doing to remove the defensive programming and go back to the original CA behaviour pre #5802 and allow the expander to have control over the similar nodegroups. Since ultimately this was a bug in the expander implementation and not CA.

Right now the purpose of the expander is to choose the best option, but my proposal is to allow the expander to have a say on what the best option's similar nodegroups are too, since the similar nodegroups could play a part in choosing the best option. And the best option might not actually be the best due to its similar nodegroups, so it may need some extra filtering to remove certain similar nodegroups.

Happy to discuss more over a call or at the next SIG Autoscaling meeting. We are using this change alongside #6941 in our fork and it has been working well for us and would like others to benefit from it too

Similar node groups definitely can play a role in choosing the best option, that's why they're passed to Expander at all. But I don't see a good case for changing the similar node groups while trying to choose the best one.

Extending Expander semantics is certainly an option we could go with, but then:

  1. We'd have to document them clearly.
  2. How would we want the semantics to work? We can't just say "Expander can modify the passed Options arbitrarily", because the scale-up execution logic depends on their contents.
  3. If we say "Expander can modify only the similar node groups field in the passed Options", then we're back to "we already have a component that's supposed to do that".

Happy to discuss further during the meeting!
