
Conversation

@jukie commented Oct 5, 2025

Description
This is a proposal to introduce a cost savings threshold when making consolidation decisions in order to reduce churn.

Related issue: aws/karpenter-provider-aws#7146

How was this change tested?
RFC only, but a draft PR is open at #2561.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: jukie
Once this PR has been reviewed and has the lgtm label, please assign gjtempleton for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the "cncf-cla: yes" label (Indicates the PR's author has signed the CNCF CLA) on Oct 5, 2025
@k8s-ci-robot added the "needs-ok-to-test" label (Indicates a PR that requires an org member to verify it is safe to test) on Oct 5, 2025
@k8s-ci-robot (Contributor)

Hi @jukie. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot added the "size/L" label (Denotes a PR that changes 100-499 lines, ignoring generated files) on Oct 5, 2025
@coveralls

Pull Request Test Coverage Report for Build 18253400908

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 4 unchanged lines in 1 file lost coverage.
  • Overall coverage increased (+0.07%) to 81.678%

Files with Coverage Reduction:
  • pkg/controllers/provisioning/scheduling/topologydomaingroup.go: 4 new missed lines (86.21%)

Totals:
  • Change from base Build 18201406291: 0.07%
  • Covered Lines: 11577
  • Relevant Lines: 14174

💛 - Coveralls

@jukie changed the title from "RFC: consolidation price improvement factor" to "docs: RFC for consolidation price improvement factor" on Oct 5, 2025
@jukie (Author) commented Oct 5, 2025

/retest

@k8s-ci-robot (Contributor)

@jukie: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

The price improvement factor will be configurable at **two levels** with the following precedence:

1. **NodePool-level** (highest priority) - Per-workload control
2. **Operator-level** (fallback) - Cluster-wide default

Contributor:

We generally avoid global configuration of Karpenter and recommend things like Kyverno to control this type of behavior.

Author:

Okay, no problem. By this do you mean drop the operator-level config and keep it NodePool-only?

Contributor:

yep.

Comment on lines +52 to +55
// 1.0 = Consolidate for any cost savings (legacy behavior)
// 0.8 = Require 20% cost savings
// 0.5 = Require 50% cost savings (very conservative)
// 0.0 = Disable price-based consolidation

Contributor:

Suggestion: reorient this around a percentage, expressed as an integer in [0, 100]. The Kubernetes ecosystem avoids floats in APIs due to precision and serialization challenges.
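
As a rough illustration of that suggestion, a NodePool field could look something like the sketch below. The field name, its placement, and the exact semantics are assumptions for illustration only, not the actual Karpenter API.

```go
package v1

// Hypothetical sketch only: the field name and semantics below are assumptions, not Karpenter's real API.
type Disruption struct {
	// ConsolidationSavingsThresholdPercent is the minimum price improvement, expressed as an
	// integer percentage in [0, 100], required before consolidation replaces capacity.
	// 0 preserves the current "consolidate on any savings" behavior; 20 requires a 20% saving.
	// +kubebuilder:validation:Minimum=0
	// +kubebuilder:validation:Maximum=100
	// +optional
	ConsolidationSavingsThresholdPercent *int32 `json:"consolidationSavingsThresholdPercent,omitempty"`
}
```

An integer percent also maps cleanly onto the factor values quoted above: a threshold of 20 corresponds to the RFC's 0.8 factor, and 0 to the legacy 1.0 behavior.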


2. **Calculate threshold**: `maxAllowedPrice = currentPrice × priceImprovementFactor`

3. **Filter instances**: Only consider instances where `launchPrice < maxAllowedPrice`
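
To make the quoted threshold-and-filter steps concrete, here is a minimal sketch that keeps the RFC's `priceImprovementFactor` naming; the `instanceOption` type, the function, and the prices are illustrative stand-ins, not Karpenter's actual scheduling code.

```go
package main

import "fmt"

// instanceOption is a hypothetical stand-in for a launchable instance type and its hourly price.
type instanceOption struct {
	name  string
	price float64
}

// filterByPriceImprovement keeps only candidates that satisfy
// launchPrice < currentPrice * priceImprovementFactor.
func filterByPriceImprovement(currentPrice, priceImprovementFactor float64, candidates []instanceOption) []instanceOption {
	maxAllowedPrice := currentPrice * priceImprovementFactor
	var out []instanceOption
	for _, c := range candidates {
		if c.price < maxAllowedPrice {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	candidates := []instanceOption{
		{"m5.large", 0.096},
		{"t3.large", 0.0832},
		{"t3.medium", 0.0416},
	}
	// With a factor of 0.8, a node currently costing $0.10/hr only considers replacements
	// under $0.08/hr, so only t3.medium survives the filter here.
	fmt.Println(filterByPriceImprovement(0.10, 0.8, candidates))
}
```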

Contributor:

Make sure you're factoring multi-node consolidation into this. There are a few variants:

1 -> 0 (remove)
1 -> 1 (replace)
n -> 1 (multi node)

We don't do this today:

1 -> 2 (split)

But if we ever did, we'd need to make sure we were handling price improvement appropriately. I can't think of any reason that this would be particularly difficult, though.

Contributor:

One complication: we need to define how multi-node consolidation works when the nodes come from different NodePools with different price thresholds configured.

2 -> 1 Multi-Node
Node A = from nodepool with 10% threshold
Node B = from nodepool with 5% threshold

Replacement Node is 4% cheaper than A+B, should clearly not replace
Replacement Node is 6% cheaper than A+B, ??
Replacement Node is 10% cheaper than A+B, should clearly replace

I would guess it should take the most conservative value and not replace for the 6% cheaper case.

Author:

Yep, I'd agree with taking the most conservative value. Said another way: each node's improvement threshold must be met in order for the replacement to proceed.
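
A minimal sketch of that "most conservative threshold wins" rule for the n -> 1 case, assuming per-node integer-percent thresholds; the types, function, and numbers are hypothetical, not Karpenter's implementation:

```go
package main

import "fmt"

// candidateNode is a hypothetical stand-in for a node being considered in a multi-node consolidation.
type candidateNode struct {
	price                   float64 // current hourly price of the node
	savingsThresholdPercent int     // required savings from its NodePool, e.g. 10 for 10%
}

// replacementAllowed applies the strictest (largest) savings threshold among the nodes being
// consolidated: the replacement must beat that threshold against their combined price.
func replacementAllowed(nodes []candidateNode, replacementPrice float64) bool {
	var totalPrice float64
	strictest := 0
	for _, n := range nodes {
		totalPrice += n.price
		if n.savingsThresholdPercent > strictest {
			strictest = n.savingsThresholdPercent
		}
	}
	maxAllowedPrice := totalPrice * (1 - float64(strictest)/100)
	return replacementPrice < maxAllowedPrice
}

func main() {
	// Node A from a NodePool with a 10% threshold, Node B from one with a 5% threshold.
	nodes := []candidateNode{
		{price: 0.10, savingsThresholdPercent: 10},
		{price: 0.10, savingsThresholdPercent: 5},
	}
	fmt.Println(replacementAllowed(nodes, 0.188)) // 6% cheaper than A+B: false, rejected
	fmt.Println(replacementAllowed(nodes, 0.176)) // 12% cheaper than A+B: true, allowed
}
```

Under this rule the 6%-cheaper replacement in the example above is rejected, and anything more than 10% cheaper goes through, which matches taking the most conservative value.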


## Backward Compatibility

- **No breaking changes**: Default value of `1.0` maintains existing behavior

Contributor:

I am curious if we should explore a better default (e.g. .05%). I'd love to see some analysis on consolidations across a cluster over time, with a histogram of price improvement for each change.

There is a reasonable argument that Karpenter's consolidation algorithm is a heuristic and that we could launch with a low value for this setting without it being a breaking change. We make implementation changes to consolidation all the time and don't guarantee any deterministic decision making around cost/disruption tradeoffs.

@jukie (Author) Oct 6, 2025

I would love to apply a different default, but I initially assumed the maintainers would prefer matching existing behavior in the initial release. Starting with .05% seems like a safe initial value, and it could be tuned higher after user feedback.

@ellistarn (Contributor) commented Oct 8, 2025

@jukie, I like the thinking here, but I want to make sure we have exhausted "zero config" options. One question I always ask myself when adding new API surface is "how would a customer decide how much to configure this".

  1. If the answer is "vibes", then we probably shouldn't expose it and should just take a heuristic opinion.
  2. If the answer is "do XYZ calculation", we should explore automating that calculation.
  3. If the answer is "there is a fundamental tradeoff that only the customer can know, and this is how they know", then we add the config.

@jukie (Author) commented Oct 8, 2025

Thanks @ellistarn. For workloads that don't have any negative experience associated with node churn and the resulting pod interruptions (long startup times, cache warmups, etc.), users might prefer the current state of consolidating on any form of savings. But for workloads that do, it's a tradeoff between cost savings and interruptions, so I think it's justified to expose this as a configuration option.

@ellistarn (Contributor)

> It's a tradeoff between cost savings and interruptions, so I think it's justified to expose this as a configuration option.

How will you decide to set this for your organization?

@jukie (Author) commented Oct 8, 2025

An org would have the option of running multiple NodePools, each with a different value based on the workloads that target them. However, that likely means running more nodes than actually necessary, so the next consideration might be to run a single NodePool (or some smaller number of NodePools) instead, still capture the savings, and adjust the improvement factor accordingly.

The decision would be a balance between cost savings and causing interruptions to workloads. Things like pod-deletion-cost or do-not-disrupt annotations could help here too.

@ellistarn (Contributor)

So you would expect organizations to run multiple load tests with different values and measure their overall costs in each version?

@jukie (Author) commented Oct 9, 2025

If an org wants it perfectly tailored to their needs, that might be a reasonable approach. However, I suspect users seeking to tweak this are coming from the other side of the spectrum, where the stronger desire is to reduce node churn and workload interruption. Node cost is a factor and should still be minimized, which Karpenter solves for, but node churn and workload interruption incur another form of cost.

An org could directly measure all 3 parameters and use whatever works best for them.

I don't think there's a single value that's guaranteed to work for all use cases (even within the same org), so I think it should be exposed, similar to how ClusterAutoscaler handles it.

@ellistarn (Contributor)

I'm open-minded. I'm going to see if we can gather some data on this and whether it can help inform our decisions. Will get back soon.
