docs: RFC for consolidation price improvement factor #2562
Conversation
Signed-off-by: jukie <[email protected]>
The price improvement factor will be configurable at **two levels** with the following precedence:

1. **NodePool-level** (highest priority) - Per-workload control
2. **Operator-level** (fallback) - Cluster-wide default
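For illustration, a minimal Go sketch of how that precedence might be resolved (the function and parameter names are assumptions for this sketch, not the RFC's API):

```go
// resolvePriceImprovementFactor is a hypothetical helper showing the
// precedence described above: a NodePool-level value, when set, overrides
// the operator-level (cluster-wide) default.
func resolvePriceImprovementFactor(nodePoolFactor *float64, operatorDefault float64) float64 {
	if nodePoolFactor != nil {
		return *nodePoolFactor
	}
	return operatorDefault
}
```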
We generally avoid global configuration of Karpenter and recommend things like Kyverno to control this type of behavior.
Okay, no problem. By this do you mean drop the operator-level config and keep it NodePool-only?
yep.
// 1.0 = Consolidate for any cost savings (legacy behavior)
// 0.8 = Require 20% cost savings
// 0.5 = Require 50% cost savings (very conservative)
// 0.0 = Disable price-based consolidation
Suggestion: reorient this around a percentage, expressed as an integer in [0, 100]. The Kubernetes ecosystem avoids floats in APIs due to precision and serialization challenges.
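To make the suggestion concrete, here is a sketch of what an integer-percent field could look like; the field name, placement, and validation markers are assumptions for illustration, not the RFC's actual API:

```go
// Sketch of a NodePool disruption field expressed as an integer percentage
// instead of a float factor. All names here are hypothetical.
type Disruption struct {
	// ConsolidationSavingsThresholdPercent is the minimum price improvement,
	// as a whole percentage of the current node's price, required before a
	// consolidation replacement is considered. 0 keeps today's behavior
	// (consolidate for any savings).
	// +kubebuilder:validation:Minimum=0
	// +kubebuilder:validation:Maximum=100
	// +kubebuilder:default=0
	// +optional
	ConsolidationSavingsThresholdPercent *int32 `json:"consolidationSavingsThresholdPercent,omitempty"`
}
```

Under this encoding, the float values quoted above would roughly map to 0, 20, and 50, and the "disable price-based consolidation" case would need its own representation (e.g. 100, or a separate switch).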
2. **Calculate threshold**: `maxAllowedPrice = currentPrice × priceImprovementFactor`
3. **Filter instances**: Only consider instances where `launchPrice < maxAllowedPrice`
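As a rough illustration of steps 2-3 (using the RFC's float factor; the types and function names are invented for this sketch):

```go
// InstanceOption is a stand-in for a launchable instance-type offering.
type InstanceOption struct {
	Name        string
	LaunchPrice float64
}

// filterCheaperInstances keeps only the options whose launch price beats the
// threshold derived from the current price and the improvement factor.
func filterCheaperInstances(currentPrice, priceImprovementFactor float64, options []InstanceOption) []InstanceOption {
	maxAllowedPrice := currentPrice * priceImprovementFactor
	var eligible []InstanceOption
	for _, o := range options {
		if o.LaunchPrice < maxAllowedPrice {
			eligible = append(eligible, o)
		}
	}
	return eligible
}
```

Note that with a factor of `1.0` the threshold equals the current price, so the filter reduces to "any strictly cheaper instance", which matches the existing behavior.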
Make sure you're factoring multi-node consolidation into this. There are a few variants:

- 1 -> 0 (remove)
- 1 -> 1 (replace)
- n -> 1 (multi-node)

We don't currently do this today:

- 1 -> 2 (split)

But if we ever did, we'd need to make sure we were handling price improvement appropriately. I can't think of any reason that this would be particularly difficult, though.
One complication: we need to define how multi-node consolidation works when the nodes come from different NodePools with different price thresholds configured.

2 -> 1 multi-node example:

- Node A = from a NodePool with a 10% threshold
- Node B = from a NodePool with a 5% threshold
- Replacement node is 4% cheaper than A+B: should clearly not replace
- Replacement node is 6% cheaper than A+B: ??
- Replacement node is 10% cheaper than A+B: should clearly replace

I would guess it should take the most conservative value and not replace in the 6% cheaper case.
Yep, I would agree with the approach of taking the most conservative value. Said another way, each node's improvement threshold must be met in order for the replacement to proceed.
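A minimal sketch of that rule, under the assumption that each consolidated node carries its own configured savings threshold (types and names are invented here):

```go
// ConsolidatedNode is a stand-in for a node being considered for removal.
type ConsolidatedNode struct {
	CurrentPrice            float64
	SavingsThresholdPercent int32 // e.g. 10 means "require at least 10% savings"
}

// multiNodeConsolidationAllowed returns true only if the replacement's price
// satisfies every node's configured threshold against the combined current
// price — the "most conservative wins" interpretation from this thread.
func multiNodeConsolidationAllowed(nodes []ConsolidatedNode, replacementPrice float64) bool {
	var totalCurrentPrice float64
	for _, n := range nodes {
		totalCurrentPrice += n.CurrentPrice
	}
	for _, n := range nodes {
		maxAllowedPrice := totalCurrentPrice * (1.0 - float64(n.SavingsThresholdPercent)/100.0)
		if !(replacementPrice < maxAllowedPrice) {
			return false
		}
	}
	return true
}
```

With the 10%/5% example above, a replacement that is only 6% cheaper fails Node A's 10% requirement and is rejected, while one that is more than 10% cheaper satisfies both.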
## Backward Compatibility

- **No breaking changes**: Default value of `1.0` maintains existing behavior
I am curious if we should explore a better default (e.g. .05%). I'd love to see some analysis on consolidations across a cluster over time, with a histogram of price improvement for each change.
There is a reasonable argument that Karpenter's consolidation algorithm is a heuristic and that we could launch with a low value for this setting without it being a breaking change. We make implementation changes to consolidation all the time and don't guarantee any deterministic decision making around cost/disruption tradeoffs.
I would love to apply a different default, but I initially assumed the maintainers would prefer matching existing behavior upon initial release. Starting with .05% seems like it'd be a safe initial value, and after user feedback it could be tuned higher.
@jukie, I like the thinking here, but I want to make sure we have exhausted "zero config" options. One question I always ask myself when adding new API surface is "how would a customer decide how much to configure this?"
Thanks @ellistarn. For workloads that don't have any negative experience associated with node churn and the resulting pod interruptions (long startup times, cache warmups, etc.), users might prefer the current behavior of consolidating on any form of savings. But for workloads that do, it's a tradeoff between cost savings and interruptions, so I think it's justified to expose this as a configuration option.

How will you decide to set this for your organization?

An org would have the option of running multiple NodePools, each with different values based on the workloads that target them. However, that has the end result and tradeoff of probably running more nodes than actually necessary, so the next consideration might be to run a single NodePool (or some smaller number of NodePools) instead to still get the savings, and to adjust the improvement factor accordingly. The decision would be a balance between cost savings and causing interruptions to workloads. Things like pod-deletion-cost or do-not-disrupt annotations could help here too.

So you would expect organizations to run multiple load tests with different values and measure their overall costs in each version?

If an org wants it perfectly tailored to their needs, that might be a reasonable approach. However, I suspect users who are seeking to tweak this are coming from the other side of the spectrum, where the stronger desire is to reduce node churn and workload interruption. Node cost is a factor and should still be minimized, which Karpenter solves for, but node churn and workload interruption incur another form of cost. An org could directly measure all three parameters and use whatever works best for them. I don't think there's a single value that's guaranteed to work for all use cases (even within the same org), so, similar to how ClusterAutoscaler handles it, I think it should be exposed.

I'm open-minded. I'm going to see if we can gather some data on this and see if it can help inform our decisions. Will get back soon.
Description
This is a proposal to introduce a cost savings threshold when making consolidation decisions in order to reduce churn.
Related issue: aws/karpenter-provider-aws#7146
How was this change tested?
RFC only, but a draft PR has been opened at #2561.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.