-
Notifications
You must be signed in to change notification settings - Fork 4.2k
AEP-8459: MemoryPerCPU Enforce #8459
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Hi @Jrmy2402. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: Jrmy2402 The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
d7748f1
to
4ca0ab5
Compare
4ca0ab5
to
61e4f32
Compare
/ok-to-test |
This PR may require API review. If so, when the changes are ready, complete the pre-review checklist and request an API review. Status of requested reviews is tracked in the API Review project. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this! some notes from me
controlledResources: ["cpu", "memory"] | ||
controlledValues: RequestsAndLimits | ||
memoryPerCPU: "4Gi" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So in this case, VPA should skip the memory suggestion, right? We can also force this (for example, make sure memoryPerCPU
and controlledResources.memory
are mutually exclusive) . think just skipping is better. Just an idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if the memory suggestion was greater than 4Gi per CPU? Surely it needs to be increased then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so. You specify that for every cpu you will get 4Gi. the all idea is to have a "hard-coded" cpu-memory ratio. As I wrote in the comment below those use cases should be clear to the users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The intent is that the ratio is strictly enforced in both directions.
Concretely: if the memory recommendation is higher than cpu * memoryPerCPU, then CPU will be increased accordingly. Likewise, if CPU is higher than memory / memoryPerCPU, then memory is increased.
I’ll update the AEP to make this explicit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation, could you also update the use cases as Omer had suggested, since it's not clear to me why someone may want this feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback! I just pushed a commit updating the Motivation section to clarify the use cases, as suggested.
In our specific case, we want to use VPA to automatically scale our customers' services vertically, but since we charge them based on CPU with a guaranteed CPU-to-memory ratio, we need VPA to respect this fixed ratio in its recommendations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you want to express ratio as a resource.Quantity
and not just a plain integer or float, if you're considering partials?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A different question, taking a step back, is there a possibility where users will be interested in providing a ration for different pair of resources? The answer will allow us to better match the name for this variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My initial reasoning was consistency with other VPA fields that already use resource.Quantity
. But you’re right that semantically this field represents a ratio, not a resource amount. I’m fine switching to a float or plain integer if that’s the preferred approach.
I’m not 100% sure I understood the second question. What do you mean by "users will be interested in providing a ratio for different pair of resources"?
* If both CPU and memory are controlled, VPA enforces the ratio. | ||
* Applies to Target, LowerBound, UpperBound, and UncappedTarget. | ||
* If ratio cannot be applied (e.g., missing CPU), fallback to standard recommendations. | ||
* With feature gate OFF: recommendations are unaffected. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we please improve this? it's very unclear to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback! I just pushed an update to clarify the wording and make it more explicit.
**When enabled**: | ||
* VPA honors `memoryPerCPU` in recommendations. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And ignore memory recommendation ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the memory suggestion is greater than 4Gi per CPU, the CPU will be increased to respect the memoryPerCPU ratio
|
||
### Test Plan | ||
|
||
* Unit tests ensuring ratio enforcement logic. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also need e2e tests. unit tests for AEP is not enough
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just pushed a commit updating the Test Plan section to explicitly mention adding e2e tests in addition to unit tests.
Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd require tests ensuring that when the feature gate is on or off the values and validation is applied accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, thanks! I’ve updated the Test Plan so that unit tests explicitly cover both enforcement logic and the feature gate being on/off.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM with minor comments
### Behavior | ||
|
||
* If both CPU and memory are controlled, VPA enforces the ratio. | ||
* Applies to Target, LowerBound, UpperBound, and UncappedTarget. | ||
* Ratio enforcement is strict: | ||
* If the memory recommendation would exceed `cpu * memoryPerCPU`, then **CPU is increased** to satisfy the ratio. | ||
* If the CPU recommendation would exceed `memory / memoryPerCPU`, then **memory is increased** to satisfy the ratio. | ||
* If ratio cannot be applied (e.g., missing CPU), fallback to standard recommendations. | ||
* With the `MemoryPerCPURatio` feature gate disabled, the `memoryPerCPU` field is ignored and recommendations fall back to standard VPA behavior. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add some examples here to help users better understand how the algorithm behaves?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, I’ve added examples to clarify the behavior in commit 5fed004
* Components depending on the feature gate: | ||
* recommender |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@adrianmoisey , should we also add this to the admission controller? ( we talked about this stuff on Slack ).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be consistent with the other feature gates, my vote is a yes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Jrmy2402 , please update this too. you can see an example of it here: https://github.com/kubernetes/autoscaler/pull/8012/files#diff-ad66c76a76541b7991631925641b989fc402901440d8e0dcbc3591009eef52b9R226
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added the admission controller to the list, and I will update my MR #8420 this week after the AEP is merged
/retitle AEP-8459: MemoryPerCPU Enforce |
memory_bytes = cpu_cores * memoryPerCPU | ||
``` | ||
|
||
## Design Details |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yes, thanks.
I’ve updated the AEP to mention that memoryPerCPU-based recommendations are also capped by --container-recommendation-max-allowed-cpu and --container-recommendation-max-allowed-memory, similar to the CPU Startup Boost case.
See commit: 5fed004
c5bbed0
to
f4b20b6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good we just need to address this https://github.com/kubernetes/autoscaler/pull/8459/files#r2295957092
* Example 4: Feature gate disabled | ||
* Baseline recommendation: 1 CPU, 6Gi memory | ||
* UncappedTarget: 1 CPU, 6Gi (ratio not applied) | ||
* Target: 1 CPU, 6Gi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is unneeded ( it's pretty clear that when the feature flag is off VPA works as usual ).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed this example, thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The example is still there, do you still need to push a change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, I forgot to push my last commit...I’ve pushed it now.
/label api-review |
controlledResources: ["cpu", "memory"] | ||
controlledValues: RequestsAndLimits | ||
memoryPerCPU: "4Gi" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you want to express ratio as a resource.Quantity
and not just a plain integer or float, if you're considering partials?
|
||
### Behavior | ||
|
||
* If both CPU and memory are controlled, VPA enforces the ratio. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if both (cpu and memory) are not specified? Should that be a validation error? It seems, like we should enforce that if you specify both you should get an error, this way we'll ensure that either you specify all the pieces of the puzzle, or none.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initially, my thinking was to simply ignore memoryPerCPU if either CPU or memory was not specified in controlledResources.
But if the philosophy is rather to fail fast and return a validation error whenever memoryPerCPU is set without both CPU and memory being present, I’m fine with that approach too, I can update the AEP accordingly.
* Applies to Target, LowerBound, UpperBound, and UncappedTarget. | ||
* Ratio enforcement is strict: | ||
* If the memory recommendation would exceed `cpu * memoryPerCPU`, then **CPU is increased** to satisfy the ratio. | ||
* If the CPU recommendation would exceed `memory / memoryPerCPU`, then **memory is increased** to satisfy the ratio. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm inclined to say we should error out if the math doesn't stand with the cpu and memory values, adjusting seems "magical", and I'd advice against it. Explicitness is always better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see your point, implicit adjustments can indeed feel “magical.”
In this case, though, the whole purpose of the feature is to enforce the ratio automatically: if CPU or memory drifts away from the configured ratio, VPA brings them back in line.
If we only validated and errored, users wouldn’t get the behavior they’re asking for (“always keep memory = cpu × memoryPerCPU”), they’d just see failures.
That would make the feature much less useful in practice.
Or maybe I didn’t fully understand your point?
|
||
### Test Plan | ||
|
||
* Unit tests ensuring ratio enforcement logic. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd require tests ensuring that when the feature gate is on or off the values and validation is applied accordingly.
controlledResources: ["cpu", "memory"] | ||
controlledValues: RequestsAndLimits | ||
memoryPerCPU: "4Gi" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A different question, taking a step back, is there a possibility where users will be interested in providing a ration for different pair of resources? The answer will allow us to better match the name for this variable.
What type of PR is this?
/kind documentation
/kind feature
/kind api-change
What this PR does / why we need it:
AEP for #8420