
docs: Node Overlay RFC #2166


Open · wants to merge 1 commit into base: main

Conversation

Contributor
@engedaam engedaam commented Apr 25, 2025


Description
This RFC is an updated version of the Node Overlay RFC originally opened six months ago. It goes over what is planned for the feature, for example:

kind: NodeOverlay
metadata:
  name: default
spec:
  priceDiscount: "90%"
  capacity:
    smarter-devices/fuse: 1
---
kind: NodeOverlay
metadata:
  name: memory
spec:
  requirements:
  - key: karpenter.k8s.aws/instance-memory
    operator: Gt
    values: ["2048"]
  priceAdjustment: "0.109"
  capacity:
    smarter-devices/fuse: 23

How was this change tested?

  • N/A

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Apr 25, 2025
@k8s-ci-robot
Contributor
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: engedaam
Once this PR has been reviewed and has the lgtm label, please assign mwielgus for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Apr 25, 2025
@engedaam engedaam mentioned this pull request Apr 25, 2025
@engedaam engedaam changed the title RFC: Node Overlay docs: Node Overlay RFC Apr 25, 2025
@coveralls commented Apr 25, 2025

Pull Request Test Coverage Report for Build 15567714722

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 3 unchanged lines in 1 file lost coverage.
  • Overall coverage decreased (-0.02%) to 82.051%

Files with Coverage Reduction New Missed Lines %
pkg/controllers/provisioning/scheduling/nodeclaim.go 3 89.66%
Totals Coverage Status
Change from base Build 15452759565: -0.02%
Covered Lines: 10263
Relevant Lines: 12508

💛 - Coveralls

@engedaam engedaam force-pushed the node-overlay-rfc branch 3 times, most recently from ce506d2 to c155548 Compare April 25, 2025 22:22
  - key: karpenter.sh/capacity-type
    operator: In
    values: ["on-demand"]
  priceAdjustment: 0.101
Contributor

I'd avoid floats in APIs. They're notorious for math issues: https://stackoverflow.com/questions/36043597/why-is-1-001-0-001-not-equal-to-1-002

For this reason, you'd be hard pressed to see them in any k8s API.

kubernetes-sigs/controller-tools#245

@GnatorX GnatorX May 5, 2025

Agreed, ideally you would keep this in the world of integers. Maybe use something like basis points: https://en.wikipedia.org/wiki/Basis_point
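For illustration, a minimal sketch of what a basis-point field might look like; the priceAdjustmentBasisPoints name is hypothetical and not part of the proposal:

kind: NodeOverlay
metadata:
  name: spot-discount
spec:
  requirements:
  - key: karpenter.sh/capacity-type
    operator: In
    values: ["spot"]
  # -1000 basis points would mean a 10% price reduction; integers avoid float precision issues
  priceAdjustmentBasisPoints: -1000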

Contributor Author

I'm using an integer value to denote cents, and it will be used as a signed value to reflect the price adjustment.

Contributor Author

Instead of using an integer value, it would make sense to represent the value as a string, since the field will accept both a percentage and a decimal value, and to do the conversion within the controller.
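A hedged sketch of the string-typed field described here, accepting either a percentage or a decimal value (the exact syntax is still being discussed in this thread):

kind: NodeOverlay
metadata:
  name: on-demand-adjustment
spec:
  requirements:
  - key: karpenter.sh/capacity-type
    operator: In
    values: ["on-demand"]
  # quoted strings sidestep float serialization; the controller would parse either form
  priceAdjustment: "-10%"     # percentage form
  # priceAdjustment: "0.101"  # decimal form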

@@ -0,0 +1,305 @@
# Karpenter - Node Overlay
Contributor
@ellistarn ellistarn Apr 27, 2025

Excited to see this rev!

We skip over any discussion of why this can't be included in the Node Pool API. I'd be curious to explore those tradeoffs. I wrote the original RFC, but I've forgotten much of the why behind some of these implicit decisions.

Off the top of my head, I think there's a classic simplicity / flexibility tradeoff.

Including in the NodePool API could lead customers to create more NodePools to handle different override values for different shapes of nodes. There's nothing fundamentally wrong with this from an API perspective, though I know our scheduler isn't particularly good at simulating multiple node pools at the same time (with the same weight). This may be a decent impetus to drive that work.

Adding a new API adds nontrivial complexity to determining the factors that will go into a scheduling decision. We're going from 2 -> N (overlapping overlays) different CRs that will be factored in.

I'm not suggesting you're going the wrong direction with the current proposal, but I do think it's a foundational question that needs some analysis in the RFC.

Contributor Author

Thank you for the thoughtful feedback! I've added a new "Alternative Proposal" section that explores the tradeoffs of integrating this functionality into the NodePool API.

The key consideration that led us away from the NodePool integration was the balance between API simplicity and operational flexibility. While incorporating these features into NodePool might seem simpler initially (one CRD vs. two), it would likely lead to NodePool proliferation as users create separate pools for different override scenarios. This not only increases operational complexity but also puts additional strain on our scheduler, which currently has limitations handling multiple NodePools with similar weights.

I agree that our current proposal of introducing a new CRD does add complexity to the scheduling decision process. However, we believe this complexity is justified by the benefits:

  1. Better separation of concerns between selecting instance types (NodePool) and modifying instance type properties (NodeOverlay)
  2. More flexible overlay stacking capabilities
  3. Easier API evolution path
  4. Simpler implementation of cross-cutting modifications

I've documented these tradeoffs in detail in the new section, including a sample of how the NodePool API extension might have looked.

Contributor
@ellistarn ellistarn Jun 4, 2025

Thanks for exploring this @engedaam! A few more thoughts on this.

I'm reminded of this exact discussion when we defined Disruption Budgets. We faced a similar opportunity to define NodeDisruptionBudgets as a separate CRD, which was really attractive given the flexibility it afforded (e.g. limit 1 disruption globally) and given the analogy to PDB. There were similar analogs for how to deal with overlapping NDBs, etc. Ultimately, we decided that the reliability and simplicity benefits outweighed the flexibility.

For 1 (Separation of concerns), I'd suggest that this is not a value prop in and of itself. Separation of concerns is a tool that you use to achieve an outcome -- I'd argue that 2-4 are outcomes of 1.

For 2 (Flexibility), I'd argue that you can achieve the exact same flexibility through multiple overlays within NodePool, e.g. NodePool.spec.overlays (properties? overrides?). Yes there is a 1MB limit that may impact extreme use cases. Yes the configuration would need to be duplicated as needed across multiple Node Pools, mitigated by client side templating (e.g. helm, cdk8s, kustomize, kro, etc) in the same way it's done for configuration duplication across many other Kubernetes APIs.

For 3 (Compatibility), API maturity is a huge value prop, but again I'd suggest that this can be achieved in NodePool. Kubernetes regularly supports alpha/beta fields in core APIs like Pod. I don't think Karpenter has done one of these yet, but in and of itself, I think it would be a win for the project to iron out what that looks like.

For 4 (Simplicity), I think we're at the crux of what makes me uneasy with the separate CRD approach. There is a significant reliability tradeoff to the simplicity that's suggested here.


NodeOverlays introduce cluster-wide blast radius. A user with permission to modify any NodeOverlay gains access to change scheduling calculations for every NodePool. For customers that treat NodePools as a tenancy boundary, this is a big risk vector. Before changing a NodeOverlay, you have to understand all NodePools and all NodeOverlays.

Many customers use NodePools to gradually roll out changes across a cluster. They trial a setting on a specific NodePool, and if things are going as expected, it's rolled out to more NodePools. This is especially important to get right if there's any dependency between the NodeOverlay and the NodePool (e.g. AMI settings).

It's straightforward to imagine cases where cluster-wide NodeOverlays break in surprising ways. Imagine two teams running on two NodePools using different AMI configurations. One team adds huge-pages support for all instance types of a specific family, but the other team's AMIs aren't configured to support it. Imagine an overlay that changes the memory overhead calculations while all teams are using AL2023. If someone wants to try Bottlerocket (different memory overhead), they'd have to go back and carve out support for the new NodePool in the central NodeOverlay. As the number of NodePools and NodeOverlays grows, this problem gets multiplicatively harder.

I suspect that most customers would need to implement (or enforce w/ a policy engine) a best practice of "NodeOverlay requirements should be scoped to a specific NodePool". At this point, wouldn't we be better off colocating this functionality in a single API?
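For illustration, a minimal sketch of the "scoped to a specific NodePool" best practice described above, assuming NodeOverlay requirements can match on the karpenter.sh/nodepool label (implied in this thread, but something the RFC would need to confirm):

kind: NodeOverlay
metadata:
  name: team-a-fuse
spec:
  requirements:
  - key: karpenter.sh/nodepool
    operator: In
    values: ["team-a"]        # keeps the blast radius to a single NodePool
  capacity:
    smarter-devices/fuse: 1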


Ultimately, I'm torn. For simple use cases, I like how minimal and expressive the separate CRD approach is. As configurations get more complicated, I worry that customers will rapidly lose the ability to reason about how layered/weighted overlays might apply to many NodePools in a cluster.

I'd love to hear perspectives from the customers and use cases that this RFC is targeting.

Contributor

e.g.

kind: NodePool
spec:
  template: 
    requirements:
    - key: node.kubernetes.io/instance-type
      operator: In
      values: ["m5.large", "m5.2xlarge", "m5.4xlarge", "m5.8xlarge", "m5.12xlarge"]
    overlays: 
    - capacity: 
        smarter-devices/fuse: 1
    - capacity:
        smarter-devices/fuse: 2
      weight: 1
      requirements:
      - key: node.kubernetes.io/instance-type
        operator: In
        values: ["m5.12xlarge"]

Contributor Author

Thank you for your thoughtful analysis and perspective. While I appreciate the concerns raised about potential complexity and reliability issues with a separate NodeOverlay CRD, I respectfully maintain that keeping this functionality separate from NodePool is the better approach. The flexibility and granular control offered by distinct NodeOverlays outweigh the potential drawbacks, in my view.

It's crucial to recognize that the complexity of overlays extends beyond simple scenarios. There are use cases that require modifying all instance types within a given NodePool, and maintaining this functionality within the NodePool itself would actually increase complexity for users, IMHO. The separate NodeOverlay approach provides a more elegant solution for these broader modifications.

Regarding blast radius and security concerns, these can be mitigated through careful RBAC policies and best practices documentation. The ability to apply overlays across multiple NodePools without duplicating configuration is a significant advantage, especially for large-scale or multi-team environments. Moreover, keeping NodeOverlays separate allows for more rapid iteration and feature development without impacting the core NodePool API.

It's important to consider organizations with centralized planning teams who manage infrastructure for multiple application teams. These central teams would benefit from the ability to apply consistent policies across various NodePools without the need to modify each one individually, which the separate NodeOverlay approach facilitates. While your points about gradual rollouts and potential conflicts between teams are valid, I believe these challenges can be addressed through proper planning, communication, and tooling rather than constraining the API design. Ultimately, the separate CRD approach provides a more scalable and adaptable solution for complex Kubernetes environments, particularly for organizations with centralized infrastructure management and those requiring broad instance type modifications.

I agree that additional customer feedback would be instrumental for the graduation of this API; however, I think we need teams to actually work with and implement this API in real-world scenarios to provide us with meaningful direction going forward.

Contributor

Regarding blast radius and security concerns, these can be mitigated through careful RBAC policies and best practices documentation.

RBAC is not flexible enough to limit the blast radius problem I described. If a NodeOverlay can impact simulation for multiple NodePools, there's nothing RBAC can do. Even a policy engine won't be able to do anything beyond enforcing that NodeOverlays are scoped to NodePools.

Moreover, keeping NodeOverlays separate allows for more rapid iteration and feature development without impacting the core NodePool API.

As I say above, I don't think this is material. You can have new alpha CRDs or new alpha fields in a v1 CRD with similar compatibility guarantees.


I think we can agree that all functional use cases can be met with both approaches. It's really a question of ergonomics vs. risk. Simply put, "Is it worth it to deduplicate scheduling configurations if it means increasing the risk of changes?" Perhaps we can agree to disagree on this statement.

Luckily, this is an alpha, and Karpenter can always change course after a feedback period.


### Example

**Price Adjustment**: Define a pricing adjustment for all instance type options that fit the match labels field. Users can set a scalar value or a percentage.
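For illustration, a hedged sketch of the two forms described here, using the field names from the PR description example; whether the percentage means a fraction of the price or a discount off it is exactly what the comments below ask to clarify:

kind: NodeOverlay
metadata:
  name: percentage-adjustment
spec:
  priceDiscount: "90%"       # percentage form, as in the PR description example
---
kind: NodeOverlay
metadata:
  name: scalar-adjustment
spec:
  priceAdjustment: "0.109"   # scalar form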
Contributor

Can you give more examples to make price-adjustment clearer?

I'm guessing the current example means 90% of the OD price (i.e., 10% off)?

So a scalar would always be negative and look like -0.1 or something?


👍 you should explicitly state whether it is an increase or decrease in price, either with a sign (+/-) or a separate field for the sign

Contributor Author

Yeah, I added a section on how exactly this would be reflected and removed the notion of floating-point values.


The integration of Node Overlay adjustment values and deeper observability mechanisms will be deferred until clear user requirements emerge. This decision is partly driven by concerns that excessive events or logs produced by some controllers could be highly noisy or negatively impact the API server. By deferring these features, we can focus on delivering core functionality and validation tooling first, while leaving room for future enhancements. These enhancements will be based on real-world usage patterns and specific user needs for additional visibility into Node Overlay operations.

## Launch Scope
@GnatorX GnatorX May 5, 2025

I think you should also add something on the NodeClaim that calls out the overlay used + the hash of the overlay

Contributor Author

the hash of the overlay

What do you mean by the hash of the overlay?

We're currently deferring the implementation of overlay status reporting until we receive more customer feedback to better understand the most valuable approach. While NodeClaim is one potential option, we're also considering exposing overlay information through NodePool status or the NodeOverlay status itself, including details about which instance types had overlays applied. We're particularly mindful of the signal-to-noise ratio, especially if we implement event or log-based tracking. What kind of visibility would be most useful for your use case?
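Purely as a hypothetical sketch of the NodeOverlay-status option mentioned above (none of these status fields are part of the proposal yet):

kind: NodeOverlay
metadata:
  name: memory
status:
  conditions:
  - type: Ready
    status: "True"
  # hypothetical field: which instance types the overlay currently applies to
  appliedInstanceTypes: ["r5.large", "r5.xlarge"]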

@GnatorX commented May 5, 2025

Thanks for tackling this problem with a new revision!

@engedaam engedaam force-pushed the node-overlay-rfc branch 4 times, most recently from fa43c6a to e2e038b Compare June 4, 2025 15:46
### Example

**Price Adjustment:** Define a pricing adjustment for instance types that match the specified labels. Users can adjust prices using either:
- A signed integer representing the price adjustment in cents (e.g., 500 for $5.00)
Member

Have we considered using a string representation for currency as well? That might be more natural than the number of cents (at least when using USD).

Contributor Author

Can you elaborate more on what you mean here? Using a string representation gets us away from having to serialize and deserialize the floating-point values that would be passed, as the ultimate goal is to focus on using integer values.

Contributor

cents isn't granular enough. For example, a t4g.small on Spot is currently $0.0026

Contributor Author
@engedaam engedaam Jun 10, 2025

Hmm, that's a fair point. Instead of using a float, I think we can just use a string and do a conversion within the controller.

Contributor

sounds good!


**Fail Open Approach (recommended):**

* Using alphabetical ordering or validate last applied resource to resolve conflicts when weights are equal
Member

What does "validate last applied resource" mean here? Are you able to elaborate?

Contributor Author

I dropped that part. The idea there was that Karpenter would be able to understand changes between node overlays to determine whether we should accept the current overlay or block the update to the overlay due to a conflict.

**Fail Open Approach (recommended):**

* Using alphabetical ordering or validate last applied resource to resolve conflicts when weights are equal
* Setting overlay Ready status condition to false when conflicts occur
Member

What happens when you have a NodeOverlay which results in a conflict for one instance type but not another? Consider this scenario:

  • NodeOverlay A + B applies to instance type A (no conflict)
  • NodeOverlay C + B applies to instance type B (conflict)

In this example, NodeOverlay B is the lower priority overlay in both cases. I imagine we'd still want to use it for instance type A and not B, but I think the recommendation suggests we wouldn't want to use it for either. I think this approach simplifies a fair bit, but I also think it will be surprising when applying an overlay which results in a conflict for InstanceType B also causes changes for InstanceType A. This isn't something that necessarily needs to block this v1alpha1 revision, but is something we can iterate on and get feedback on.

Contributor Author

In the current proposal, we would apply the overlays based on alphabetical ordering. So in the scenario you have outlined, both overlays A and B would be applied, and NodeOverlay C would not be applied. The recommended approach would give the behavior you have outlined.
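For illustration, a minimal sketch of the equal-weight case under the alphabetical tie-break described here; both overlays target the same instance type and set the same field, so the alphabetically earlier name would win where they overlap:

kind: NodeOverlay
metadata:
  name: a-overlay            # alphabetically first, so it would win the tie-break
spec:
  requirements:
  - key: node.kubernetes.io/instance-type
    operator: In
    values: ["m5.large"]
  capacity:
    smarter-devices/fuse: 1
---
kind: NodeOverlay
metadata:
  name: b-overlay            # conflicting value for the same instance type
spec:
  requirements:
  - key: node.kubernetes.io/instance-type
    operator: In
    values: ["m5.large"]
  capacity:
    smarter-devices/fuse: 2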

* Can be implemented at two levels:

1. Global: Stops all provisioning across the cluster
2. NodePool-specific: Only affects misconfigured NodePools
Member

I am curious how this would work given that NodeOverlays aren't directly linked to NodePools. Would this affect any NodePool which has overlapping instance type selection with the impacted NodeOverlay?

Contributor Author
@engedaam engedaam Jun 10, 2025

This would resolve to all the NodePools that the overlays apply to, either through the requirements or by understanding the instance types that match.

Contributor

How would someone use this in a Reserved Instances (RIs) setup if the Overlay isn't tied to the NodePool?

For RIs, I'd want a weighted NodePool with a limit based on how many RIs I have. The InstanceTypes launched by that NodePool should be cheaper. But other NodePools that launch those instance types should not be as cheap. Consolidation should also understand the differences between prices of the same instance-type/capacity-type combination.

Contributor Author

Yeah, that would be the current expectation as to how users would use reserved instances and Compute Savings Plans. The interesting aspect here is that there is a follow-up we can add to overlays to say: only apply the overlay to a certain number of instances. That way customers can collapse their NodePools and have reserved instances and normally priced instances in one NodePool.
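For illustration, a hedged sketch of the RI pattern described here; the discount value and the karpenter.sh/nodepool-based scoping are assumptions for the example, not part of the proposal:

kind: NodePool
metadata:
  name: reserved
spec:
  weight: 100                # preferred over the regular on-demand NodePool
  limits:
    cpu: "64"                # roughly sized to the purchased RIs
  template:
    spec:
      requirements:
      - key: node.kubernetes.io/instance-type
        operator: In
        values: ["m5.2xlarge"]
---
kind: NodeOverlay
metadata:
  name: reserved-pricing
spec:
  requirements:
  - key: karpenter.sh/nodepool
    operator: In
    values: ["reserved"]
  priceDiscount: "60%"       # reflect the RI rate so scheduling and consolidation see the cheaper price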

1. Global: Stops all provisioning across the cluster
2. NodePool-specific: Only affects misconfigured NodePools

* Similar to Karpenter's existing behavior with .spec.limits and InsufficientCapacityErrors
Member

How is this comparable to ICE errors?

Contributor Author

Similar to hitting capacity limits or ICEd instance types today, this gives precise control over provisioning by allowing Karpenter to halt operations either across entire NodePools or for specific instance types affecting all NodePools. The solution provides flexible management of Karpenter's scheduling behavior, ensuring that resource allocation strictly adheres to intended configurations while preventing unplanned provisioning outside defined parameters.

@engedaam engedaam force-pushed the node-overlay-rfc branch 3 times, most recently from 9ce09f9 to 72592c5 Compare June 10, 2025 16:22
@Beardface123
Hello all, I wanted to provide a short consumer perspective on this proposed feature.

Our organization has been asking for several changes that would benefit from the proposed NodeOverlay feature.

  1. We're exploring nodepools to use ARM, with at least one nodepool requiring the fuse-plugin. (We're currently using a very undesirable workaround for this and are eagerly awaiting a solution for this in Karpenter).
  2. Cost is extremely important so any ability to overlay savings plans on top of particular instances for better node allocations is great.
  3. We're daemonset heavy, so including licensing costs of these daemonsets where applicable would be very helpful.

We're still exploring where new nodepools would be necessary, but if these features were to be implemented directly into the NodePool API I fear it would near exponentially increase the number of node pools we'd need to maintain as Karpenter admins. I would welcome the ease of an overlay, especially one that would target specific instances through a familiar spec. I would accept this increased blast radius and put it on our teams to ensure proper testing is done in case poor changes are performed on an overlay. Just an opinion of one Karpenter admin of course. 👍

In short, it's fantastic that Karpenter can smartly choose the right instance for the demand, but we're starting to see some shortcomings when it comes to more complex logic. An optional API to implement complex logic on a subset of nodes in a given nodepool seems like a good potential path forward.

In transparency, we are also looking for the ability to dynamically adjust EBS throughput and IOPS depending on instance size, but I don't think NodeOverlay would apply here since this is part of the EC2NodeClass.

@ellistarn
Contributor

but if these features were to be implemented directly into the NodePool API I fear it would near exponentially increase the number of node pools we'd need to maintain as Karpenter admins.

Great feedback @Beardface123! Can you expand a little bit on why you'd imagine this would increase the number of NodePools? I initially thought this, but realized there's an alternative path where .spec.overlays (plural) would let you do complex overrides within a single nodepool. The downside being that if you wanted this config everywhere, you'd need to duplicate it across nodepools.

Your comment here seems to suggest that you would scope these overrides to a nodepool anyways:

An optional API to implement complex logic on a subset of nodes in a given nodepool


It's a bit nuanced, so I'm very curious about your thinking on this.

@Beardface123
Can you expand a little bit on why you'd imagine this would increase the number of NodePools?

What I wanted to avoid is an implementation that would require NodePools to proliferate exponentially.

Given an example of a general node pool:

  • requires amd and arm architectures, and the arm nodes need a fuse-plugin custom resource
  • a wide range of sizes from 2xlarge to 16xlarge. The idea of adding estimated daemonset overhead to these nodes is appealing, but sometimes licensing varies by total CPU for the node. The flat cost adjustment may vary for every instance size.

Given the scenario, we might need to take a single NodePool, split it in two to support the fuse-plugin, and then multiply by 5 for each instance size available. This is what I had in mind in my first post. I think I would be accepting of several implementations as long as they keep the number of NodePools we manage under control. Also, upon re-reading the thread from a month ago, this might not be what was intended at all. 🙃
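As a hedged sketch of how overlays might cover that scenario without splitting the NodePool (the label keys, instance types, and adjustment values here are illustrative assumptions):

kind: NodeOverlay
metadata:
  name: arm-fuse
spec:
  requirements:
  - key: kubernetes.io/arch
    operator: In
    values: ["arm64"]
  capacity:
    smarter-devices/fuse: 1
---
kind: NodeOverlay
metadata:
  name: licensing-16xlarge
spec:
  requirements:
  - key: node.kubernetes.io/instance-type
    operator: In
    values: ["m6g.16xlarge"]
  priceAdjustment: "+1200"   # per-size licensing overhead, in whatever unit the final API settles on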

Your comment here seems to suggest that you would scope these overrides to a nodepool anyways

I didn't express myself correctly, sorry about that. To correct myself I should've said "An optional API to implement complex logic on a subset of nodes across specified NodePools"

If I were to try to represent my opinions concisely in a few points now that I've re-read the threads and given it another round of thought:

  1. I think the NodeOverlay API object is a good idea, because it will create reusable overlays that could be applied to multiple NodePools.
  2. In my opinion, NodeOverlay API objects should either have an opt-out concept in the NodePool or an opt-in, which would somewhat blend its concept with your proposed solution of .spec.overlay within the NodePool. I prefer the latter.
  3. It's up to people smarter than me, but if a NodeOverlay could be referenced in a NodePool, it could be an ordered list. This might invalidate the need for a NodeOverlay weight since different nodepools might have different priorities on overlays.

Ultimately, the overlay concept in general is very appealing in either implementation, but I think the preference is to use a separate API object to define overlays and have that propagate to all nodepools that reference it. The alternative is a more cumbersome experience if there are many NodePools with diverse overlays. If manually set, I'd worry about user error. If templating overlays with Helm or other tools is done, well that's time my team needs to figure it out.

Again, sorry for the misunderstanding up front.

@gillg commented Jul 3, 2025

Out of curiosity, where do the Karpenter logic and the NodeOverlay fit in this diagram?
Will it work properly with a special scheduler extension + device plugin?

                                                                        +----------------------------+
                                                                        | POD Manifest               |
                                                                        | with Request               |
                                                                        | aws.amazon.com/neuroncore:2|
                                                                        |                            |
                                                                        |                            |
                                                    2                   +-------------+--------------+
                                         +--------------------------------+           |
                                         |                                |           |
                                         |                                |           | 3
          +------------------------------+-----+                          |           |
          |           Kubelet in INF1/TRN1 Node|                          |           |
          |                                    +<-----------+             |           |
          +-----+---------------------+--------+            |       +-----v-----------v--------------+
                |                     ^                     |       |          Kube-Scheduler        |
                |                     |                     |       |                                |
                |                     |                     |       +--^------+---------------+------+
              9 |                  1  |                     |          |      |               |
                |                     |                    8|         5|      |4              |
                |                     |                     |          |      |               |
                |                     |                     |          |      |               |6
                v                     |                     |          |      |               |
          +-----+---------------------+--------+            |       +--+------v---------------v------+
          |    neuron-device-plugin            |            +-------+       neuron|scheduler|ext     |
          |    in INF1/TRN1 node               |                    +---------------------+----------+
          +----+----------------------+--------+                                          |
               |                      |                                                   |7
               |                      |10                                                 |
               |                      |                                                   v
             11|                      |                                         +---------+-------+
               |                      |                                         |POD Manifest:    |
               |                      |                                         |Annotation:      |
               |                      |                                         |NEURON_CORES:2,3 |
               v                      +---------------------------------------->+                 |
--device=/dev/neuron1 --env NEURON_RT_VISIBLE_CORES=2,3                         |                 |
                                                                                |                 |
                                                                                +-----------------+

1. neuron-device-plugin returns the list of Neuron cores/devices to kubelet
2. Kubelet advertises the Core/Device list to K8s API server (in turn to kube-scheduler)
3. POD Request for neuron cores/devices [Kube-Scheduler picks up the POD creation request]
4. kube-scheduler calls the neuron-scheduler-extn filter function with list of nodes and POD Specification
5. neuron-scheduler-extn scans through the nodes, filters out nodes with non-contiguous cores/devices, and returns the nodes that are capable of supporting the given POD specification
6. kube-scheduler calls the neuron-scheduler-extn bind function with pod and node
7. neuron-scheduler-extn updates the POD annotation with allocated neuron core/device Ids (contiguous)
8. neuron-scheduler-extn sends the bind request to kubelet of the selected node
9. Kubelet calls the Alloc function of the neuron-device-plugin
10. neuron-device-plugin queries the POD Annotation for allocated core/device Ids
11. neuron-device-plugin exports the devices & visible cores to container runtime

Source: https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/tutorials/k8s-neuron-scheduler-flow.html#k8s-neuron-scheduler-flow
