
Conversation

@moko-poi (Contributor) commented Nov 3, 2025

Description

This is a proposal to add support for the standard Kubernetes CapacityBuffer API (autoscaling.x-k8s.io/v1alpha1) to enable pre-provisioned spare capacity in Karpenter clusters.

The RFC introduces a virtual pod approach that integrates buffer capacity into Karpenter's scheduling and consolidation algorithms while maintaining compatibility with the existing Cluster Autoscaler Buffer API.
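
As a concrete illustration of the virtual pod approach, here is a sketch of the buffer-to-pod translation. It assumes details the RFC has not settled: the function name, the label key, and the single-container shape are all hypothetical.

```go
package buffer

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// makeVirtualPod (hypothetical) builds an in-memory pod whose requests
// represent one unit of a CapacityBuffer. The pod is never created in
// the API server; it is only fed to Karpenter's scheduling simulation
// so that capacity is provisioned, and kept, for it.
func makeVirtualPod(bufferName string, cpu, memory resource.Quantity) *corev1.Pod {
	return &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{
			Name: fmt.Sprintf("virtual-%s", bufferName),
			// Hypothetical label so buffer-derived pods are recognizable.
			Labels: map[string]string{"autoscaling.x-k8s.io/buffer": bufferName},
		},
		Spec: corev1.PodSpec{
			Containers: []corev1.Container{{
				Name: "capacity-reservation",
				Resources: corev1.ResourceRequirements{
					Requests: corev1.ResourceList{
						corev1.ResourceCPU:    cpu,
						corev1.ResourceMemory: memory,
					},
				},
			}},
		},
	}
}
```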

Related issue: #2571

How was this change tested?

RFC only - implementation will follow in subsequent PRs

Key Features

  • Standard CapacityBuffer CRD support
  • Virtual pod generation for buffer capacity
  • Integration with Karpenter's NodeClaim-based architecture
  • Consolidation protection for buffer capacity (see the note after this list)
  • Cross-autoscaler compatibility (Cluster Autoscaler → Karpenter migration)
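
On the consolidation-protection item above, one plausible mechanism, offered only as an assumption and continuing the sketch earlier in this description, is to tag buffer-derived virtual pods with Karpenter's existing do-not-disrupt annotation so consolidation treats their capacity as non-evictable:

```go
// protectFromConsolidation marks a virtual pod with Karpenter's
// documented karpenter.sh/do-not-disrupt annotation. Using it for
// buffer capacity is a hypothetical design choice, not something the
// RFC has settled.
func protectFromConsolidation(pod *corev1.Pod) {
	if pod.Annotations == nil {
		pod.Annotations = map[string]string{}
	}
	pod.Annotations["karpenter.sh/do-not-disrupt"] = "true"
}
```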

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@k8s-ci-robot (Contributor) commented

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: moko-poi
Once this PR has been reviewed and has the lgtm label, please assign njtran for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the cncf-cla: yes (indicates the PR's author has signed the CNCF CLA) and needs-ok-to-test (an org member must verify it is safe to test) labels on Nov 3, 2025
@k8s-ci-robot (Contributor) commented

Hi @moko-poi. Thanks for your PR.

I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot added the size/XL (denotes a PR that changes 500-999 lines, ignoring generated files) label on Nov 3, 2025
@coveralls commented

Pull Request Test Coverage Report for Build 19022966184

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 2 unchanged lines in 1 file lost coverage.
  • Overall coverage increased (+0.09%) to 81.706%

| Files with Coverage Reduction | New Missed Lines | % |
| --- | --- | --- |
| pkg/controllers/node/termination/controller.go | 2 | 77.14% |

Totals:
- Change from base Build 18947767733: +0.09%
- Covered Lines: 11581
- Relevant Lines: 14174

💛 - Coveralls

@@ -0,0 +1,594 @@
# Capacity Buffer API Support
Contributor commented:

Thanks for the design! A couple of high-level notes:

  • I don't think the CapacityBuffer API exists yet. The sig-autoscaling RFC was merged, but I don't think the API itself has been released. That doesn't mean we can't think ahead, but implementation will have to wait until the API exists.
  • Overall I think the doc is a bit messy. I think it would be a stronger proposal if you started from the CX and derived the implementation from that. Similarly, I think the implementation section could be much stronger if it started from the requirements of the existing controllers and worked backwards to the APIs that the capacity buffer controller should provide.

- Using pause containers with resource requests to reserve capacity
- Over-provisioning through static NodePools

The Kubernetes SIG Autoscaling has standardized a CapacityBuffer API to declare spare capacity/headroom in clusters. Cluster Autoscaler supports this API (autoscaling.x-k8s.io/v1alpha1), providing a vendor-agnostic way to express capacity requirements.
Contributor commented:

I'm not sure CAS has support for that API yet

1. **Performance-critical applications** where just-in-time provisioning latency is unacceptable
2. **Burst workloads** that need immediate scheduling for CI/CD, batch jobs, or event-driven applications
3. **High-availability services** that require buffer capacity to handle traffic spikes or node failures
4. **Consistent user experience** across different autoscaling solutions in the Kubernetes ecosystem
Contributor commented:

I don't think we care all that much about consistent UX. In fact, the two autoscaling solutions work very differently. I do think we could say we care about intent-driven configuration, though.


## Proposal

Extend Karpenter to support the standard CapacityBuffer API (autoscaling.x-k8s.io/v1alpha1) by integrating buffer capacity into scheduling and consolidation algorithms.
Contributor commented:

Given that the API is alpha, whatever design we create should include the standard set of alpha protections we use. I don't see where you've discussed feature gating this and the opt-in/opt-out behavior; the RFC should include details on that.
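
For reference, the standard alpha protection pattern in the Kubernetes ecosystem looks roughly like the sketch below. It uses the real k8s.io/component-base/featuregate package, but the gate name CapacityBuffers is hypothetical.

```go
package features

import (
	utilruntime "k8s.io/apimachinery/pkg/util/runtime"
	"k8s.io/component-base/featuregate"
)

// CapacityBuffers (hypothetical gate name) would guard all buffer handling.
const CapacityBuffers featuregate.Feature = "CapacityBuffers"

// Gates is the process-wide feature gate registry.
var Gates featuregate.MutableFeatureGate = featuregate.NewFeatureGate()

func init() {
	// Alpha features default to off, so buffer support stays opt-in.
	utilruntime.Must(Gates.Add(map[featuregate.Feature]featuregate.FeatureSpec{
		CapacityBuffers: {Default: false, PreRelease: featuregate.Alpha},
	}))
}
```

Controllers would then check Gates.Enabled(CapacityBuffers) before watching or reconciling buffer objects, which gives the opt-in/opt-out behavior the comment asks about.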


Key aspects:
1. **Virtual Pod Approach**: Follow Cluster Autoscaler's pattern using in-memory virtual pods
Contributor commented:

I think we can reduce this to a single goal, something along the lines of 'Karpenter respects configured CapacityBuffers, maintaining additional capacity as if they were pods'. Items 1-3 in this list are implementation details that get us toward that goal.


5. **Graceful Degradation**: If buffer capacity cannot be maintained, prioritize user workloads and log buffer capacity warnings

### API Integration
Contributor commented:

I think this section is repeated


**Revised Protection Strategy**:

1. **NodeClaim-Level Tracking**: Buffer capacity is tracked at the NodeClaim level, not just pod level
Contributor commented:

I don't think this is correct. What are you trying to say with this?

5. **Update buffer status** with translation results
6. **Inject virtual pods** into Karpenter's scheduling pipeline

### Implementation Phases
Contributor commented:

Could this section enumerate what requirements must be met before implementation can begin, and then all of the functionality required for the alpha release of buffer support?
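
For the two quoted steps above (updating buffer status and injecting virtual pods), the controller flow might look roughly like the following sketch. Every type, field, and hook here is hypothetical; the real CapacityBuffer type does not exist yet, so stand-in stubs are used.

```go
package buffer

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// CapacityBuffer stands in for the not-yet-released v1alpha1 type;
// only the pieces this sketch touches are stubbed.
type CapacityBuffer struct {
	Name   string
	Status struct{ VirtualPods int32 }
}

// Provisioner stands in for whatever injection hook Karpenter's
// provisioner would expose to the buffer controller.
type Provisioner interface {
	InjectVirtualPods(ctx context.Context, pods []*corev1.Pod)
}

type Controller struct {
	provisioner Provisioner
	translate   func(*CapacityBuffer) ([]*corev1.Pod, error)
}

// reconcile covers the two quoted steps: update the buffer's status
// with the translation result, then inject the virtual pods into the
// scheduling pipeline. Persisting status via the API server is elided.
func (c *Controller) reconcile(ctx context.Context, buf *CapacityBuffer) error {
	pods, err := c.translate(buf)
	if err != nil {
		return fmt.Errorf("translating buffer %s: %w", buf.Name, err)
	}
	buf.Status.VirtualPods = int32(len(pods))  // step 5: record results
	c.provisioner.InjectVirtualPods(ctx, pods) // step 6: feed the scheduler
	return nil
}
```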

- Memory overhead of virtual pods
- Watch performance with many buffers

## Migration & Compatibility
Contributor commented:

I think this section is missing a feature flag discussion

**A**: Follow Karpenter's provisioning behavior - create suitable NodeClaims through scheduler constraint solving

3. **Q**: How does buffer capacity interact with NodePool limits?
**A**: Buffer NodeClaims must respect NodePool resource limits and budget constraints
Contributor commented:

Again, what is a buffer NodeClaim?
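
Whatever the buffer-backed capacity ends up being called, the constraint in the quoted answer is mechanically checkable. A minimal sketch, assuming a hypothetical helper that compares a buffer's requested resources against a NodePool's limits:

```go
package buffer

import (
	corev1 "k8s.io/api/core/v1"
)

// fitsNodePoolLimits (hypothetical helper) reports whether adding the
// buffer's requested resources to what the NodePool already uses would
// stay within the NodePool's configured limits.
func fitsNodePoolLimits(limits, inUse, buffer corev1.ResourceList) bool {
	for name, limit := range limits {
		used := inUse[name] // copy; Quantity methods mutate in place
		need := buffer[name]
		used.Add(need)
		if used.Cmp(limit) > 0 {
			return false
		}
	}
	return true
}
```

A resource absent from limits is treated as unbounded, which matches how NodePool limits are usually interpreted; whether budgets need a separate check is left open here.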
