feat: support auto relaxing min values #2299

Open. Wants to merge 4 commits into base: main
Conversation

@saurav-agarwalla (Contributor) commented Jun 11, 2025

Fixes #2078

Description

Support auto-relaxing min values when using min values would lead to a scheduling error.

TODO:

  • More testing (including unit tests)
  • Figuring out a way to add conditional validation for min values since the CEL validation needs to be removed (CEL doesn't support conditionally validating this based on an annotation key)

How was this change tested?

Unit testing, more testing in progress.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jun 11, 2025
@k8s-ci-robot (Contributor):

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: saurav-agarwalla
Once this PR has been reviewed and has the lgtm label, please assign tzneal for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jun 11, 2025
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 12, 2025
@saurav-agarwalla saurav-agarwalla changed the title [WIP] feat: support auto relaxing min values feat: support auto relaxing min values Jun 12, 2025
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 12, 2025
coveralls commented Jun 12, 2025

Pull Request Test Coverage Report for Build 16010551992

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 46 of 49 (93.88%) changed or added relevant lines in 7 files are covered.
  • 213 unchanged lines in 16 files lost coverage.
  • Overall coverage decreased (-0.08%) to 81.964%

Changes Missing Coverage | Covered Lines | Changed/Added Lines | %
pkg/controllers/provisioning/scheduling/nodeclaim.go | 20 | 21 | 95.24%
pkg/operator/options/options.go | 4 | 6 | 66.67%

Files with Coverage Reduction | New Missed Lines | %
pkg/controllers/disruption/drift.go | 2 | 91.84%
pkg/controllers/disruption/consolidation.go | 3 | 90.4%
pkg/controllers/disruption/singlenodeconsolidation.go | 4 | 93.62%
pkg/controllers/disruption/emptiness.go | 5 | 87.3%
pkg/controllers/state/statenode.go | 5 | 87.05%
pkg/test/expectations/expectations.go | 8 | 93.52%
pkg/controllers/controllers.go | 9 | 0.0%
pkg/controllers/disruption/multinodeconsolidation.go | 10 | 86.76%
pkg/test/ratelimitinginterface.go | 10 | 0.0%
pkg/controllers/disruption/helpers.go | 11 | 87.43%
Totals Coverage Status
Change from base Build 15596932737: -0.08%
Covered Lines: 10234
Relevant Lines: 12486

💛 - Coveralls

@@ -44,6 +44,18 @@ var (
CapacityTypeLabel,
},
)
NodeClaimsCreatedWithMinValuesAutoRelaxedTotal = opmetrics.NewPrometheusCounter(
Member:

Is there a way to make this metric more generic? Like something about preference relaxation? This is super super specific to this change and I don't want to explode the number of metrics that we have as we start adding more config and more features

Contributor (Author):

Yup, this makes even more sense with the switch to a policy type instead of the feature flag.

Member:

I almost don't think that you need the metric at all -- given this setting is so specific, what's the value of surfacing this metric? If you have the annotation, someone can use something like kube-state to capture the number of nodes with relaxed preferences at any given point in time

@@ -37,7 +37,6 @@ type NodeClaimSpec struct {
// Requirements are layered with GetLabels and applied to every node.
// +kubebuilder:validation:XValidation:message="requirements with operator 'In' must have a value defined",rule="self.all(x, x.operator == 'In' ? x.values.size() != 0 : true)"
// +kubebuilder:validation:XValidation:message="requirements operator 'Gt' or 'Lt' must have a single positive integer value",rule="self.all(x, (x.operator == 'Gt' || x.operator == 'Lt') ? (x.values.size() == 1 && int(x.values[0]) >= 0) : true)"
// +kubebuilder:validation:XValidation:message="requirements with 'minValues' must have at least that many values specified in the 'values' field",rule="self.all(x, (x.operator == 'In' && has(x.minValues)) ? x.values.size() >= x.minValues : true)"
Member:

I'm not sure that we want to drop this requirement. It feels like if we are relaxing the minValues, we should also change the minValues requirement that we have associated with that NodeClaim -- that way we still maintain the requirement being passed onto the NodeClaim AND we don't have to remove the validation that we added here in CEL

Contributor (Author):

See my response about relaxing the requirements on another comment. We can keep this validation if we do that, but I tried multiple ways to validate this on the basis of the annotation I added and Kubernetes doesn't seem to allow it.

Depending on what we decide for relaxing the requirements, I can update this.

Member:

After discussing offline, we decided that we will allow the requirement for minValues to be relaxed so that we can have the NodeClaim have minValues requirements that match the number of values that will be passed through from the scheduler.

This will ensure that NodeClaims continue to be a standalone concept and make sense in isolation (outside the context of the scheduler) -- if we were to go the other way, we would have to write special code on the launch path that would create an exception for values not satisfying minValues if we know that the minValuesPolicy is set in the environment variables
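The approach agreed above (keeping the NodeClaim's minValues consistent with the values actually passed through from the scheduler) might look roughly like the following sketch. The `requirement` type and `clampMinValues` helper are illustrative stand-ins, not karpenter's real `NodeSelectorRequirementWithMinValues` API:

```go
package main

import "fmt"

// requirement mirrors the shape of a node selector requirement with
// minValues (illustrative only, not karpenter's actual type).
type requirement struct {
	Key       string
	Values    []string
	MinValues *int
}

// clampMinValues lowers MinValues so it never exceeds the number of
// values actually written to the NodeClaim, keeping the CEL rule
// "values.size() >= minValues" satisfiable without removing it.
func clampMinValues(reqs []requirement) []requirement {
	for i := range reqs {
		if reqs[i].MinValues != nil && *reqs[i].MinValues > len(reqs[i].Values) {
			relaxed := len(reqs[i].Values)
			reqs[i].MinValues = &relaxed
		}
	}
	return reqs
}

func main() {
	five := 5
	out := clampMinValues([]requirement{{
		Key:       "node.kubernetes.io/instance-type",
		Values:    []string{"m5.large", "m5.xlarge"},
		MinValues: &five,
	}})
	fmt.Println(*out[0].MinValues) // prints 2
}
```

Because the stored minValues is rewritten rather than bypassed, the NodeClaim remains valid in isolation, outside the context of the scheduler.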

@@ -53,6 +53,7 @@ const (
NodePoolHashAnnotationKey = apis.Group + "/nodepool-hash"
NodePoolHashVersionAnnotationKey = apis.Group + "/nodepool-hash-version"
NodeClaimTerminationTimestampAnnotationKey = apis.Group + "/nodeclaim-termination-timestamp"
NodeClaimMinValuesAutoRelaxedAnnotationKey = apis.Group + "/nodeclaim-min-values-auto-relaxed"
Member:

I'm curious why we need this annotation? You can't query on annotations so I'm curious how we expect this being added to the NodeClaim to be helpful

Contributor (Author):

Mentioned this somewhere else but mostly so that it is easy for customers to know when/why a nodeclaim isn't being launched with the min values constraint. Open to other ways of doing this if you have any suggestions.

Member:

I could really go either way on this annotation -- it doesn't seem strictly necessary since we will be updating the requirements directly but I can see some use-cases for it with respect to observability -- in general, I think it's fine to keep it.
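As a sketch of the observability use-case, stamping the annotation at NodeClaim creation could look like this; the key string and helper are illustrative, not the exact PR code:

```go
package main

import "fmt"

// Illustrative annotation key; the PR uses an apis.Group-derived constant.
const minValuesRelaxedAnnotationKey = "karpenter.sh/nodeclaim-min-values-auto-relaxed"

// annotate records that scheduling relaxed minValues, so users can see
// why a NodeClaim launched without the full minValues constraint.
func annotate(annotations map[string]string, minValuesRelaxed bool) map[string]string {
	if annotations == nil {
		annotations = map[string]string{}
	}
	if minValuesRelaxed {
		annotations[minValuesRelaxedAnnotationKey] = "true"
	}
	return annotations
}

func main() {
	fmt.Println(annotate(nil, true)[minValuesRelaxedAnnotationKey]) // prints true
}
```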

@@ -581,6 +589,11 @@ func (s *Scheduler) addToNewNodeClaim(ctx context.Context, pod *corev1.Pod) erro
if i >= idx {
return false
}

if minValuesAutoRelaxed {
Member:

Do we think we need to expose this externally? If we do, does it make sense to put this as a property of the requirement vs. exposing it as a return value?

Contributor (Author):

I think we want to give the customer visibility in case nodeclaims were launched with relaxed min values. I would definitely prefer for this to be in the requirement, but I'm trying to keep API changes to a minimum for now. Once we see more adoption for this feature, we can expose both the policy flag and this field in the APIs.

My worry is once we expand the API, it is relatively hard to deprecate a field if we see that this isn't a widely popular feature.

What do you think?

@@ -419,14 +420,18 @@ func filterInstanceTypesByRequirements(instanceTypes []*cloudprovider.InstanceTy
// We don't care about the minimum number of instance types that meet our requirements here, we only care if they meet our requirements.
_, err.minValuesIncompatibleErr = remaining.SatisfiesMinValues(requirements)
Member:

We should perhaps consider returning a modified requirement set with modified minValues based on the result of the relaxation and the remaining instance types attached to the NodeClaim

Contributor (Author):

I thought about this initially but decided against it because it seemed to me that it would complicate the outcome without much added value (I could be wrong, though). The way we evaluate min values today, we return on the first requirement whose min values are violated, even when multiple requirements' min values would be violated for a particular scheduling decision. With relaxation, how do we decide which one to relax when there is more than one? It seemed a little odd to relax multiple requirements, which is why I think of relaxation as applying to the overall scheduling decision rather than to a particular requirement.

What do you think?
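The first-violation evaluation order described above can be sketched as follows; `firstMinValuesViolation` and its inputs are hypothetical simplifications of karpenter's `SatisfiesMinValues`, which similarly reports only the first unmet requirement:

```go
package main

import "fmt"

// firstMinValuesViolation walks requirements in a fixed order and
// reports the first key whose satisfied instance-type count falls
// short of its minValues, mirroring the "return on first violation"
// behavior discussed in this thread (illustrative signature).
func firstMinValuesViolation(satisfiedCounts, minValues map[string]int, keys []string) (string, bool) {
	for _, key := range keys {
		if satisfiedCounts[key] < minValues[key] {
			return key, false // later violations (e.g. "zone") are never reported
		}
	}
	return "", true
}

func main() {
	key, ok := firstMinValuesViolation(
		map[string]int{"instance-type": 1, "zone": 1},
		map[string]int{"instance-type": 3, "zone": 2},
		[]string{"instance-type", "zone"},
	)
	fmt.Println(key, ok) // prints instance-type false
}
```

This is why relaxing "the first violated requirement" is ambiguous when several are violated at once, and why the author frames relaxation as a property of the scheduling decision as a whole.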

Member:

Discussed offline: We decided to update the requirements that we are putting in the NodeClaim so that the minValues stays consistent with the count of values that we are passing through to the NodeClaim

@@ -112,7 +121,8 @@ func (o *Options) AddFlags(fs *FlagSet) {
fs.DurationVar(&o.BatchMaxDuration, "batch-max-duration", env.WithDefaultDuration("BATCH_MAX_DURATION", 10*time.Second), "The maximum length of a batch window. The longer this is, the more pods we can consider for provisioning at one time which usually results in fewer but larger nodes.")
fs.DurationVar(&o.BatchIdleDuration, "batch-idle-duration", env.WithDefaultDuration("BATCH_IDLE_DURATION", time.Second), "The maximum amount of time with no new pending pods that if exceeded ends the current batching window. If pods arrive faster than this time, the batching window will be extended up to the maxDuration. If they arrive slower, the pods will be batched separately.")
fs.StringVar(&o.preferencePolicyRaw, "preference-policy", env.WithDefaultString("PREFERENCE_POLICY", string(PreferencePolicyRespect)), "How the Karpenter scheduler should treat preferences. Preferences include preferredDuringSchedulingIgnoreDuringExecution node and pod affinities/anti-affinities and ScheduleAnyways topologySpreadConstraints. Can be one of 'Ignore' and 'Respect'")
fs.StringVar(&o.FeatureGates.inputStr, "feature-gates", env.WithDefaultString("FEATURE_GATES", "NodeRepair=false,ReservedCapacity=false,SpotToSpotConsolidation=false"), "Optional features can be enabled / disabled using feature gates. Current options are: NodeRepair, ReservedCapacity, and SpotToSpotConsolidation")
fs.StringVar(&o.relaxationPolicyRaw, "relaxation-policy", env.WithDefaultString("RELAXATION_POLICY", string(RelaxationPolicyDefault)), "Relaxation policy for scheduling. Options include 'Default' for existing behavior or 'RelaxWhenMinValuesUnsatisfiable' where Karpenter relaxes min values when it isn't satisfied.")
Member:

nit: I feel like this should just be called the MIN_VALUES_POLICY or something like that -- relaxation policy doesn't feel specific enough to suggest that it's specifically applying to the minValues portion of our API. To me, this is something like MIN_VALUES_POLICY=Strict|BestEffort

truncatedInstanceTypes := lo.Slice(its.OrderByPrice(requirements), 0, maxItems)
// Only check for a validity of NodeClaim if its requirement has minValues in it.
if requirements.HasMinValues() {
if _, err := truncatedInstanceTypes.SatisfiesMinValues(requirements); err != nil {
// If minValues is NOT met for any of the requirement across InstanceTypes, then still allow it if relaxation is enabled.
if _, err := truncatedInstanceTypes.SatisfiesMinValues(requirements); err != nil && options.FromContext(ctx).RelaxationPolicy != options.RelaxationPolicyRelaxMinValuesWhenUnsatisfiable {
Member:

nit: I'd probably go the other way with also checking that this value is set to something like "Strict" since that's more intentionally checking what you want here (e.g. if we require that minValues is set and we don't satisfy it)

It's also not a huge deal here though because we only have two options to select from

@@ -44,6 +44,19 @@ var (
CapacityTypeLabel,
},
)
NodeClaimsCreatedWithRelaxedPreferencesTotal = opmetrics.NewPrometheusCounter(
Member:

I still don't know if I see a point for this metric to exist -- worst-case it seems like you could just add a label to the existing NodeClaimsCreated metric, but I'm not sure I even think that is too valuable here to have if you have an annotation on the NodeClaim
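The label-based alternative mentioned above could look like this; a plain map stands in here for a real Prometheus CounterVec, and the label name is made up for illustration:

```go
package main

import "fmt"

// labeledCounter sketches one nodeclaims_created metric with a label
// dimension, instead of a dedicated per-feature counter like
// NodeClaimsCreatedWithRelaxedPreferencesTotal (map in place of a
// real CounterVec).
type labeledCounter map[string]float64

func (c labeledCounter) inc(minValuesRelaxed bool) {
	if minValuesRelaxed {
		c["min_values_relaxed=true"]++
	} else {
		c["min_values_relaxed=false"]++
	}
}

func main() {
	created := labeledCounter{}
	created.inc(true)
	created.inc(false)
	created.inc(true)
	fmt.Println(created["min_values_relaxed=true"]) // prints 2
}
```

One labeled metric keeps cardinality under control as more relaxation-style features land, which is the reviewer's concern about exploding the metric count.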

@@ -549,7 +550,7 @@ func (s *Scheduler) addToNewNodeClaim(ctx context.Context, pod *corev1.Pod) erro
}
}
nodeClaim := NewNodeClaim(s.nodeClaimTemplates[i], s.topology, s.daemonOverhead[s.nodeClaimTemplates[i]], s.daemonHostPortUsage[s.nodeClaimTemplates[i]], its, s.reservationManager, s.reservedOfferingMode)
r, its, ofs, err := nodeClaim.CanAdd(ctx, pod, s.cachedPodData[pod.UID])
r, its, ofs, minValuesRelaxed, err := nodeClaim.CanAdd(ctx, pod, s.cachedPodData[pod.UID], karpopts.FromContext(ctx).RelaxationPolicy == karpopts.RelaxationPolicyRelaxMinValuesWhenUnsatisfiable)
Member:

Can you just pull the relaxationPolicy from the scheduler struct directly?

@@ -132,7 +133,7 @@ func NewScheduler(
// Pre-filter instance types eligible for NodePools to reduce work done during scheduling loops for pods
templates := lo.FilterMap(nodePools, func(np *v1.NodePool, _ int) (*NodeClaimTemplate, bool) {
nct := NewNodeClaimTemplate(np)
nct.InstanceTypeOptions, _ = filterInstanceTypesByRequirements(instanceTypes[np.Name], nct.Requirements, corev1.ResourceList{}, corev1.ResourceList{}, corev1.ResourceList{})
nct.InstanceTypeOptions, _, _ = filterInstanceTypesByRequirements(instanceTypes[np.Name], nct.Requirements, corev1.ResourceList{}, corev1.ResourceList{}, corev1.ResourceList{}, karpopts.FromContext(ctx).RelaxationPolicy == karpopts.RelaxationPolicyRelaxMinValuesWhenUnsatisfiable)
Member:

nit: Same comment here -- can you assign this as a member value on the scheduler and then just pull it from that member value? In general, I'm trying to keep option passthroughs as explicit as possible and trying to prevent us from pulling details out from the context arbitrarily in the scheduler code so the dependencies are clear here and the scheduler can be used as a library
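Threading the option through the constructor instead of pulling it from the context mid-scheduling might look like this minimal sketch (type and field names are illustrative, not karpenter's actual ones):

```go
package main

import "fmt"

// Scheduler holds the policy as a member so the dependency is explicit
// and the scheduler can be used as a library without a configured context.
type Scheduler struct {
	minValuesPolicy string // e.g. "Strict" or "BestEffort"
}

// NewScheduler resolves options once, at construction time.
func NewScheduler(minValuesPolicy string) *Scheduler {
	return &Scheduler{minValuesPolicy: minValuesPolicy}
}

// relaxMinValues is what call sites like CanAdd or
// filterInstanceTypesByRequirements would consult instead of
// options.FromContext(ctx).
func (s *Scheduler) relaxMinValues() bool {
	return s.minValuesPolicy == "BestEffort"
}

func main() {
	s := NewScheduler("BestEffort")
	fmt.Println(s.relaxMinValues()) // prints true
}
```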

@@ -419,14 +420,18 @@ func filterInstanceTypesByRequirements(instanceTypes []*cloudprovider.InstanceTy
// We don't care about the minimum number of instance types that meet our requirements here, we only care if they meet our requirements.
_, err.minValuesIncompatibleErr = remaining.SatisfiesMinValues(requirements)
Member:

Discussed offline: We decided to update the requirements that we are putting in the NodeClaim so that the minValues stays consistent with the count of values that we are passing through to the NodeClaim

// Check Taints
if err := scheduling.Taints(n.Spec.Taints).ToleratesPod(pod); err != nil {
return nil, nil, nil, err
return nil, nil, nil, minValuesRelaxed, err
Member:

Suggested change:
-    return nil, nil, nil, minValuesRelaxed, err
+    return nil, nil, nil, false, err

nit: I could see you returning explicit false here but I think it's probably fine either way

Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Successfully merging this pull request may close these issues.

Support whenMinValuesUnsatisfiable to handle flexibility as a preference
5 participants