feat: support auto relaxing min values #2299

Open. Wants to merge 4 commits into base: main
Conversation

@saurav-agarwalla (Contributor) commented Jun 11, 2025

Fixes #2078

Description

Support auto-relaxing min values when using min values would lead to a scheduling error.

TODO:

  • More testing (including unit tests)
  • Figuring out a way to add conditional validation for min values since the CEL validation needs to be removed (CEL doesn't support conditionally validating this based on an annotation key)

How was this change tested?

Unit testing, more testing in progress.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jun 11, 2025
@k8s-ci-robot (Contributor):

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: saurav-agarwalla
Once this PR has been reviewed and has the lgtm label, please assign tzneal for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jun 11, 2025
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 12, 2025
@saurav-agarwalla saurav-agarwalla changed the title [WIP] feat: support auto relaxing min values feat: support auto relaxing min values Jun 12, 2025
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 12, 2025
coveralls commented Jun 12, 2025

Pull Request Test Coverage Report for Build 16010551992

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 46 of 49 (93.88%) changed or added relevant lines in 7 files are covered.
  • 213 unchanged lines in 16 files lost coverage.
  • Overall coverage decreased (-0.08%) to 81.964%

Changes Missing Coverage | Covered Lines | Changed/Added Lines | %
pkg/controllers/provisioning/scheduling/nodeclaim.go | 20 | 21 | 95.24%
pkg/operator/options/options.go | 4 | 6 | 66.67%

Files with Coverage Reduction | New Missed Lines | %
pkg/controllers/disruption/drift.go | 2 | 91.84%
pkg/controllers/disruption/consolidation.go | 3 | 90.4%
pkg/controllers/disruption/singlenodeconsolidation.go | 4 | 93.62%
pkg/controllers/disruption/emptiness.go | 5 | 87.3%
pkg/controllers/state/statenode.go | 5 | 87.05%
pkg/test/expectations/expectations.go | 8 | 93.52%
pkg/controllers/controllers.go | 9 | 0.0%
pkg/controllers/disruption/multinodeconsolidation.go | 10 | 86.76%
pkg/test/ratelimitinginterface.go | 10 | 0.0%
pkg/controllers/disruption/helpers.go | 11 | 87.43%
Totals Coverage Status
Change from base Build 15596932737: -0.08%
Covered Lines: 10234
Relevant Lines: 12486

💛 - Coveralls

@@ -44,6 +44,18 @@ var (
CapacityTypeLabel,
},
)
NodeClaimsCreatedWithMinValuesAutoRelaxedTotal = opmetrics.NewPrometheusCounter(
Member:

Is there a way to make this metric more generic? Like something about preference relaxation? This is super super specific to this change and I don't want to explode the number of metrics that we have as we start adding more config and more features

Contributor (Author):

Yup, this makes even more sense with the switch to a policy type instead of the feature flag.

Member:

I almost don't think that you need the metric at all -- given this setting is so specific, what's the value of surfacing this metric? If you have the annotation, someone can use something like kube-state to capture the number of nodes with relaxed preferences at any given point in time

@@ -37,7 +37,6 @@ type NodeClaimSpec struct {
// Requirements are layered with GetLabels and applied to every node.
// +kubebuilder:validation:XValidation:message="requirements with operator 'In' must have a value defined",rule="self.all(x, x.operator == 'In' ? x.values.size() != 0 : true)"
// +kubebuilder:validation:XValidation:message="requirements operator 'Gt' or 'Lt' must have a single positive integer value",rule="self.all(x, (x.operator == 'Gt' || x.operator == 'Lt') ? (x.values.size() == 1 && int(x.values[0]) >= 0) : true)"
// +kubebuilder:validation:XValidation:message="requirements with 'minValues' must have at least that many values specified in the 'values' field",rule="self.all(x, (x.operator == 'In' && has(x.minValues)) ? x.values.size() >= x.minValues : true)"
Member:

I'm not sure that we want to drop this requirement. It feels like if we are relaxing the minValues, we should also change the minValues requirement that we have associated with that NodeClaim -- that way we still maintain the requirement being passed onto the NodeClaim AND we don't have to remove the validation that we added here in CEL

Contributor (Author):

See my response about relaxing the requirements on another comment. We can keep this validation if we do that, but I tried multiple ways to validate this on the basis of the annotation I added and Kubernetes doesn't seem to allow it.

Depending on what we decide for relaxing the requirements, I can update this.

Member:

After discussing offline, we decided that we will allow the requirement for minValues to be relaxed so that we can have the NodeClaim have minValues requirements that match the number of values that will be passed through from the scheduler.

This will ensure that NodeClaims continue to be a standalone concept and make sense in isolation (outside the context of the scheduler) -- if we were to go the other way, we would have to write special code on the launch path that would create an exception for values not satisfying minValues if we know that the minValuesPolicy is set in the environment variables
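The approach agreed above (keeping the NodeClaim's minValues consistent with the values actually passed through from the scheduler) might look roughly like the following sketch. The `requirement` type and `clampMinValues` helper are illustrative stand-ins, not karpenter's real `NodeSelectorRequirementWithMinValues` API:

```go
package main

import "fmt"

// requirement mirrors the shape of a node selector requirement with
// minValues (illustrative only, not karpenter's actual type).
type requirement struct {
	Key       string
	Values    []string
	MinValues *int
}

// clampMinValues lowers MinValues so it never exceeds the number of
// values actually written to the NodeClaim, keeping the CEL rule
// "values.size() >= minValues" satisfiable without removing it.
func clampMinValues(reqs []requirement) []requirement {
	for i := range reqs {
		if reqs[i].MinValues != nil && *reqs[i].MinValues > len(reqs[i].Values) {
			relaxed := len(reqs[i].Values)
			reqs[i].MinValues = &relaxed
		}
	}
	return reqs
}

func main() {
	five := 5
	out := clampMinValues([]requirement{{
		Key:       "node.kubernetes.io/instance-type",
		Values:    []string{"m5.large", "m5.xlarge"},
		MinValues: &five,
	}})
	fmt.Println(*out[0].MinValues) // prints 2
}
```

Because the stored minValues is rewritten rather than bypassed, the NodeClaim remains valid in isolation, outside the context of the scheduler.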

@@ -53,6 +53,7 @@ const (
NodePoolHashAnnotationKey = apis.Group + "/nodepool-hash"
NodePoolHashVersionAnnotationKey = apis.Group + "/nodepool-hash-version"
NodeClaimTerminationTimestampAnnotationKey = apis.Group + "/nodeclaim-termination-timestamp"
NodeClaimMinValuesAutoRelaxedAnnotationKey = apis.Group + "/nodeclaim-min-values-auto-relaxed"
Member:

I'm curious why we need this annotation? You can't query on annotations so I'm curious how we expect this being added to the NodeClaim to be helpful

Contributor (Author):

Mentioned this somewhere else but mostly so that it is easy for customers to know when/why a nodeclaim isn't being launched with the min values constraint. Open to other ways of doing this if you have any suggestions.

Member:

I could really go either way on this annotation -- it doesn't seem strictly necessary since we will be updating the requirements directly but I can see some use-cases for it with respect to observability -- in general, I think it's fine to keep it.
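As a sketch of the observability use-case, stamping the annotation at NodeClaim creation could look like this; the key string and helper are illustrative, not the exact PR code:

```go
package main

import "fmt"

// Illustrative annotation key; the PR uses an apis.Group-derived constant.
const minValuesRelaxedAnnotationKey = "karpenter.sh/nodeclaim-min-values-auto-relaxed"

// annotate records that scheduling relaxed minValues, so users can see
// why a NodeClaim launched without the full minValues constraint.
func annotate(annotations map[string]string, minValuesRelaxed bool) map[string]string {
	if annotations == nil {
		annotations = map[string]string{}
	}
	if minValuesRelaxed {
		annotations[minValuesRelaxedAnnotationKey] = "true"
	}
	return annotations
}

func main() {
	fmt.Println(annotate(nil, true)[minValuesRelaxedAnnotationKey]) // prints true
}
```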

@@ -581,6 +589,11 @@ func (s *Scheduler) addToNewNodeClaim(ctx context.Context, pod *corev1.Pod) erro
if i >= idx {
return false
}

if minValuesAutoRelaxed {
Member:

Do we think we need to expose this externally? If we do, does it make sense to put this as a property of the requirement vs. exposing it as a return value?

Contributor (Author):

I think we want to give the customer visibility in case nodeclaims were launched with relaxed min values. I would definitely prefer for this to be in the requirement, but I'm trying to keep API changes to a minimum for now. Once we see more adoption for this feature, we can expose both the policy flag and this field in the APIs.

My worry is once we expand the API, it is relatively hard to deprecate a field if we see that this isn't a widely popular feature.

What do you think?

@@ -419,14 +420,18 @@ func filterInstanceTypesByRequirements(instanceTypes []*cloudprovider.InstanceTy
// We don't care about the minimum number of instance types that meet our requirements here, we only care if they meet our requirements.
_, err.minValuesIncompatibleErr = remaining.SatisfiesMinValues(requirements)
Member:

We should perhaps consider returning a modified requirement set with modified minValues based on the result of the relaxation and the remaining instance types attached to the NodeClaim

Contributor (Author):

I thought about this initially but decided against it because it seemed to me that it would complicate the outcome without much added value (I could be wrong, though). The way we evaluate min values today, we return on the first requirement whose min values are violated, even when multiple requirements' min values would be violated for a particular scheduling decision. With relaxation, how do we decide which one to relax when there is more than one? It seemed a little odd to relax multiple requirements, which is why I think of relaxation as applying to the overall scheduling decision rather than to a particular requirement.

What do you think?
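The first-violation evaluation order described above can be sketched as follows; `firstMinValuesViolation` and its inputs are hypothetical simplifications of karpenter's `SatisfiesMinValues`, which similarly reports only the first unmet requirement:

```go
package main

import "fmt"

// firstMinValuesViolation walks requirements in a fixed order and
// reports the first key whose satisfied instance-type count falls
// short of its minValues, mirroring the "return on first violation"
// behavior discussed in this thread (illustrative signature).
func firstMinValuesViolation(satisfiedCounts, minValues map[string]int, keys []string) (string, bool) {
	for _, key := range keys {
		if satisfiedCounts[key] < minValues[key] {
			return key, false // later violations (e.g. "zone") are never reported
		}
	}
	return "", true
}

func main() {
	key, ok := firstMinValuesViolation(
		map[string]int{"instance-type": 1, "zone": 1},
		map[string]int{"instance-type": 3, "zone": 2},
		[]string{"instance-type", "zone"},
	)
	fmt.Println(key, ok) // prints instance-type false
}
```

This is why relaxing "the first violated requirement" is ambiguous when several are violated at once, and why the author frames relaxation as a property of the scheduling decision as a whole.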

Member:

Discussed offline: We decided to update the requirements that we are putting in the NodeClaim so that the minValues stays consistent with the count of values that we are passing through to the NodeClaim

@@ -112,7 +121,8 @@ func (o *Options) AddFlags(fs *FlagSet) {
fs.DurationVar(&o.BatchMaxDuration, "batch-max-duration", env.WithDefaultDuration("BATCH_MAX_DURATION", 10*time.Second), "The maximum length of a batch window. The longer this is, the more pods we can consider for provisioning at one time which usually results in fewer but larger nodes.")
fs.DurationVar(&o.BatchIdleDuration, "batch-idle-duration", env.WithDefaultDuration("BATCH_IDLE_DURATION", time.Second), "The maximum amount of time with no new pending pods that if exceeded ends the current batching window. If pods arrive faster than this time, the batching window will be extended up to the maxDuration. If they arrive slower, the pods will be batched separately.")
fs.StringVar(&o.preferencePolicyRaw, "preference-policy", env.WithDefaultString("PREFERENCE_POLICY", string(PreferencePolicyRespect)), "How the Karpenter scheduler should treat preferences. Preferences include preferredDuringSchedulingIgnoreDuringExecution node and pod affinities/anti-affinities and ScheduleAnyways topologySpreadConstraints. Can be one of 'Ignore' and 'Respect'")
fs.StringVar(&o.FeatureGates.inputStr, "feature-gates", env.WithDefaultString("FEATURE_GATES", "NodeRepair=false,ReservedCapacity=false,SpotToSpotConsolidation=false"), "Optional features can be enabled / disabled using feature gates. Current options are: NodeRepair, ReservedCapacity, and SpotToSpotConsolidation")
fs.StringVar(&o.relaxationPolicyRaw, "relaxation-policy", env.WithDefaultString("RELAXATION_POLICY", string(RelaxationPolicyDefault)), "Relaxation policy for scheduling. Options include 'Default' for existing behavior or 'RelaxWhenMinValuesUnsatisfiable' where Karpenter relaxes min values when it isn't satisfied.")
Member:

nit: I feel like this should just be called the MIN_VALUES_POLICY or something like that -- relaxation policy doesn't feel specific enough to suggest that it's specifically applying to the minValues portion of our API. To me, this is something like MIN_VALUES_POLICY=Strict|BestEffort

truncatedInstanceTypes := lo.Slice(its.OrderByPrice(requirements), 0, maxItems)
// Only check for a validity of NodeClaim if its requirement has minValues in it.
if requirements.HasMinValues() {
if _, err := truncatedInstanceTypes.SatisfiesMinValues(requirements); err != nil {
// If minValues is NOT met for any of the requirement across InstanceTypes, then still allow it if relaxation is enabled.
if _, err := truncatedInstanceTypes.SatisfiesMinValues(requirements); err != nil && options.FromContext(ctx).RelaxationPolicy != options.RelaxationPolicyRelaxMinValuesWhenUnsatisfiable {
Member:

nit: I'd probably go the other way with also checking that this value is set to something like "Strict" since that's more intentionally checking what you want here (e.g. if we require that minValues is set and we don't satisfy it)

It's also not a huge deal here though because we only have two options to select from

@@ -44,6 +44,19 @@ var (
CapacityTypeLabel,
},
)
NodeClaimsCreatedWithRelaxedPreferencesTotal = opmetrics.NewPrometheusCounter(
Member:

I still don't know if I see a point for this metric to exist -- worst-case it seems like you could just add a label to the existing NodeClaimsCreated metric, but I'm not sure I even think that is too valuable here to have if you have an annotation on the NodeClaim
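The label-based alternative mentioned above could look like this; a plain map stands in here for a real Prometheus CounterVec, and the label name is made up for illustration:

```go
package main

import "fmt"

// labeledCounter sketches one nodeclaims_created metric with a label
// dimension, instead of a dedicated per-feature counter like
// NodeClaimsCreatedWithRelaxedPreferencesTotal (map in place of a
// real CounterVec).
type labeledCounter map[string]float64

func (c labeledCounter) inc(minValuesRelaxed bool) {
	if minValuesRelaxed {
		c["min_values_relaxed=true"]++
	} else {
		c["min_values_relaxed=false"]++
	}
}

func main() {
	created := labeledCounter{}
	created.inc(true)
	created.inc(false)
	created.inc(true)
	fmt.Println(created["min_values_relaxed=true"]) // prints 2
}
```

One labeled metric keeps cardinality under control as more relaxation-style features land, which is the reviewer's concern about exploding the metric count.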

@@ -549,7 +550,7 @@ func (s *Scheduler) addToNewNodeClaim(ctx context.Context, pod *corev1.Pod) erro
}
}
nodeClaim := NewNodeClaim(s.nodeClaimTemplates[i], s.topology, s.daemonOverhead[s.nodeClaimTemplates[i]], s.daemonHostPortUsage[s.nodeClaimTemplates[i]], its, s.reservationManager, s.reservedOfferingMode)
r, its, ofs, err := nodeClaim.CanAdd(ctx, pod, s.cachedPodData[pod.UID])
r, its, ofs, minValuesRelaxed, err := nodeClaim.CanAdd(ctx, pod, s.cachedPodData[pod.UID], karpopts.FromContext(ctx).RelaxationPolicy == karpopts.RelaxationPolicyRelaxMinValuesWhenUnsatisfiable)
Member:

Can you just pull the relaxationPolicy from the scheduler struct directly?

@@ -132,7 +133,7 @@ func NewScheduler(
// Pre-filter instance types eligible for NodePools to reduce work done during scheduling loops for pods
templates := lo.FilterMap(nodePools, func(np *v1.NodePool, _ int) (*NodeClaimTemplate, bool) {
nct := NewNodeClaimTemplate(np)
nct.InstanceTypeOptions, _ = filterInstanceTypesByRequirements(instanceTypes[np.Name], nct.Requirements, corev1.ResourceList{}, corev1.ResourceList{}, corev1.ResourceList{})
nct.InstanceTypeOptions, _, _ = filterInstanceTypesByRequirements(instanceTypes[np.Name], nct.Requirements, corev1.ResourceList{}, corev1.ResourceList{}, corev1.ResourceList{}, karpopts.FromContext(ctx).RelaxationPolicy == karpopts.RelaxationPolicyRelaxMinValuesWhenUnsatisfiable)
Member:

nit: Same comment here -- can you assign this as a member value on the scheduler and then just pull it from that member value? In general, I'm trying to keep option passthroughs as explicit as possible and trying to prevent us from pulling details out from the context arbitrarily in the scheduler code so the dependencies are clear here and the scheduler can be used as a library
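Threading the option through the constructor instead of pulling it from the context mid-scheduling might look like this minimal sketch (type and field names are illustrative, not karpenter's actual ones):

```go
package main

import "fmt"

// Scheduler holds the policy as a member so the dependency is explicit
// and the scheduler can be used as a library without a configured context.
type Scheduler struct {
	minValuesPolicy string // e.g. "Strict" or "BestEffort"
}

// NewScheduler resolves options once, at construction time.
func NewScheduler(minValuesPolicy string) *Scheduler {
	return &Scheduler{minValuesPolicy: minValuesPolicy}
}

// relaxMinValues is what call sites like CanAdd or
// filterInstanceTypesByRequirements would consult instead of
// options.FromContext(ctx).
func (s *Scheduler) relaxMinValues() bool {
	return s.minValuesPolicy == "BestEffort"
}

func main() {
	s := NewScheduler("BestEffort")
	fmt.Println(s.relaxMinValues()) // prints true
}
```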

@@ -419,14 +420,18 @@ func filterInstanceTypesByRequirements(instanceTypes []*cloudprovider.InstanceTy
// We don't care about the minimum number of instance types that meet our requirements here, we only care if they meet our requirements.
_, err.minValuesIncompatibleErr = remaining.SatisfiesMinValues(requirements)
Member:

Discussed offline: We decided to update the requirements that we are putting in the NodeClaim so that the minValues stays consistent with the count of values that we are passing through to the NodeClaim

// Check Taints
if err := scheduling.Taints(n.Spec.Taints).ToleratesPod(pod); err != nil {
return nil, nil, nil, err
return nil, nil, nil, minValuesRelaxed, err
Member:

Suggested change:
-    return nil, nil, nil, minValuesRelaxed, err
+    return nil, nil, nil, false, err

nit: I could see you returning explicit false here but I think it's probably fine either way

Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Successfully merging this pull request may close these issues.

Support whenMinValuesUnsatisfiable to handle flexibility as a preference
5 participants