feat: support auto relaxing min values #2299
base: main
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED.
This pull-request has been approved by: saurav-agarwalla. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files. Approvers can indicate their approval by writing /approve in a comment.
Pull Request Test Coverage Report for Build 16010551992
Warning: This coverage report may be inaccurate. This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.
Details
💛 - Coveralls
pkg/metrics/metrics.go
Outdated
@@ -44,6 +44,18 @@ var (
	CapacityTypeLabel,
	},
)
NodeClaimsCreatedWithMinValuesAutoRelaxedTotal = opmetrics.NewPrometheusCounter(
Is there a way to make this metric more generic? Like something about preference relaxation? This is super super specific to this change and I don't want to explode the number of metrics that we have as we start adding more config and more features
Yup, this makes even more sense with the switch to a policy type instead of the feature flag.
I almost don't think that you need the metric at all -- given this setting is so specific, what's the value of surfacing this metric? If you have the annotation, someone can use something like kube-state to capture the number of nodes with relaxed preferences at any given point in time
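One way to make the metric generic, as suggested above, is a single counter keyed by a reason label so that future relaxation types reuse the same metric rather than each adding its own. A minimal sketch, using the stdlib expvar package as a stand-in for the project's opmetrics Prometheus wrapper (the metric name and reason value are hypothetical):

```go
package main

import (
	"expvar"
	"fmt"
)

// One generic counter for all NodeClaims created with relaxed preferences,
// keyed by a "reason" string (e.g. "min_values"), instead of a separate
// metric per relaxation feature.
var nodeClaimsRelaxed = expvar.NewMap("karpenter_nodeclaims_created_with_relaxed_preferences_total")

// recordRelaxation increments the counter for one relaxation reason.
func recordRelaxation(reason string) {
	nodeClaimsRelaxed.Add(reason, 1)
}

func main() {
	recordRelaxation("min_values")
	fmt.Println(nodeClaimsRelaxed.Get("min_values"))
}
```

With a Prometheus CounterVec the reason would become a label, which keeps cardinality low as long as reasons stay a small fixed set.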
@@ -37,7 +37,6 @@ type NodeClaimSpec struct {
// Requirements are layered with GetLabels and applied to every node.
// +kubebuilder:validation:XValidation:message="requirements with operator 'In' must have a value defined",rule="self.all(x, x.operator == 'In' ? x.values.size() != 0 : true)"
// +kubebuilder:validation:XValidation:message="requirements operator 'Gt' or 'Lt' must have a single positive integer value",rule="self.all(x, (x.operator == 'Gt' || x.operator == 'Lt') ? (x.values.size() == 1 && int(x.values[0]) >= 0) : true)"
// +kubebuilder:validation:XValidation:message="requirements with 'minValues' must have at least that many values specified in the 'values' field",rule="self.all(x, (x.operator == 'In' && has(x.minValues)) ? x.values.size() >= x.minValues : true)"
I'm not sure that we want to drop this requirement. It feels like if we are relaxing the minValues, we should also change the minValues requirement that we have associated with that NodeClaim -- that way we still maintain the requirement being passed onto the NodeClaim AND we don't have to remove the validation that we added here in CEL
See my response about relaxing the requirements in another comment. We can keep this validation if we do that, but I tried multiple ways to see whether this can be validated on the basis of the annotation I added, and K8s doesn't seem to allow that.
Depending on what we decide for relaxing the requirements, I can update this.
After discussing offline, we decided that we will allow the requirement for minValues to be relaxed so that we can have the NodeClaim have minValues requirements that match the number of values that will be passed through from the scheduler.
This will ensure that NodeClaims continue to be a standalone concept and make sense in isolation (outside the context of the scheduler) -- if we were to go the other way, we would have to write special code on the launch path that would create an exception for values not satisfying minValues if we know that the minValuesPolicy is set in the environment variables
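The agreed behavior could look roughly like the sketch below, where each requirement's minValues is lowered to the number of values that actually survived instance-type filtering, so the stored requirements still satisfy the CEL rule `values.size() >= minValues`. The Requirement type and relaxMinValues helper are simplified stand-ins for illustration, not the actual Karpenter types:

```go
package main

import "fmt"

// Requirement is a stripped-down stand-in for a scheduling requirement:
// a key, its allowed values, and an optional minValues floor.
type Requirement struct {
	Key       string
	Values    []string
	MinValues *int
}

// relaxMinValues lowers each requirement's minValues to the count of
// values that remain after filtering, so the NodeClaim's requirements
// stay self-consistent in isolation (outside the scheduler's context).
func relaxMinValues(reqs []Requirement) (relaxed bool) {
	for i := range reqs {
		if reqs[i].MinValues != nil && len(reqs[i].Values) < *reqs[i].MinValues {
			n := len(reqs[i].Values)
			reqs[i].MinValues = &n
			relaxed = true
		}
	}
	return relaxed
}

func main() {
	min := 3
	reqs := []Requirement{{
		Key:       "node.kubernetes.io/instance-type",
		Values:    []string{"m5.large", "m5.xlarge"},
		MinValues: &min,
	}}
	fmt.Println(relaxMinValues(reqs), *reqs[0].MinValues) // true 2
}
```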
pkg/apis/v1/labels.go
Outdated
@@ -53,6 +53,7 @@ const (
NodePoolHashAnnotationKey = apis.Group + "/nodepool-hash"
NodePoolHashVersionAnnotationKey = apis.Group + "/nodepool-hash-version"
NodeClaimTerminationTimestampAnnotationKey = apis.Group + "/nodeclaim-termination-timestamp"
NodeClaimMinValuesAutoRelaxedAnnotationKey = apis.Group + "/nodeclaim-min-values-auto-relaxed"
I'm curious why we need this annotation? You can't query on annotations so I'm curious how we expect this being added to the NodeClaim to be helpful
Mentioned this somewhere else but mostly so that it is easy for customers to know when/why a nodeclaim isn't being launched with the min values constraint. Open to other ways of doing this if you have any suggestions.
I could really go either way on this annotation -- it doesn't seem strictly necessary since we will be updating the requirements directly but I can see some use-cases for it with respect to observability -- in general, I think it's fine to keep it.
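For the observability use-case, the annotation write is a small addition on the NodeClaim creation path. A sketch under assumptions (the key string mirrors the PR's NodeClaimMinValuesAutoRelaxedAnnotationKey but is hard-coded here; the helper is hypothetical):

```go
package main

import "fmt"

// Hypothetical stand-in for the annotation key added in pkg/apis/v1/labels.go.
const minValuesRelaxedAnnotationKey = "karpenter.sh/nodeclaim-min-values-auto-relaxed"

// annotateRelaxed records that a NodeClaim's minValues were relaxed.
// Annotations aren't queryable via field selectors, so this is purely
// for humans and for tools like kube-state-metrics, as noted above.
func annotateRelaxed(annotations map[string]string) map[string]string {
	if annotations == nil {
		annotations = map[string]string{}
	}
	annotations[minValuesRelaxedAnnotationKey] = "true"
	return annotations
}

func main() {
	fmt.Println(annotateRelaxed(nil)[minValuesRelaxedAnnotationKey])
}
```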
@@ -581,6 +589,11 @@ func (s *Scheduler) addToNewNodeClaim(ctx context.Context, pod *corev1.Pod) erro
if i >= idx {
	return false
}

if minValuesAutoRelaxed {
Do we think we need to expose this externally? If we do, does it make sense to put this as a property of the requirement vs. exposing it as a return value?
I think we want to give customers visibility when nodeclaims are launched with relaxed min values. I would definitely prefer for this to be in the requirement, but I'm trying to keep API changes to a minimum for now. Once we see more adoption for this feature, we can expose both the policy flag and this field in the APIs.
My worry is once we expand the API, it is relatively hard to deprecate a field if we see that this isn't a widely popular feature.
What do you think?
@@ -419,14 +420,18 @@ func filterInstanceTypesByRequirements(instanceTypes []*cloudprovider.InstanceTy
// We don't care about the minimum number of instance types that meet our requirements here, we only care if they meet our requirements.
_, err.minValuesIncompatibleErr = remaining.SatisfiesMinValues(requirements)
We should perhaps consider returning a modified requirement set with modified minValues based on the result of the relaxation and the remaining instance types attached to the NodeClaim
I thought about this initially but decided against it, because it seemed like it would complicate the outcome without much added value (I could be wrong though). The way we evaluate min values today, we return on the first requirement for which min values is violated, even when multiple requirements' min values would be violated for a particular scheduling decision. With relaxation, how do we decide which one to relax if there is more than one? It seemed a little odd to relax multiple requirements, which is why I'm thinking of relaxation as applying not to a particular requirement but to the overall scheduling decision.
What do you think?
Discussed offline: We decided to update the requirements that we are putting in the NodeClaim so that the minValues stays consistent with the count of values that we are passing through to the NodeClaim
@@ -112,7 +121,8 @@ func (o *Options) AddFlags(fs *FlagSet) {
fs.DurationVar(&o.BatchMaxDuration, "batch-max-duration", env.WithDefaultDuration("BATCH_MAX_DURATION", 10*time.Second), "The maximum length of a batch window. The longer this is, the more pods we can consider for provisioning at one time which usually results in fewer but larger nodes.")
fs.DurationVar(&o.BatchIdleDuration, "batch-idle-duration", env.WithDefaultDuration("BATCH_IDLE_DURATION", time.Second), "The maximum amount of time with no new pending pods that if exceeded ends the current batching window. If pods arrive faster than this time, the batching window will be extended up to the maxDuration. If they arrive slower, the pods will be batched separately.")
fs.StringVar(&o.preferencePolicyRaw, "preference-policy", env.WithDefaultString("PREFERENCE_POLICY", string(PreferencePolicyRespect)), "How the Karpenter scheduler should treat preferences. Preferences include preferredDuringSchedulingIgnoreDuringExecution node and pod affinities/anti-affinities and ScheduleAnyways topologySpreadConstraints. Can be one of 'Ignore' and 'Respect'")
fs.StringVar(&o.FeatureGates.inputStr, "feature-gates", env.WithDefaultString("FEATURE_GATES", "NodeRepair=false,ReservedCapacity=false,SpotToSpotConsolidation=false"), "Optional features can be enabled / disabled using feature gates. Current options are: NodeRepair, ReservedCapacity, and SpotToSpotConsolidation")
fs.StringVar(&o.relaxationPolicyRaw, "relaxation-policy", env.WithDefaultString("RELAXATION_POLICY", string(RelaxationPolicyDefault)), "Relaxation policy for scheduling. Options include 'Default' for existing behavior or 'RelaxWhenMinValuesUnsatisfiable' where Karpenter relaxes min values when they can't be satisfied.")
nit: I feel like this should just be called MIN_VALUES_POLICY or something like that -- relaxation policy doesn't feel specific enough to suggest that it's specifically applying to the minValues portion of our API. To me, this is something like MIN_VALUES_POLICY=Strict|BestEffort.
truncatedInstanceTypes := lo.Slice(its.OrderByPrice(requirements), 0, maxItems)
// Only check for a validity of NodeClaim if its requirement has minValues in it.
if requirements.HasMinValues() {
if _, err := truncatedInstanceTypes.SatisfiesMinValues(requirements); err != nil {
// If minValues is NOT met for any of the requirements across InstanceTypes, then still allow it if relaxation is enabled.
if _, err := truncatedInstanceTypes.SatisfiesMinValues(requirements); err != nil && options.FromContext(ctx).RelaxationPolicy != options.RelaxationPolicyRelaxMinValuesWhenUnsatisfiable {
nit: I'd probably go the other way with also checking that this value is set to something like "Strict" since that's more intentionally checking what you want here (e.g. if we require that minValues is set and we don't satisfy it)
It's also not a huge deal here though because we only have two options to select from
@@ -44,6 +44,19 @@ var (
	CapacityTypeLabel,
	},
)
NodeClaimsCreatedWithRelaxedPreferencesTotal = opmetrics.NewPrometheusCounter(
I still don't know if I see a point for this metric to exist -- worst-case it seems like you could just add a label to the existing NodeClaimsCreated metric, but I'm not sure I even think that is too valuable here to have if you have an annotation on the NodeClaim
@@ -549,7 +550,7 @@ func (s *Scheduler) addToNewNodeClaim(ctx context.Context, pod *corev1.Pod) erro
	}
}
nodeClaim := NewNodeClaim(s.nodeClaimTemplates[i], s.topology, s.daemonOverhead[s.nodeClaimTemplates[i]], s.daemonHostPortUsage[s.nodeClaimTemplates[i]], its, s.reservationManager, s.reservedOfferingMode)
r, its, ofs, err := nodeClaim.CanAdd(ctx, pod, s.cachedPodData[pod.UID])
r, its, ofs, minValuesRelaxed, err := nodeClaim.CanAdd(ctx, pod, s.cachedPodData[pod.UID], karpopts.FromContext(ctx).RelaxationPolicy == karpopts.RelaxationPolicyRelaxMinValuesWhenUnsatisfiable)
Can you just pull the relaxationPolicy from the scheduler struct directly?
@@ -132,7 +133,7 @@ func NewScheduler(
// Pre-filter instance types eligible for NodePools to reduce work done during scheduling loops for pods
templates := lo.FilterMap(nodePools, func(np *v1.NodePool, _ int) (*NodeClaimTemplate, bool) {
	nct := NewNodeClaimTemplate(np)
	nct.InstanceTypeOptions, _ = filterInstanceTypesByRequirements(instanceTypes[np.Name], nct.Requirements, corev1.ResourceList{}, corev1.ResourceList{}, corev1.ResourceList{})
	nct.InstanceTypeOptions, _, _ = filterInstanceTypesByRequirements(instanceTypes[np.Name], nct.Requirements, corev1.ResourceList{}, corev1.ResourceList{}, corev1.ResourceList{}, karpopts.FromContext(ctx).RelaxationPolicy == karpopts.RelaxationPolicyRelaxMinValuesWhenUnsatisfiable)
nit: Same comment here -- can you assign this as a member value on the scheduler and then just pull it from that member value? In general, I'm trying to keep option passthroughs as explicit as possible and trying to prevent us from pulling details out from the context arbitrarily in the scheduler code so the dependencies are clear here and the scheduler can be used as a library
// Check Taints
if err := scheduling.Taints(n.Spec.Taints).ToleratesPod(pod); err != nil {
	return nil, nil, nil, err
	return nil, nil, nil, minValuesRelaxed, err

Suggested change:
-	return nil, nil, nil, minValuesRelaxed, err
+	return nil, nil, nil, false, err

nit: I could see you returning an explicit false here, but I think it's probably fine either way.
Fixes #2078
Description
Support auto-relaxing min values when enforcing them would lead to a scheduling error.
TODO:
How was this change tested?
Unit testing, more testing in progress.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.