Add default Ray node label info to Ray Pod environment #3699
base: master
Conversation
cc: @MengjinYan
@@ -493,6 +493,9 @@ func BuildPod(ctx context.Context, podTemplateSpec corev1.PodTemplateSpec, rayNo
	initLivenessAndReadinessProbe(&pod.Spec.Containers[utils.RayContainerIndex], rayNodeType, creatorCRDType)
}

// add downward API environment variables for Ray default node labels
addDefaultRayNodeLabels(&pod)
Should we guard this logic with a Ray version check?
This code doesn't rely on any API change in Ray; it just sets some env vars, and the actual node labels are set in Ray core using those vars. I can add a version guard here (for whatever release ray-project/ray#53360 is included in) if we don't want it setting unused vars for users on older versions of Ray.
From offline discussion with @MengjinYan, we were leaning towards not including a version guard, since users are not required to specify the Ray version they're using in the CR spec.
LGTM. @kevin85421 Can you take a look from KubeRay's perspective?
pod.Spec.Containers[utils.RayContainerIndex].Env,
// used to set the ray.io/market-type node label
corev1.EnvVar{
	Name: "RAY_NODE_MARKET_TYPE",
Is the plan for Ray Core to check these env vars? Can you link the PR that includes this change?
Yeah that's correct - this is the related PR: ray-project/ray#53360
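For reference, a minimal sketch (not the literal diff) of how the market-type env var might be appended to the Ray container. `getRayMarketTypeFromNodeSelector` and `utils.RayContainerIndex` come from the KubeRay code shown in this PR, while the wrapper function name and the "spot"/"on-demand" values are illustrative assumptions:

```go
import corev1 "k8s.io/api/core/v1"

// addMarketTypeEnvVar is a hypothetical wrapper showing how the env var could
// be appended; the actual PR does this inside addDefaultRayNodeLabels.
func addMarketTypeEnvVar(pod *corev1.Pod) {
	// Assumed values: "spot" or "on-demand", derived from the nodeSelector.
	marketType := getRayMarketTypeFromNodeSelector(pod)
	pod.Spec.Containers[utils.RayContainerIndex].Env = append(
		pod.Spec.Containers[utils.RayContainerIndex].Env,
		// used to set the ray.io/market-type node label
		corev1.EnvVar{
			Name:  "RAY_NODE_MARKET_TYPE",
			Value: marketType,
		},
	)
}
```

Ray core (ray-project/ray#53360) would then read this env var at startup to set the ray.io/market-type node label.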
@kevin85421 Bumping this to see if we can include it in 1.4.
@ryanaoleary I chatted with @MengjinYan, and my understanding is that this doesn’t need to be included in v1.4.0. Could you sync with @MengjinYan and let me know if I’m mistaken? Thanks!
Synced offline with @MengjinYan and yeah, there's no urgency to include this in v1.4.0; we can wait for the next release. My thought was just that it'd be useful to have this functionality in the soonest stable release for testing, but I can just use the nightly image.
This makes sense. I’ll review it for now. We’ll make a best-effort attempt to include this PR, but there’s no guarantee.
)
}
// getRayMarketTypeFromNodeSelector is a helper function to determine the ray.io/market-type label |
Is `nodeSelector` enough? There are multiple methods to affect Pod scheduling, such as `nodeSelector`, `taints`/`tolerations`, `affinity`, `priorityClass`, and so on.
I think for market type, `nodeSelector` is probably enough for now since it seems to be the standard method to schedule on spot instances: https://cloud.google.com/kubernetes-engine/docs/concepts/spot-vms. Of the others, I think we'd only be concerned about `taints`/`tolerations` and `affinity`. For `nodeAffinity`, we could also check for something like:

```yaml
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: cloud.google.com/gke-spot
            operator: In
            values:
            - "true"
```

and add the label if it exists. For `taints`/`tolerations` I don't think we need to check for them, even if the Pod contains a toleration like:

```yaml
spec:
  tolerations:
  - key: "cloud.google.com/gke-spot"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"
```

since it's not guaranteed that the Pod will schedule to a spot instance unless an `affinity` or `nodeSelector` also exists (plus it's not guaranteed the node pool will have the matching taint, so the toleration could just be ignored).
For now I'll go ahead and add a check for `nodeAffinity`. Does the above make sense to you, @MengjinYan?
Added this check in cc581c2
Having an affinity or toleration doesn't always guarantee that the Pod is running on a spot instance. I think the node selector is as close as we get; for other use cases the user will need to override labels using `rayStartParams`.
I guess node affinity with requiredDuringSchedulingIgnoredDuringExecution would actually, so nvm :)
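For illustration, a rough sketch of what the `nodeAffinity` check discussed above could look like; the helper name is hypothetical and this is not the actual contents of cc581c2:

```go
import corev1 "k8s.io/api/core/v1"

// podRequiresSpotNodeAffinity (hypothetical helper) reports whether the Pod has
// a required nodeAffinity term matching the GKE spot label, i.e. it can only be
// scheduled onto spot nodes.
func podRequiresSpotNodeAffinity(pod *corev1.Pod) bool {
	if pod.Spec.Affinity == nil || pod.Spec.Affinity.NodeAffinity == nil {
		return false
	}
	required := pod.Spec.Affinity.NodeAffinity.RequiredDuringSchedulingIgnoredDuringExecution
	if required == nil {
		return false
	}
	for _, term := range required.NodeSelectorTerms {
		for _, expr := range term.MatchExpressions {
			if expr.Key != "cloud.google.com/gke-spot" || expr.Operator != corev1.NodeSelectorOpIn {
				continue
			}
			for _, v := range expr.Values {
				if v == "true" {
					return true
				}
			}
		}
	}
	return false
}
```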
// getRayMarketTypeFromNodeSelector is a helper function to determine the ray.io/market-type label
// based on user-provided Kubernetes nodeSelector values.
func getRayMarketTypeFromNodeSelector(pod *corev1.Pod) string {
I guess some schedulers or webhooks may update the node selector after the Pod is created. We should take it into consideration.
Do you mean that the value of the default labels might change after the ray node started?
> might change after the ray node started?

No, I mean it may be changed after KubeRay constructs the Pod spec but before the Pod is created or scheduled.
Are there any webhooks that modify the value of `cloud.google.com/gke-spot` on a Pod? I'm having difficulty finding a reference. I think we should default to adding Ray node labels based on what the user specifies in their Pod spec.
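As a rough sketch (illustrative only; the exact selector keys and return values the PR checks aren't shown here), `getRayMarketTypeFromNodeSelector` could look something like:

```go
import corev1 "k8s.io/api/core/v1"

// getRayMarketTypeFromNodeSelector maps well-known cloud provider spot/preemptible
// node selectors to a market-type string for the RAY_NODE_MARKET_TYPE env var.
// The selector keys and the "spot"/"on-demand" values below are assumptions.
func getRayMarketTypeFromNodeSelector(pod *corev1.Pod) string {
	spotSelectors := map[string]string{
		"cloud.google.com/gke-spot":             "true",
		"cloud.google.com/gke-preemptible":      "true",
		"eks.amazonaws.com/capacityType":        "SPOT",
		"kubernetes.azure.com/scalesetpriority": "spot",
	}
	for key, want := range spotSelectors {
		if got, ok := pod.Spec.NodeSelector[key]; ok && got == want {
			return "spot"
		}
	}
	return "on-demand"
}
```

Defaulting to on-demand when no spot selector matches mirrors the discussion above: the node selector is the closest signal available, and other cases can be overridden via `rayStartParams`.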
Name: utils.RayNodeZone,
ValueFrom: &corev1.EnvVarSource{
	FieldRef: &corev1.ObjectFieldSelector{
		FieldPath: fmt.Sprintf("metadata.labels['%s']", utils.K8sTopologyZoneLabel),
I wonder if we should check the existence of the label in the Pod first before referencing it from the downward API. Should we also check the node selector for `topology.kubernetes.io/zone=<zone>` and use that value for RAY_NODE_ZONE?
That makes sense to me. I added logic to check the Pod labels, then NodeSelectors, and then fall back to using the downward API in 604c162 for both the region and zone.
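A sketch of that lookup order (illustrative, not the exact code from 604c162; the PR uses constants such as `utils.RayNodeZone` and `utils.K8sTopologyZoneLabel` rather than the literals below):

```go
import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// buildZoneEnvVar (hypothetical helper) prefers an explicit Pod label, then a
// nodeSelector value, and only falls back to the downward API if neither is set.
func buildZoneEnvVar(pod *corev1.Pod) corev1.EnvVar {
	const zoneLabel = "topology.kubernetes.io/zone"

	if zone, ok := pod.Labels[zoneLabel]; ok && zone != "" {
		return corev1.EnvVar{Name: "RAY_NODE_ZONE", Value: zone}
	}
	if zone, ok := pod.Spec.NodeSelector[zoneLabel]; ok && zone != "" {
		return corev1.EnvVar{Name: "RAY_NODE_ZONE", Value: zone}
	}
	// Downward API fallback: resolved at runtime from the Pod's own labels.
	return corev1.EnvVar{
		Name: "RAY_NODE_ZONE",
		ValueFrom: &corev1.EnvVarSource{
			FieldRef: &corev1.ObjectFieldSelector{
				FieldPath: fmt.Sprintf("metadata.labels['%s']", zoneLabel),
			},
		},
	}
}
```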
Add default Ray node label info to Ray Pod environment
Why are these changes needed?
This PR adds market-type information for different cloud providers to the Ray Pod environment based on the provided `nodeSelector` value. This PR also adds environment variables to pass region and zone information using the downward API (kubernetes/kubernetes#127092). These environment variables will be used in Ray core to set default Ray node labels.

I'll add a comment below with my manual test results from propagating `topology.k8s.io/region` and `topology.k8s.io/zone` on a GKE v1.33 alpha cluster.

Related issue number
ray-project/ray#51564
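For context, an illustrative view of the env vars this PR adds to the Ray container (the RAY_NODE_REGION name is assumed by analogy with RAY_NODE_ZONE; region and zone resolve at runtime via the downward API, while market type is a plain value derived from the nodeSelector):

```go
import corev1 "k8s.io/api/core/v1"

// Illustrative end result, not the literal diff.
var defaultRayNodeLabelEnvVars = []corev1.EnvVar{
	{Name: "RAY_NODE_MARKET_TYPE", Value: "spot"}, // derived from the nodeSelector
	{
		Name: "RAY_NODE_REGION", // assumed name, by analogy with RAY_NODE_ZONE
		ValueFrom: &corev1.EnvVarSource{
			FieldRef: &corev1.ObjectFieldSelector{
				FieldPath: "metadata.labels['topology.kubernetes.io/region']",
			},
		},
	},
	{
		Name: "RAY_NODE_ZONE",
		ValueFrom: &corev1.EnvVarSource{
			FieldRef: &corev1.ObjectFieldSelector{
				FieldPath: "metadata.labels['topology.kubernetes.io/zone']",
			},
		},
	},
}
```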