-
Notifications
You must be signed in to change notification settings - Fork 631
Add top-level Labels and Resources Structed fields to HeadGroupSpec
and WorkerGroupSpec
#4106
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Add top-level Labels and Resources Structed fields to HeadGroupSpec
and WorkerGroupSpec
#4106
Conversation
Signed-off-by: Ryan O'Leary <[email protected]>
Signed-off-by: Ryan O'Leary <[email protected]>
Running
which isn't a top-level API object. I'm wondering if there's a preference for changing it to a |
// +optional | ||
Resources corev1.ResourceList `json:"resources,omitempty"` | ||
// Labels specifies the Ray node labels for the head group. | ||
// These labels will also be added to the Pods of this head group and override the `--labels` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should mention that labels are ignored if already specifeid in rayStartParams
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way I implemented it, we ignore --labels
in rayStartParams
if they exist and instead override it with the values set in the group Labels
field. Should it actually be the opposite?
My thinking was that since the top-level Labels
and Resources
fields are the most explicit, they should take precedence.
Signed-off-by: Ryan O'Leary <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should try to avoid handling the logic of overriding or merging user configurations. It’s hard to ensure correct behavior and makes Ray Autoscaler more complex. My suggestions:
Add validations in ValidateRayClusterSpec
:
- Resources
- If users specify both (1)
num-cpus
/num-gpus
/memory
/resources
inrayStartParams
and (2){Head|Worker}GroupSpec.Resources
, we should fail validation and avoid reconciling anything. Users should only use (1) or (2).
- If users specify both (1)
- Labels
- If users specify
labels
inrayStartParams
, we should fail the validation because we plan not to handle the string parsing in Ray Autoscaler as @edoakes said. Only{Head|Worker}GroupSpec.Labels
is allowed.
- If users specify
cc @Future-Outlier @rueian Could one of you open an issue to track updating the compatible Ray versions (because of Ray Autoscaler)? And @rueian, could you work on adding support in Ray Autoscaler for Resources / Labels?
sort.Strings(keys) | ||
|
||
for _, k := range keys { | ||
labels = append(labels, fmt.Sprintf("%s=%s", k, groupLabels[k])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to validate that there is no ,
in the k and groupLabels[k]?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
opened here, thank you!
#4113
Signed-off-by: Ryan O'Leary <[email protected]>
To summarize my points in today's community sync,
My proposals for
|
Based on my memory, we should also remove labels from the env variables in Ray if we want to go with proposal 2. |
We should avoid Ray reading env vars to set Ray labels no matter which proposals. |
updateRayStartParamsLabels(headSpec.RayStartParams, headSpec.Labels) | ||
|
||
// Merge K8s labels from the Pod template and the top-level `Labels` field. | ||
mergedLabels := mergeLabels(headSpec.Template.ObjectMeta.Labels, headSpec.Labels) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really want to merge labels from pod template metadata with the new top-level field? This will also complicate the autoscaler, making it need to read pod template metadata.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if I'm understanding correctly, the merged labels here are not read by autoscaler. These are labels to write back as Pod labels to provide better visibility of Raylet labels as Pod labels. Autoscaler should still only look at labels
field or labels in rayStartParams
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh thanks. I read it wrongly.
Would it be so bad for Ray autoscaler to check labels in both places (API field and rayStartParams) for some period of time? I think even if we agree that |
No strong opinions if @rueian is willing to maintain the complexity in Ray Autoscaler. I propose removing it for now because I think most users don't use label-based scheduling, and even fewer users use label-based scheduling with the Autoscaler at this moment. It's a good timing to get rid of some potential tech debt, and we decide not to support Have we announced anything about label-based scheduling publicly? |
I think leaving the old parsing code untouched in the Ray autoscaler is okay, or just put a TODO for removing it someday :) What we really need to do is reject any label set from other than direct value in the new top-level labels field. Setting labels via env variables, variable expansions, and rayStartParams["labels"] should all be rejected by KubeRay. This way, we can make sure the Ray autoscaler can get the labels directly from the new top-level labels field. |
Proposal 2 sounds good to me too. This PR should contain all the required changes currently and is good to review, I'll update ray-project/ray#57260 to include TODO comments to remove the string parsing logic. I'll also update the label setting logic so that we don't consider environment variables. |
Where are labels currently being set in env vars in KubeRay? We set some default labels in Ray core based on env vars (i.e. accelerator-type for TPU is set using an env var that's set automatically by GKE), but I can remove the ones that we no longer plan to set using KubeRay (i.e. from this closed PR: #3699). Is there anywhere else that env vars are being considered that we need to remove? |
Why are these changes needed?
This PR lifts both resources and labels into explicit structured fields for the
HeadGroupSpec
andWorkerGroupSpec
. When these optional fields are specified, they override their respective values in therayStartCommand
for Pods created by KubeRay for that group. Additionally, labels specified at the top-levelLabels
field are merged with the K8s labels on the Pod for observability.The discussion and rationale for this change is discussed more in #3699. The labels part of this change will help enable the autoscaling use case with
label_selectors
in Ray core.Related issue number
Contributes to ray-project/ray#51564
Checks