Add top-level Labels and Resources Structured fields to `HeadGroupSpec` and `WorkerGroupSpec` #4106

ryanaoleary · 2025-10-02T05:34:29Z

Why are these changes needed?

This PR lifts both resources and labels into explicit structured fields for the HeadGroupSpec and WorkerGroupSpec. When these optional fields are specified, they override their respective values in the rayStartCommand for Pods created by KubeRay for that group. Additionally, labels specified at the top-level Labels field are merged with the K8s labels on the Pod for observability.

The rationale for this change is discussed in more detail in #3699. The labels part of this change will help enable the autoscaling use case with label_selectors in Ray core.

Related issue number

Contributes to ray-project/ray#51564

Checks

I've made sure the tests are passing.
Testing Strategy
- Unit tests
- Manual tests
- This PR is not tested :(

Signed-off-by: Ryan O'Leary <[email protected]>

ryanaoleary · 2025-10-02T05:35:42Z

cc: @Future-Outlier @edoakes @MengjinYan

ryanaoleary · 2025-10-02T05:58:44Z

Running make api-docs fails to add the Resources field to the api.md file. I think this is because I use:

Resources corev1.ResourceList

which isn't a top-level API object. I'm wondering if there's a preference for changing it to a map[string]string and then converting it to a ResourceList internally.

andrewsykim · 2025-10-02T19:07:26Z

ray-operator/apis/ray/v1/raycluster_types.go

+	// +optional
+	Resources corev1.ResourceList `json:"resources,omitempty"`
+	// Labels specifies the Ray node labels for the head group.
+	// These labels will also be added to the Pods of this head group and override the `--labels`


Should this mention that labels are ignored if already specified in rayStartParams?

The way I implemented it, we ignore --labels in rayStartParams if they exist and instead override it with the values set in the group Labels field. Should it actually be the opposite?

My thinking was that since the top-level Labels and Resources fields are the most explicit, they should take precedence.

kevin85421

We should try to avoid handling the logic of overriding or merging user configurations. It’s hard to ensure correct behavior and makes Ray Autoscaler more complex. My suggestions:

Add validations in ValidateRayClusterSpec:

Resources
- If users specify both (1) num-cpus / num-gpus / memory / resources in rayStartParams and (2) {Head|Worker}GroupSpec.Resources, we should fail validation and avoid reconciling anything. Users should only use (1) or (2).
Labels
- If users specify labels in rayStartParams, we should fail the validation because we plan not to handle the string parsing in Ray Autoscaler as @edoakes said. Only {Head|Worker}GroupSpec.Labels is allowed.
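A rough sketch of the two validation rules above, under simplifying assumptions. `GroupSpec` and `validateGroupSpec` are hypothetical names used only for illustration, not the actual KubeRay types or the body of `ValidateRayClusterSpec`, and resources are modeled as a plain `map[string]string` so the example needs no Kubernetes imports.

```go
package main

import (
	"errors"
	"fmt"
)

// GroupSpec is a simplified, hypothetical stand-in for HeadGroupSpec / WorkerGroupSpec.
type GroupSpec struct {
	RayStartParams map[string]string
	Resources      map[string]string // stand-in for corev1.ResourceList
	Labels         map[string]string
}

// resourceParams lists the rayStartParams keys that conflict with the Resources field.
var resourceParams = []string{"num-cpus", "num-gpus", "memory", "resources"}

// validateGroupSpec rejects specs that set the same information in two places.
func validateGroupSpec(spec GroupSpec) error {
	// Labels: rayStartParams["labels"] is never allowed; only the Labels field is.
	if _, ok := spec.RayStartParams["labels"]; ok {
		return errors.New("rayStartParams must not contain 'labels'; use the Labels field instead")
	}
	// Resources: users pick either rayStartParams or the Resources field, not both.
	if len(spec.Resources) > 0 {
		for _, key := range resourceParams {
			if _, ok := spec.RayStartParams[key]; ok {
				return fmt.Errorf("rayStartParams[%q] conflicts with the Resources field; use only one", key)
			}
		}
	}
	return nil
}

func main() {
	spec := GroupSpec{
		RayStartParams: map[string]string{"num-cpus": "4"},
		Resources:      map[string]string{"cpu": "4"},
	}
	// This spec sets CPU in both places, so validation returns an error.
	fmt.Println(validateGroupSpec(spec))
}
```

Failing fast here means the reconciler never has to decide which source wins, which is the point of the suggestion.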

cc @Future-Outlier @rueian Could one of you open an issue to track updating the compatible Ray versions (because of Ray Autoscaler)? And @rueian, could you work on adding support in Ray Autoscaler for Resources / Labels?

rueian · 2025-10-04T18:29:03Z

ray-operator/controllers/ray/common/pod.go

+	sort.Strings(keys)
+
+	for _, k := range keys {
+		labels = append(labels, fmt.Sprintf("%s=%s", k, groupLabels[k]))


Do we need to validate that there is no , in the k and groupLabels[k]?
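One way the check asked about here could look, as a sketch only. `buildLabelsArg` is a hypothetical helper, not the function in this PR; joining with `,` and also rejecting `=` in keys are assumptions made so the rendered string stays unambiguous.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// buildLabelsArg renders labels as a sorted, comma-separated "k=v" list.
// It returns an error if a key contains ',' or '=', or if a value contains ',',
// since any of these would make the joined string ambiguous to parse.
func buildLabelsArg(groupLabels map[string]string) (string, error) {
	keys := make([]string, 0, len(groupLabels))
	for k, v := range groupLabels {
		if strings.ContainsAny(k, ",=") {
			return "", fmt.Errorf("label key %q must not contain ',' or '='", k)
		}
		if strings.Contains(v, ",") {
			return "", fmt.Errorf("label value %q for key %q must not contain ','", v, k)
		}
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic output regardless of map iteration order

	labels := make([]string, 0, len(keys))
	for _, k := range keys {
		labels = append(labels, fmt.Sprintf("%s=%s", k, groupLabels[k]))
	}
	return strings.Join(labels, ","), nil
}

func main() {
	arg, err := buildLabelsArg(map[string]string{"zone": "us-west1", "tier": "gpu"})
	fmt.Println(arg, err) // prints: tier=gpu,zone=us-west1 <nil>
}
```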

opened here, thank you!
#4113

ryanaoleary · 2025-10-06T12:43:28Z

Could one of you open an issue to track updating the compatible Ray versions (because of Ray Autoscaler)? And @rueian, could you work on adding support in Ray Autoscaler for Resources / Labels?

Added the validation logic in f7f85dd

kevin85421 · 2025-10-08T03:36:07Z

To summarize my points in today's community sync,

Resources: It can improve UX by avoiding the need to specify something like resources: '"{\"Custom1\": 1, \"Custom2\": 5}"' in rayStartParams (example). Therefore, this is safe to add to CRD.
Labels:
- The primary goal of adding Labels to the CRD is to avoid parsing the string rayStartParams["labels"] into Ray labels in the Ray Autoscaler. The Ray Autoscaler already has logic to parse the string. Therefore, this benefit would not be realized if we decide not to remove the string parsing logic from the Ray Autoscaler.
- The other benefit is to verify the label values at the OpenAPI level, but this is not significant enough to justify handling both rayStartParams["labels"] and the new field Labels at the same time.
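To make the Resources point concrete, here is a small sketch of the escaped string a user has to hand-write today and that a structured field would spare them. `customResourcesParam` is a hypothetical helper for illustration, not KubeRay code, and the `float64` value type is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// customResourcesParam renders custom resources as a quoted JSON string with
// escaped inner quotes, the shape currently written by hand in rayStartParams.
func customResourcesParam(custom map[string]float64) (string, error) {
	raw, err := json.Marshal(custom) // map keys are emitted in sorted order
	if err != nil {
		return "", err
	}
	// %q wraps the JSON document in double quotes and escapes the inner quotes.
	return fmt.Sprintf("%q", raw), nil
}

func main() {
	param, err := customResourcesParam(map[string]float64{"Custom1": 1, "Custom2": 5})
	if err != nil {
		panic(err)
	}
	fmt.Println(param) // prints: "{\"Custom1\":1,\"Custom2\":5}"
}
```

The output matches the example above apart from whitespace inside the JSON.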

My proposals for Labels:

Proposal 1: Not adding Labels to the CRD and keeping the Ray Autoscaler parsing the string is fine with me, but Edward didn't like it.
Proposal 2: Add Labels to the CRD and remove the string parsing logic in Ray Autoscaler so that Ray Autoscaler doesn't need to handle both rayStartParams["labels"] and Labels.

cc @rueian @Future-Outlier @ryanaoleary @andrewsykim

Future-Outlier · 2025-10-08T03:41:35Z

Based on my memory, we should also remove labels from the env variables in Ray if we want to go with proposal 2.

cc @rueian @ryanaoleary @andrewsykim @kevin85421

kevin85421 · 2025-10-08T04:40:27Z

Based on my memory, we should also remove labels from the env variables in Ray if we want to go with proposal 2.

We should avoid having Ray read env vars to set Ray labels, regardless of which proposal we choose.

rueian · 2025-10-08T18:33:18Z

ray-operator/controllers/ray/common/pod.go

+	updateRayStartParamsLabels(headSpec.RayStartParams, headSpec.Labels)
+
+	// Merge K8s labels from the Pod template and the top-level `Labels` field.
+	mergedLabels := mergeLabels(headSpec.Template.ObjectMeta.Labels, headSpec.Labels)


Do we really want to merge labels from pod template metadata with the new top-level field? This will also complicate the autoscaler, making it need to read pod template metadata.

If I'm understanding correctly, the merged labels here are not read by the autoscaler. They are written back as Pod labels to give better visibility of the Raylet labels. The autoscaler should still only look at the Labels field or the labels in rayStartParams.

Oh, thanks. I misread it.
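For reference, a minimal sketch of what a merge like the one in the diff might do. The body of `mergeLabels` is not shown in this thread, so this is a guess at its behavior; in particular, letting the group-level `Labels` win on a key collision is an assumption.

```go
package main

import "fmt"

// mergeLabels returns a new map containing templateLabels overlaid with groupLabels.
// Assumption: on a key collision the group-level value wins.
// Neither input map is modified.
func mergeLabels(templateLabels, groupLabels map[string]string) map[string]string {
	merged := make(map[string]string, len(templateLabels)+len(groupLabels))
	for k, v := range templateLabels {
		merged[k] = v
	}
	for k, v := range groupLabels {
		merged[k] = v
	}
	return merged
}

func main() {
	podLabels := mergeLabels(
		map[string]string{"app": "ray", "tier": "cpu"},
		map[string]string{"tier": "gpu"},
	)
	fmt.Println(podLabels) // prints: map[app:ray tier:gpu]
}
```

The result is only written to the Pod metadata, so the autoscaler never needs to read it.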

andrewsykim · 2025-10-08T18:41:44Z

Add Labels to the CRD and remove the string parsing logic in Ray Autoscaler so that Ray Autoscaler doesn't need to handle both rayStartParams["labels"] and Labels.

Would it be so bad for Ray autoscaler to check labels in both places (API field and rayStartParams) for some period of time? I think even if we agree that Labels field is the ideal approach, we need to respect backwards compatibility anyways and still check rayStartParams.

kevin85421 · 2025-10-08T21:51:21Z

Would it be so bad for Ray autoscaler to check labels in both places (API field and rayStartParams) for some period of time? I think even if we agree that Labels field is the ideal approach, we need to respect backwards compatibility anyways and still check rayStartParams.

No strong opinions if @rueian is willing to maintain the complexity in Ray Autoscaler. I propose removing it for now because I think most users don't use label-based scheduling, and even fewer use label-based scheduling with the Autoscaler at this moment. It's a good time to get rid of some potential tech debt and decide not to support rayStartParams["labels"] in KubeRay v1.5.0.

Have we announced anything about label-based scheduling publicly?

rueian · 2025-10-08T22:23:21Z

I think leaving the old parsing code untouched in the Ray autoscaler is okay, or just put a TODO for removing it someday :)

What we really need to do is reject any label that is set by any means other than a direct value in the new top-level Labels field. Setting labels via env variables, variable expansions, and rayStartParams["labels"] should all be rejected by KubeRay. This way, we can make sure the Ray autoscaler can get the labels directly from the new top-level Labels field.
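A possible shape for the "direct value only" rule, as a sketch. `validateDirectLabels` is a hypothetical name, and treating `$(` and `${` as the markers of a variable expansion is an assumption about which syntaxes would need to be rejected.

```go
package main

import (
	"fmt"
	"strings"
)

// validateDirectLabels rejects label values that look like variable references
// such as $(VAR) or ${VAR}, so every value is a literal that can be read as-is.
func validateDirectLabels(labels map[string]string) error {
	for k, v := range labels {
		if strings.Contains(v, "$(") || strings.Contains(v, "${") {
			return fmt.Errorf("label %q has value %q: variable expansion is not allowed", k, v)
		}
	}
	return nil
}

func main() {
	// A literal value passes; a value that references a variable is rejected.
	fmt.Println(validateDirectLabels(map[string]string{"zone": "us-west1"}))
	fmt.Println(validateDirectLabels(map[string]string{"zone": "$(NODE_ZONE)"}))
}
```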

ryanaoleary · 2025-10-09T19:52:47Z

Proposal 2 sounds good to me too. This PR should currently contain all the required changes and is ready for review. I'll update ray-project/ray#57260 to include TODO comments to remove the string parsing logic. I'll also update the label setting logic so that we don't consider environment variables.

ryanaoleary · 2025-10-09T20:46:55Z

if we want to go with proposal 2

Where are labels currently being set in env vars in KubeRay? We set some default labels in Ray core based on env vars (i.e. accelerator-type for TPU is set using an env var that's set automatically by GKE), but I can remove the ones that we no longer plan to set using KubeRay (i.e. from this closed PR: #3699). Is there anywhere else that env vars are being considered that we need to remove?

ryanaoleary added 2 commits October 2, 2025 04:43

Add top-level Labels and Resources fields

b6f5ec0

Signed-off-by: Ryan O'Leary <[email protected]>

Update API comment

d75cfee

Signed-off-by: Ryan O'Leary <[email protected]>

ryanaoleary requested review from MortalHappiness, andrewsykim, kevin85421 and rueian as code owners October 2, 2025 05:34

ryanaoleary mentioned this pull request Oct 2, 2025

Add default Ray node label info to Ray Pod environment #3699

Closed

4 tasks

andrewsykim reviewed Oct 2, 2025

andrewsykim reviewed Oct 2, 2025

Future-Outlier reviewed Oct 3, 2025

ryanaoleary mentioned this pull request Oct 3, 2025

[DOC-127] MVP for OSS Ray labels ray-project/ray#54254

Merged

8 tasks

Fix comments

ee7b3ba

Signed-off-by: Ryan O'Leary <[email protected]>

ryanaoleary requested review from Future-Outlier and andrewsykim October 3, 2025 16:09

kevin85421 reviewed Oct 4, 2025

rueian reviewed Oct 4, 2025

Future-Outlier mentioned this pull request Oct 6, 2025

Update Ray's version to support label-based scheduling + autoscaler #4113

Open

Add validation logic

f7f85dd

Signed-off-by: Ryan O'Leary <[email protected]>

MengjinYan mentioned this pull request Oct 6, 2025

[Core] Ray Label Selector API Implementation Tracker ray-project/ray#51564

Open

32 tasks

ryanaoleary requested review from kevin85421 and rueian October 6, 2025 22:45

ryanaoleary mentioned this pull request Oct 7, 2025

[Autoscaler][V2] Add top-level Resources and Labels field to KubeRay Autoscaling config ray-project/ray#57260

Open

8 tasks

rueian reviewed Oct 8, 2025
