
machichima
Collaborator

Why are these changes needed?

When a RayCluster is set in the RayJob's clusterSelector and that RayCluster is not found, the RayJob gets stuck in the initializing state without any event message, which makes the problem hard to debug without digging into the ray-operator logs.

This PR emits an event when the RayCluster set in the clusterSelector is not found, guiding the user to create the RayCluster manually.

Tested locally: [screenshot]

Related issue number

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

 logger.Info("RayCluster not found", "RayCluster", rayClusterNamespacedName)
 if len(rayJobInstance.Spec.ClusterSelector) != 0 {
-    err := fmt.Errorf("we have choosed the cluster selector mode, failed to find the cluster named %v, err: %w", rayClusterNamespacedName.Name, err)
+    err := fmt.Errorf("clusterSelector mode enabled, but RayCluster %s/%s not found: %w", rayClusterNamespacedName.Namespace, rayClusterNamespacedName.Name, err)
Collaborator Author

Rephrase the error message a bit

rueian requested a review from Copilot on October 11, 2025 16:12
Contributor

Copilot AI left a comment

Pull Request Overview

This PR improves the debugging experience for RayJob users by adding an event message when a RayCluster specified in the clusterSelector is not found.

  • Added a new event type RayClusterNotFound to provide better user feedback
  • Enhanced the error message to use the namespace/name format consistently
  • Emitted a warning event when the specified RayCluster cannot be found in clusterSelector mode

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File: Description
  • ray-operator/controllers/ray/utils/constant.go: Adds new RayClusterNotFound event type constant
  • ray-operator/controllers/ray/rayjob_controller.go: Improves error message and adds event emission for missing RayCluster


rueian merged commit 9e68367 into ray-project:master on Oct 11, 2025
27 checks passed