Skip to content

[Ray-operator] Feature flag login bash #3679

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
May 28, 2025

Conversation

fscnick
Copy link
Contributor

@fscnick fscnick commented May 24, 2025

Why are these changes needed?

Provide a feature flag to enable the login bash command -l if needed. If feature flag is not enabled, it would not apply login bash. This provide the option for the end user to choose one. For example, the uv rely on PATH if enabling the login bash might overwrite PATH potentially.

Related issue number

Part of #3247

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

@fscnick
Copy link
Contributor Author

fscnick commented May 24, 2025

Without feature flag:

ray-head:
ray-head

wait-gcs-ready:
wait-gcs-ready

ray-worker:
ray-worker

autoscaler:
autoscaler

rayjob result:
rayjob result

With feature flag:

enable feature flag:
feature gate enable

raycluster with login bash:
login bash

@fscnick fscnick force-pushed the feature-flag-login-bash branch from 0638af7 to 5dda100 Compare May 24, 2025 14:16
@fscnick fscnick marked this pull request as ready for review May 24, 2025 14:55
@fscnick
Copy link
Contributor Author

fscnick commented May 24, 2025

@kevin85421 PTAL

@kevin85421
Copy link
Member

cc @MortalHappiness would you mind reviewing this PR? Thanks!

@@ -33,6 +36,7 @@ func init() {
var defaultFeatureGates = map[featuregate.Feature]featuregate.FeatureSpec{
RayClusterStatusConditions: {Default: true, PreRelease: featuregate.Beta},
RayJobDeletionPolicy: {Default: false, PreRelease: featuregate.Alpha},
RayClusterLoginBash: {Default: false, PreRelease: featuregate.Alpha},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remembered that in our offline discussion, we dicided to use environment variable instead of feature gates and config API, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @kevin85421 , just to make sure. Do we make it as environment variable or feature gate?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use an environment variable. Feature gates are meant to provide a way to gradually graduate a new feature from alpha to beta to GA. However, in this case, we are trying to deprecate an existing feature. That is, it is possible for us to remove the feature gate or env var. In this case, if we use feature gate, users need to update their YAML in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change to the environment. @MortalHappiness PTAL

@@ -847,6 +848,44 @@ func TestBuildPod_WithCreatedByRayService(t *testing.T) {
utils.EnvVarExists(utils.RAY_TIMEOUT_MS_TASK_WAIT_FOR_DEATH_INFO, pod.Spec.Containers[utils.RayContainerIndex].Env)
}

func TestBuildPod_WithFeatureFlagLoginBash(t *testing.T) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also add a test for the case when the feature flag is not enabled, or add the checks to the TestBuildPod function.

Copy link
Member

@kevin85421 kevin85421 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check all /bin/bash occurrences in the codebase. For example, the RedisCleanupJob is not covered in the current PR.

TPURayResourceName = "TPU"

// EnableLoginBashEnvKey is the environment variable to enable login bash for Ray containers.
EnableLoginBashEnvKey = "ENABLE_LOGIN_BASH"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move to constant.go and rename it to ENABLE_LOGIN_SHELL.

https://arc.net/l/quote/oyzwyjmd

TPUContainerResourceName = "google.com/tpu"
TPURayResourceName = "TPU"

// EnableLoginBashEnvKey is the environment variable to enable login bash for Ray containers.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"This environment variable for the KubeRay operator determines whether to enable a login shell by passing the -l option to the container command /bin/bash. The -l flag was added by default before KubeRay v1.4.0, but it is no longer added by default starting with v1.4.0."

@kevin85421
Copy link
Member

still reviewing

@@ -240,6 +243,13 @@ func getEnableProbesInjection() bool {
return true
}

func GetEnableLoginBash() bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a utility function instead of calling GetEnableLoginBash multiple times across the codebase?

func GetContainerCommand() {
  if (enableLoginShell) {
    return []string{"/bin/bash", "-lc", "--"},
  }
  return []string{"/bin/bash", "-c", "--"},
}

@fscnick fscnick force-pushed the feature-flag-login-bash branch 2 times, most recently from 47295d7 to ec7da02 Compare May 27, 2025 13:59
Copy link
Member

@kevin85421 kevin85421 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will open a follow up PR.

@kevin85421 kevin85421 merged commit 7d0eae4 into ray-project:master May 28, 2025
24 checks passed
pawelpaszki pushed a commit to opendatahub-io/kuberay that referenced this pull request Jun 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants