Skip to content

Conversation

CathalOConnorRH
Copy link

@CathalOConnorRH CathalOConnorRH commented Oct 6, 2025

Initial WIP commit for OAuth as part of the Controlled Network Environment
More authentication types to be added.
This PR defines the ControlledNetworkEnvironment configuration type which is used to enable the authentication controller but can also be used to trigger mTLS and secure network policy reconciliation once they are reworked to match this type if agreed by the Ray team.

Why are these changes needed?

Related issue number

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

Initial WIP commit for OAuth as part of the Controlled Network Environment
Copy link

@laurafitzgerald laurafitzgerald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @CathalOConnorRH I did a quick high level review, didn't get in depth into the actual auth logic. Leaving some early review comments but it looks to be in good shape!

// detectAuthenticationMode determines whether the cluster is using OAuth or OIDC
func (r *AuthenticationController) detectAuthenticationMode(ctx context.Context) AuthenticationMode {
// First, check if we're on OpenShift
if !r.options.IsOpenShift {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is being done differently here -> https://github.com/ray-project/kuberay/blob/master/ray-operator/controllers/ray/raycluster_controller.go#L87 I haven't evaluated which makes the most sense but at a minimum let's keep the method consistent across all the reconcilers.


if err := r.Get(ctx, client.ObjectKey{Name: routeName, Namespace: cluster.Namespace}, route); err != nil {
if errors.IsNotFound(err) {
logger.Info("Route not found, will be created with OAuth proxy configuration", "route", routeName)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there are other spots in the logic where routes are being created and we are handling that here, let's chat about whether it might make sense to align the various sets of logic around routes.

}

// generateSelfSignedCert generates a self-signed certificate for OAuth proxy
func generateSelfSignedCert(cluster *rayv1.RayCluster) ([]byte, []byte, error) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we depend on cert-manager here given it's a requirement for controlledNetworkEnvironment feature?

For(&rayv1.RayCluster{}).
// Watch authentication resources for informational purposes
Watches(&configv1.Authentication{}, &handler.EnqueueRequestForObject{}).
// Watches(&routev1.Route{}, &handler.EnqueueRequestForObject{}).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason the route watch is commented out?

flag.BoolVar(&enableMetrics, "enable-metrics", false, "Enable the emission of control plane metrics.")
flag.Float64Var(&qps, "qps", float64(configapi.DefaultQPS), "The QPS value for the client communicating with the Kubernetes API server.")
flag.IntVar(&burst, "burst", configapi.DefaultBurst, "The maximum burst for throttling requests from this client to the Kubernetes API server.")
flag.BoolVar(&controlledNetworkEnvironment, "controlled-network-environment", false, "Enable OAuth proxy for all RayClusters on OpenShift (automatically enabled when OpenShift is detected).")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's also mTLS and Network Isolation here as well right?

config.QPS = &qps
config.Burst = &burst

if utils.GetClusterType() {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this function isn't introduced here but I think the naming could be improved. There's also a check on an environment variable inside the function for IngressOnOpenshift which could have unintended consequences here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants