
fresende

Pull Request Template for Kubeflow Manifests

✏️ Summary of Changes

Add Support for notebooks/spark operator to manifests

This is a work in progress to automate the installation of the Spark operator integration with notebooks. We are currently hitting an issue at kernel startup: the kernel needs to communicate back to the Enterprise Gateway, but that step fails silently. I believe it is related to Istio and would appreciate some help.

Connection Flow

Kernel starts and encrypts its connection details.
Kernel sends those details back to Enterprise Gateway.
Enterprise Gateway decrypts and reads the info.
Gateway passes the connection info to the kernel’s proxy.
Gateway uses that info to connect to the kernel’s ports (shell, iopub, stdin, heartbeat, control).
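One way to narrow down where the flow above breaks is to check the last step in isolation. The following is a minimal sketch, not part of this PR: it assumes you already have the kernel's connection info as a dict using the standard Jupyter connection-file keys, and `check_kernel_ports` is a hypothetical helper name. It only attempts a plain TCP connect to each of the five ports from wherever it runs (for example, from inside the gateway pod).

```python
import socket

# Port keys as they appear in a standard Jupyter kernel connection file.
PORT_KEYS = ("shell_port", "iopub_port", "stdin_port", "hb_port", "control_port")


def check_kernel_ports(connection_info, timeout=2.0):
    """Attempt a plain TCP connect to each kernel port.

    Returns a dict mapping each port key to None when the connect
    succeeded, or to a short error string when it failed.
    """
    results = {}
    host = connection_info["ip"]
    for key in PORT_KEYS:
        port = connection_info[key]
        try:
            # Open and immediately close the connection; no ZeroMQ traffic is sent.
            with socket.create_connection((host, port), timeout=timeout):
                results[key] = None
        except OSError as exc:
            results[key] = f"{type(exc).__name__}: {exc}"
    return results
```

A failed connect on any key points at the network path between gateway and kernel. A successful connect is weaker evidence: it does not prove that ZeroMQ messages get through, since a sidecar proxy may accept the TCP connection locally before the traffic reaches the kernel pod.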

✅ Contributor Checklist

  • I have tested these changes with kustomize. See Installation Prerequisites.
  • All commits are signed-off to satisfy the DCO check.
  • I have considered adding my company to the adopters page to support Kubeflow and help the community, since I expect help from the community for my issue (see 1. and 2.).


[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign juliusvonkohout for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@fresende fresende changed the title from "[WIP] Add Support for notebooks/spark operator to mainfests" to "[WIP] Add Support for notebooks/spark operator to manifests" on Aug 21, 2025
@juliusvonkohout
Member

juliusvonkohout commented Aug 21, 2025

Why do you not use the modern standard spark-connect with interactive session support instead of the enterprise gateway?

@tarekabouzeid
Member

Why do you not use the modern standard spark-connect with interactive session support instead of the enterprise gateway?

I think spark connect doesn't cover the full Spark API or multi user isolation needs compared to JEG.

@juliusvonkohout
Member

Why do you not use the modern standard spark-connect with interactive session support instead of the enterprise gateway?

I think spark connect doesn't cover the full Spark API or multi user isolation needs compared to JEG.

We isolate per namespace, so why is spark-connect not multi-tenant? CC @vikas-saxena02

@tarekabouzeid
Member

Why do you not use the modern standard spark-connect with interactive session support instead of the enterprise gateway?

I think spark connect doesn't cover the full Spark API or multi user isolation needs compared to JEG.

We isolate per namespace, so why is spark-connect not multi-tenant? CC @vikas-saxena02

If spark connect is installed per namespace, then yes. But I am not 100% sure whether different notebook kernels can then have different Spark drivers.

It would be nice to investigate this further. Maybe @fresende has already looked into it.

@juliusvonkohout
Member

juliusvonkohout commented Aug 22, 2025

The goal is to have Spark separated per namespace, so spark-cluster and spark-connect are deployed per namespace and only the JupyterLab instances in that namespace can access them.

@vikas-saxena02

vikas-saxena02 commented Aug 22, 2025 via email

I have experimented with deploying spark-connect separately as a service, and I could do it at the namespace level. But I haven't tried it with the new spark-connect CRD, which was added recently. Thanks and regards, Vikas Saxena.

@tarekabouzeid
Member

Do you mean that each notebook in that namespace will have its own spark connect and spark application?

@lresende
Member

The classic Jupyter + IPython kernel approach with PySpark would allow each user to run their own Spark driver instance, giving them full control over configuration, context, and resource usage without interference from others. It’s a mature, well-proven model that supports the entire PySpark API surface, including advanced features and low-level tuning not yet available through Spark Connect. This setup ensures predictable behavior, easier debugging, and maximum compatibility with existing Spark workflows and Jupyter integrations.

Spark Connect, on the other hand (which I believe came to replace Apache Livy), has its merits and is very good at providing a shared Spark as a service.

We can definitely continue investigating integration with Spark Connect, but I believe that should happen in parallel, as the two will probably serve different use cases.

Member

This file comes from https://github.com/fresende/kubeflow-manifests/blob/install-eg/scripts/synchronize-spark-operator-manifests.sh and must not be modified. Please create a Kustomize overlay instead. We also need proper tests.

Member

And you need to sign your commits.

5 participants