[WIP] Add Support for notebooks/spark operator to manifests #3223
Conversation
Signed-off-by: Fellipe Resende <[email protected]>
[APPROVALNOTIFIER] This PR is NOT APPROVED
Needs approval from an approver in each of these files. Approvers can indicate their approval by writing `/approve` in a comment. The full list of commands accepted by this bot can be found here.
Why not use the modern standard spark-connect, with interactive session support, instead of the Enterprise Gateway?

I think Spark Connect doesn't cover the full Spark API or the multi-user isolation needs compared to JEG (Jupyter Enterprise Gateway).

We isolate per namespace, so why is spark-connect not multi-tenant? CC @vikas-saxena02

If spark-connect is installed per namespace, then yes. But I am not 100% sure whether different notebook kernels can then have different Spark drivers. That would be nice to investigate further. Maybe @fresende has already looked into it.

The goal is to have Spark separated per namespace, so the spark-cluster and spark-connect are deployed per namespace and only the JupyterLabs in that namespace can access it.
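For context, attaching a notebook to a per-namespace Spark Connect service would look roughly like the sketch below. The service hostname, namespace, and port are illustrative assumptions, not values from this PR (15002 is Spark Connect's default port).

```python
# Hypothetical example: connect a notebook kernel to a Spark Connect
# Service running in the user's profile namespace.
from pyspark.sql import SparkSession  # requires pyspark >= 3.4

spark = (
    SparkSession.builder
    # assumed Service name "spark-connect" in namespace "my-profile"
    .remote("sc://spark-connect.my-profile.svc.cluster.local:15002")
    .getOrCreate()
)
spark.range(10).show()  # quick smoke test against the remote cluster
```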
I have experimented with deploying spark-connect separately as a service. I could do it on a namespace level, but it would be worth revisiting with the new spark-connect CRD which was added recently.
Thanks and regards,
Vikas Saxena.
Do you mean each notebook in that namespace will have its own spark-connect and Spark application?
The classic Jupyter + IPython kernel approach with PySpark would allow each user to run their own Spark driver instance, giving them full control over configuration, context, and resource usage without interference from others. It's a mature, well-proven model that supports the entire PySpark API surface, including advanced features and low-level tuning not yet available through Spark Connect. This setup ensures predictable behavior, easier debugging, and maximum compatibility with existing Spark workflows and Jupyter integrations. Spark Connect, on the other hand (which I believe came to replace Apache Livy), has its merits and is very good for providing shared Spark as a service. We can definitely continue investigating the Spark Connect integration further, but I believe it should happen in parallel, as the two will probably serve different use cases.
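For comparison, the per-user driver model described above boils down to each kernel building its own SparkSession. A minimal sketch, assuming Spark on Kubernetes with an in-cluster master URL; all names and values here are illustrative:

```python
# Hypothetical example: each notebook kernel owns its own Spark driver,
# so per-user configuration and resource tuning never interfere with
# other users' sessions.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("k8s://https://kubernetes.default.svc:443")  # assumed in-cluster API server
    .appName("per-user-notebook-driver")
    .config("spark.kubernetes.namespace", "my-profile")   # assumed profile namespace
    .config("spark.executor.instances", "2")              # per-user resource tuning
    .getOrCreate()
)
```

Because the driver lives inside the kernel process, context-level settings (serializers, UDF registration, low-level RDD APIs) are all available, which is the compatibility argument made above.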
This file comes from https://github.com/fresende/kubeflow-manifests/blob/install-eg/scripts/synchronize-spark-operator-manifests.sh and must not be modified. Please create a kustomize overlay instead. We also need proper tests.
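A minimal sketch of what such an overlay could look like; the directory layout, paths, and patch file are hypothetical:

```yaml
# overlays/spark-operator-notebooks/kustomization.yaml (hypothetical path)
# Keep the synchronized upstream manifests untouched and layer local
# changes on top of them.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base   # assumed path to the synced spark-operator manifests
patches:
  - path: patch-enterprise-gateway.yaml   # hypothetical local patch
```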
And you need to sign your commits.
Pull Request Template for Kubeflow Manifests
✏️ Summary of Changes
Add Support for notebooks/spark operator to manifests
This is a work in progress to automate the installation of the Spark operator integration with notebooks. We are currently having issues when starting the kernel: it needs to communicate back to the Enterprise Gateway, but it fails silently. I believe it's related to Istio and would appreciate some help.
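If the Istio sidecar is interfering with the kernel's callback traffic, one avenue worth checking is mTLS enforcement on the gateway's workload. Below is a hedged sketch of a PeerAuthentication that relaxes mTLS for diagnosis only; the namespace and label selector are assumptions, not values from this PR:

```yaml
# Hypothetical debugging aid: allow plaintext connections from kernel pods
# back to the Enterprise Gateway while diagnosing the silent failure.
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: enterprise-gateway-permissive
  namespace: kubeflow            # assumed gateway namespace
spec:
  selector:
    matchLabels:
      app: enterprise-gateway    # assumed workload label
  mtls:
    mode: PERMISSIVE
```

If the kernel connects with this in place, the failure is likely strict mTLS between the kernel pod and the gateway rather than the gateway itself.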
Connection Flow
1. Kernel starts and encrypts its connection details.
2. Kernel sends those details back to Enterprise Gateway.
3. Enterprise Gateway decrypts and reads the info.
4. Gateway passes the connection info to the kernel's proxy.
5. Gateway uses that info to connect to the kernel's ports (shell, iopub, stdin, heartbeat, control); the payload format is sketched below.
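For reference, the connection details exchanged in steps 1-3 follow the standard Jupyter kernel connection-file format. A sketch with placeholder values (the ports and key below are not from this PR):

```python
# Standard Jupyter kernel connection info; Enterprise Gateway encrypts and
# relays this payload so it can reach the kernel's ZeroMQ channels.
connection_info = {
    "shell_port": 52318,    # request/reply (execute_request, etc.)
    "iopub_port": 52319,    # broadcast channel for outputs and status
    "stdin_port": 52320,    # raw input requests from the kernel
    "control_port": 52321,  # interrupt/shutdown messages
    "hb_port": 52322,       # heartbeat ping/pong
    "ip": "0.0.0.0",
    "transport": "tcp",
    "signature_scheme": "hmac-sha256",
    "key": "<placeholder-hmac-key>",  # hypothetical per-kernel secret
}
```

If the gateway never receives or cannot decrypt this payload, steps 4-5 never happen, which would match the silent failure described above.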
✅ Contributor Checklist