[ZEPPELIN-6121] Write a Dockerfile for python interpreter image build #4865
RUN chmod +x ./mvnw
RUN ./mvnw clean package -am -pl zeppelin-interpreter-shaded,zeppelin-interpreter,python -DskipTests
I was wondering if it might be better to ensure that %python, %python.ipython, and %python.sql are all supported by default.
My thinking comes from the fact that these interpreters are listed in the overview section of the Zeppelin Python interpreter documentation, and IPython is also recommended there.
For IPython, it seems that adding the following command here can be useful for installing the necessary packages:
RUN pip install jupyter-client grpcio protobuf~=3.20 ipython ipykernel
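One way to confirm that the image actually contains these packages is a small import check. The following is only a minimal sketch: the helper `missing_modules` and the list `IPYTHON_MODULES` are hypothetical names, not part of Zeppelin, and the import names are assumed to correspond to the pip packages in the command above.

```python
import importlib.util

# Import names assumed to match the pip packages listed above
# (grpcio is imported as "grpc", protobuf as "google.protobuf").
IPYTHON_MODULES = ["jupyter_client", "grpc", "google.protobuf", "IPython", "ipykernel"]


def missing_modules(names):
    """Return the entries of `names` that cannot be located in this environment."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec raises when a parent package (e.g. "google") is absent.
            found = False
        if not found:
            missing.append(name)
    return missing
```

Calling `missing_modules(IPYTHON_MODULES)` inside the built container should return an empty list if the install step succeeded; a non-empty result names the modules that still need to be installed.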
@tbonelee
Thank you for your comment. I agree with you and have applied your suggestion. That said, additional libraries such as pandas still wouldn't be installed. In my opinion, we need a way to inject these libraries without having to rewrite or redeploy the Dockerfile. I think it would be better to address this in a separate task.
I agree with your point.
I believe that handling additional packages, like pandas, would require a more flexible approach, such as a conda-like package management system. It might be better to address this in a separate PR later.
What is this PR for?
This PR adds a Dockerfile for building a Python interpreter container image. I think running the interpreter in a container could help make use of distributed computing resources.
What type of PR is it?
Improvement
Todos
What is the Jira issue?
How should this be tested?
Screenshots (if appropriate)
Questions: