-
Notifications
You must be signed in to change notification settings - Fork 7
Refactor Jupyterhub configuration files #625
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: gpu-resource-allocation
Are you sure you want to change the base?
Refactor Jupyterhub configuration files #625
Conversation
E2E Test ResultsDACCS-iac Pipeline ResultsBuild URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/3913/Result ❌ FAILUREBIRDHOUSE_DEPLOY_BRANCH : refactor-jupyterhub-customization DACCS_IAC_BRANCH : master DACCS_CONFIGS_BRANCH : master PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master PAVICS_SDI_BRANCH : master DESTROY_INFRA_ON_EXIT : true PAVICS_HOST : https://host-140-216.rdext.crim.ca PAVICS-e2e-workflow-tests Pipeline ResultsTests URL : http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/586/NOTEBOOK TEST RESULTS |
E2E Test ResultsDACCS-iac Pipeline ResultsBuild URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/3914/Result ❌ FAILUREBIRDHOUSE_DEPLOY_BRANCH : refactor-jupyterhub-customization DACCS_IAC_BRANCH : master DACCS_CONFIGS_BRANCH : master PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master PAVICS_SDI_BRANCH : master DESTROY_INFRA_ON_EXIT : true PAVICS_HOST : https://host-140-91.rdext.crim.ca PAVICS-e2e-workflow-tests Pipeline ResultsTests URL : http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/587/NOTEBOOK TEST RESULTS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not at the project root?
If a pyproject.toml was defined there, we could also do the .bumpversion.toml, pytest.ini and tests/requirements.* file migration under one location.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could and that could be a good next step but it seemed out of scope for this PR. Once I start applying these changes for the whole project we need to start updating all the other python code including twitcher hooks for a bunch of different components.
That starts to make this PR really really big and I think those changes should be handled elsewhere.
| `deprecated-components/catalog` is handled (see below). However, it does deprecate some environment variables in | ||
| favour of better configuration solutions: | ||
|
|
||
| - using the `JUPYTERHUB_ENABLE_MULTI_NOTEBOOKS` variable to set the `DockerSpawner.allowed_images` variable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with the change, but I guess it could have been used for setting other definitions? If such custom overrides were applied/needed, should they be moved to JUPYTERHUB_CONFIG_OVERRIDE? (i.e.: should we recommend using it over JUPYTERHUB_ENABLE_MULTI_NOTEBOOKS?).
Will the effect be equivalent, or other considerations would be necessary (eg: overrides not applying the same way since more changes could happen in between the "allowe_image" step and the final config object)? Some guidances along those lines could be relevant here if there is anything notable to consider.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it could have been used for setting other definitions?
It could have but I hope that it wasn't since any changes made there were prone to being overridden later on in the file.
should we recommend using it over JUPYTERHUB_ENABLE_MULTI_NOTEBOOKS?
Yes for sure, by deprecating JUPYTERHUB_ENABLE_MULTI_NOTEBOOKS we should be communicating that but we can be more explicit about that if you think that people are adding additional code to JUPYTERHUB_ENABLE_MULTI_NOTEBOOKS instead of JUPYTERHUB_CONFIG_OVERRIDE.
Will the effect be equivalent
It should be unless they're using that to set a variable that is interpreted differently later on in the code. But I would definitely consider that a hack since the rest of the code could change later on and break any customizations done there. It's much safer to add in any custom overrides at the end of the configuration so that you know for sure that your changes are applied last.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. I guess let's see if @tlvu was using it in such way.
If not, I am fine with having limited/no details about unintended usage to keep the changes concise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tlvu do you have a situation where you're using JUPYTERHUB_ENABLE_MULTI_NOTEBOOKS like this?
birdhouse/components/jupyterhub/jupyterhub_custom/jupyterhub_custom/magpie_authenticator.py
Outdated
Show resolved
Hide resolved
birdhouse/components/jupyterhub/jupyterhub_custom/jupyterhub_custom/magpie_authenticator.py
Outdated
Show resolved
Hide resolved
E2E Test ResultsDACCS-iac Pipeline ResultsBuild URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/3935/Result ❌ FAILUREBIRDHOUSE_DEPLOY_BRANCH : refactor-jupyterhub-customization DACCS_IAC_BRANCH : master DACCS_CONFIGS_BRANCH : master PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master PAVICS_SDI_BRANCH : master DESTROY_INFRA_ON_EXIT : true PAVICS_HOST : https://host-140-91.rdext.crim.ca PAVICS-e2e-workflow-tests Pipeline ResultsTests URL : http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/603/NOTEBOOK TEST RESULTS |
E2E Test ResultsDACCS-iac Pipeline ResultsBuild URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/3942/Result 🆘 ABORTEDBIRDHOUSE_DEPLOY_BRANCH : refactor-jupyterhub-customization DACCS_IAC_BRANCH : master DACCS_CONFIGS_BRANCH : master PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master PAVICS_SDI_BRANCH : master DESTROY_INFRA_ON_EXIT : true PAVICS_HOST : https://host-140-216.rdext.crim.ca
|
tlvu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow that's some massive refactoring ! Took me 2 passes before I grasp the extend of it.
I'll have to deploy this on my test server to see if the shuffling of the existing templating vars do not break our current setup.
One quick observation is that most of the previous configs being set directly in jupyterhub_config.py.template migrate to custom_dockerspawner.py. But some are left behind, so I wonder the ones left behind are because they can not be moved simply because they do not belongs to the docker spawner or because of other reasons?
| @@ -0,0 +1,182 @@ | |||
| import json | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an exact copy of the same file from https://github.com/Ouranosinc/jupyterhub/blob/b3a786a5eb22b09d41f3093b619f02f7b33348a2/jupyterhub_magpie_authenticator/jupyterhub_magpie_authenticator.py or is there any changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is the same with some additions. Once this PR has been approved we will remove the duplicate code from that repository.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you happen to have a diff? If that file has changed, the diff should be reviewed as well.
Also, with this PR using the existing jupyterhub image with that old file, the new file here will take precedence?
Asking because I'll need to deploy this PR to test. If there is any special step to do to get this PR working properly, I'd like to know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you happen to have a diff?
diff <(curl -s https://raw.githubusercontent.com/bird-house/birdhouse-deploy/445bd8d112b70744249c8bddc8039ee44b050190/birdhouse/components/jupyterhub/jupyterhub_custom/jupyterhub_custom/magpie_authenticator.py) <(curl -s https://raw.githubusercontent.com/Ouranosinc/jupyterhub/refs/heads/master/jupyterhub_magpie_authenticator/jupyterhub_magpie_authenticator.py)Also, with this PR using the existing jupyterhub image with that old file, the new file here will take precedence?
Yes
If there is any special step to do to get this PR working properly, I'd like to know.
There are no special steps
| c4i: false | ||
| type: api | ||
| sync_type: api | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious, this rename from .cfg to .yml is just to clarify the file format or there are other reasons for the rename?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The magpie doc used to have permissions.cfg and providers.cfg separately (still allowed and used by birdhouse-deploy), but later introduced config.yml that allows defining everything together since some users/groups permissions are often relevant along the specific services (providers) that get defined.
YAML was used at that point to reflect the actual data structure.
You can actually use many extensions interchangeably. Magpie doesn't care as long as it loads as JSON-like. https://pavics-magpie.readthedocs.io/en/latest/configuration.html#configuration-file-formats
| USER_WORKSPACE_GID: ${USER_WORKSPACE_GID} | ||
| JUPYTERHUB_CRYPT_KEY: ${JUPYTERHUB_CRYPT_KEY} | ||
| JUPYTERHUB_DOCKER_EXTRA_HOSTS: ${JUPYTERHUB_DOCKER_EXTRA_HOSTS:-} | ||
| JUPYTERHUB_AUTHENTICATOR_AUTHORIZATION_URL: ${JUPYTERHUB_AUTHENTICATOR_AUTHORIZATION_URL:-} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some of the new env vars here redefine a default value different than in default.env, is there a reason? And some do not redefine default value here, is there a reason to break the consistency?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you give an example?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Example this JUPYTERHUB_AUTHENTICATOR_AUTHORIZATION_URL:${JUPYTERHUB_AUTHENTICATOR_AUTHORIZATION_URL:-} default here is empty string but the default in default.env is
| export JUPYTERHUB_AUTHENTICATOR_AUTHORIZATION_URL='http://twitcher:8000/ows/verify/jupyterhub' |
Then example JUPYTERHUB_ADMIN_GROUP_NAME: ${JUPYTERHUB_ADMIN_GROUP_NAME} do not redefine a default here.
So the consistency question is 2 folds. Why sometime we redefine the default and sometime we don't? And then when we do redefine the default, why it is not the same value as in default.env?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see.
The default is still what is in default.env but docker compose will log a warning if an environment variable is missing in a docker-compose.yml file.
One way to tell docker compose that this variable is allowed to be empty is to use the :- syntax. That way it will not log the warning if this variable happens to be empty.
| # To get the same settings as the (deprecated) JUPYTERHUB_ENABLE_MULTI_NOTEBOOKS settings above you can | ||
| # set the variable as: | ||
| # | ||
| #export JUPYTERHUB_ALLOWED_IMAGES=' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the same JUPYTERHUB_ALLOWED_IMAGES var as the one right above. I do not understand what's the difference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This example contains different jupyterlab images. These are the images that are used in the example for JUPYTERHUB_ENABLE_MULTI_NOTEBOOKS further up in the env.local.example file.
Overview
Previously the jupyterhub configuration was defined in
components/jupyterhub/jupyterhub_config.py.templateexcept for the custom authenticator which was included in thepavics/jupyterhubdocker image. This was fine, except that the configuration was split across two projects and as the configuration got more complex, maintaining these was getting more difficult.This moves all configuration code including the authenticator to a python package named
jupyterhub_customlocated atcomponents/jupyterhub/jupyterhub_custom/. It moves all configurations for the authenticator to themagpie_authenticator.pyfile and all the configurations for the spawner to thecustom_spawner.pyfile. All variables that are settable by environment variables are moved to theconstants.pyfile. This makes it much easier to see which variables are configurable using environment variables all in one place.This change introduces unit tests and style policies (enforced by ruff and pre-commit) for this package to encourage better code practices.
This change should be fully backwards compatible with the exception of the change to how settings for
deprecated-components/catalogis handled (see below). However, it does deprecate some environment variables in favour of better configuration solutions:JUPYTERHUB_ENABLE_MULTI_NOTEBOOKSvariable to set theDockerSpawner.allowed_imagesvariable is deprecated.jupyterhub_config.py.templatefile. This makes it too easy to accidentally insert code that breaks the configuration settings later on. We should avoid code insertion like this whenever possible.DockerSpawner.allowed_imageswill be set based on the values ofJUPYTERHUB_IMAGE_SELECTION_NAMESandJUPYTERHUB_DOCKER_NOTEBOOK_IMAGES. If more than one image is specified by those variables then the user will be able to select which one they want. If you want to specify images other than those in the default those can be set using theJUPYTERHUB_ALLOWED_IMAGESwhich is a yaml or JSON mapping of image names to jupyterlab docker image tags.JUPYTERHUB_ADMIN_USERSvariable to set theDockerSpawner.admin_usersvariable is deprecated.based on roles, not by assigning them to the
admin_usersattribute. This makes it possible to easily revoke admin permissions as well and does not require us to restart Jupyterhub to do so.JUPYTERHUB_ADMIN_GROUP_NAMEvariable (default is "jupyterhub-admin"). Add users to this group in Magpie in order to give them admin permissions on Jupyterhub.JUPYTERHUB_CONFIG_OVERRIDEis deprecated.constants.py. For example:notebook_dir == constants.NOTEBOOK_DIR. The uppercase version is preferred.Note: if your deployment is still using the
deprecated-components/catalogcomponent. You may want to manually set the following inJUPYTERHUB_CONFIG_OVERRIDEsince the automatic addition of the theCATALOG_USERNAMEto theblocked_usersset is no longer part of the default settings (since thedeprecated-components/catalogcomponent has been deprecated for a long time):Changes
Non-breaking changes
Breaking changes
deprecated-components/catalogis still used, then it needs to be manually configured (see above)JUPYTERHUB_ADMIN_USERSis not set in the local environment file, the default was to automatically add the magpie admin user as a jupyterhub admin. Now the default is to add no users as an admin automatically and users must be added by being included in theJUPYTERHUB_ADMIN_GROUP_NAMEmagpie group.Related Issue / Discussion
Additional Information
CI Operations
birdhouse_daccs_configs_branch: master
birdhouse_skip_ci: false