Document how to allow Jupyterlab containers to access GPUs #594
base: master
@@ -821,6 +821,45 @@ Usage | |
The service is available at ``${BIRDHOUSE_PROXY_SCHEME}://${BIRDHOUSE_FQDN_PUBLIC}/jupyter``. Users are able to log in to Jupyterhub using the
same user name and password as Magpie. They will then be able to launch a personal jupyterlab server.

GPU support for jupyterlab containers
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If the host machine has GPUs and you want to make them available to the docker containers running Jupyterlab:

1. ensure that the GPU drivers on the host machine are up to date
2. install the `NVIDIA container toolkit`_ package on the host machine
3. `configure docker`_ to use the NVIDIA container runtime
4. restart docker for the changes to take effect (a smoke test to verify the setup is sketched after the configuration snippet below)
5. add the following to the ``JUPYTERHUB_CONFIG_OVERRIDE`` variable in your local environment file:

.. code-block:: python

    # enable GPU support
    import docker

    c.DockerSpawner.extra_host_config["device_requests"] = [
        docker.types.DeviceRequest(count=-1, capabilities=[["gpu"]])
    ]
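To verify steps 1-4 before touching the JupyterHub configuration, one possible smoke test (the ``ubuntu`` image is an arbitrary choice; any image works since the runtime injects the driver utilities) is:

.. code-block:: console

    $ docker run --rm --gpus all ubuntu nvidia-smi

If the driver, toolkit, and runtime are set up correctly, this prints the same GPU table as running ``nvidia-smi`` directly on the host.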
Comment on lines +840 to +842
Member:

Does that mean that every single user notebook automatically gets assigned a GPU? Is there a way to have both GPU/CPU-only simultaneously (adding more ...)? I would be interested in that specific multi-config example, and how users interact with it to request (or us limiting them) appropriate resources.

Collaborator (Author):

No, it means that every container has access to all GPUs on the host. This PR doesn't introduce any solutions for allocating different GPU resources to different users. That is a much more complex thing that I'll have to try to figure out at a later date (because I don't really understand it yet).

I think so; it would require a pretty good understanding of the nvidia toolkit and docker settings. I'm still reading about it, and I can continue to update these docs as we figure out different possible configurations.

Member:

I wonder if something like https://jupyterhub-dockerspawner.readthedocs.io/en/latest/api/index.html#dockerspawner.SwarmSpawner.group_overrides could be used to dynamically apply the GPU request for specific users/conditions, therefore allowing a GPU or CPU-only setup. Documentation is very sparse, so definitely very hard to figure out 😅
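As a hedged sketch of the kind of thing discussed here (not part of this PR, and using JupyterHub's `Spawner.pre_spawn_hook` rather than the `group_overrides` linked above; the allowlist and usernames are hypothetical):

```python
# Hypothetical sketch: attach the GPU device request only for allowlisted users.
import docker

GPU_USERS = {"alice", "bob"}  # hypothetical usernames

def pre_spawn_hook(spawner):
    # extra_host_config is mutated per spawner instance,
    # so CPU-only users simply never get a device request.
    if spawner.user.name in GPU_USERS:
        spawner.extra_host_config["device_requests"] = [
            docker.types.DeviceRequest(count=-1, capabilities=[["gpu"]])
        ]

c.DockerSpawner.pre_spawn_hook = pre_spawn_hook
```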
This will allow the docker containers to access all GPUs on the host machine.
Collaborator:

So all Jupyter users will have access to the GPU? And if they happen to all use the GPU at the same time, will they step on each other's feet?

Collaborator (Author):

Yup, they will definitely step on each other's feet, just the same as if a user hogs any other resource (CPU, memory, etc.). We definitely need a better way to manage resource over-use, but the problem isn't specific to GPUs.
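(For the non-GPU resources mentioned here, DockerSpawner does already expose per-container caps; a minimal sketch, the value is arbitrary:)

```python
# Sketch: cap each user's container memory; there is no equivalent
# built-in knob for sharing GPU memory between containers.
c.DockerSpawner.mem_limit = "8G"
```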
To limit the number of GPUs you want to make available, you can change the ``count`` value to a positive integer, or you can specify the ``device_ids`` key instead.
Comment on lines +845 to +846

Member:

Requesting all available GPUs would essentially lock out any second user trying to use a kernel. The example should probably use ...

Collaborator (Author):

Let me clarify... this gives each container access to all GPUs on the host, and they share them as a resource, the same way they're sharing access to the CPUs, memory, etc. Note that if user A proceeds to max out some of the GPUs then user B can't use them until they are done, but managing that goes beyond the scope of this documentation so far.

Member:

Unless you have big GPUs like A100s that actually provide virtual-GPU VRAM, it won't take much to cause all users to crash their respective processes with an OutOfMemoryError. I'm nowhere near an expert on the matter, but I know that our clusters leverage some vGPUs to allow some kind of splitting this way. I don't know if that would play nice with multiple dockers trying to access the same GPU. Doesn't it do some kind of lock/reservation when assigned to a particular kernel?

Collaborator (Author):

I don't think that you need a vGPU setup for docker. Since there is no true VM (no hypervisor) with docker, the containers can just access the GPU directly. One thing you could do is use MIG (Multi-Instance GPU) or MPS (Multi-Process Service) to split up resources, but not all GPUs support these (none of ours do, unfortunately).

As far as I know it manages context-switching the same way a CPU running multiple processes/threads would. So everything gets slowed down and/or on-GPU memory fills up, but it won't necessarily just immediately fail from the user's perspective.
``device_ids`` takes a list of strings identifying the devices (GPUs) that you want to enable. Device IDs for each GPU can be inspected by running the ``nvidia-smi``
command on the host machine.
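For example, one standard way to list those IDs (these are stock ``nvidia-smi`` query flags):

.. code-block:: console

    $ nvidia-smi --query-gpu=index,name,uuid --format=csv

The ``index`` column is what ``device_ids`` refers to; the NVIDIA runtime also accepts the full GPU UUIDs.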
The `driver capabilities`_ setting indicates that this device request is for GPUs (as opposed to other devices that may be
available, such as TPUs).
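Note that ``capabilities`` is a list of lists: Docker treats the outer list as OR and each inner list as AND. The snippet below is an illustration only (the extra capability names follow Docker's built-in NVIDIA driver and should be treated as an assumption here):

.. code-block:: python

    # matched if a driver provides BOTH "gpu" AND "utility",
    # OR if it provides "nvidia" on its own
    docker.types.DeviceRequest(count=-1, capabilities=[["gpu", "utility"], ["nvidia"]])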
For example, if you only want to make the GPUs with IDs 1 and 4 available, you would set:

.. code-block:: python

    docker.types.DeviceRequest(device_ids=["1", "4"], capabilities=[["gpu"]])
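As a sketch of the ``count`` alternative mentioned above, to expose at most one GPU per container you would instead set:

.. code-block:: python

    docker.types.DeviceRequest(count=1, capabilities=[["gpu"]])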
.. _NVIDIA container toolkit: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
.. _configure docker: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html#configuring-docker
.. _driver capabilities: https://docs.docker.com/reference/compose-file/deploy/#capabilities
How to Enable the Component
---------------------------
Member:

I assume enabling it like this makes it available to all containers on the server. It doesn't have to be mentioned here, but just pointing it out FYI: it would be relevant for `weaver-worker` as well, to run GPU jobs. I'm just not sure if the device syntax is the same in docker-compose, since it's been 2-3 years since I've checked this.

If it does indeed work like this, maybe a note about all server resources sharing the GPUs could be relevant. They would not (necessarily) be dedicated to jupyter kernels.
Collaborator (Author):

Yes.

And yes, I definitely want to figure out how to make this work with weaver as well. Since the weaver worker container is not dynamically created, I think we can just add it directly to the weaver-worker definition in the relevant docker-compose-extra.yml file. But that's something I'll have to figure out/work on next.
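For reference, a hedged sketch of what that could look like in a compose file, based on the `deploy.resources.reservations.devices` syntax documented at the "driver capabilities" link above (untested against `weaver-worker` in this stack, per the discussion):

```yaml
services:
  weaver-worker:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all          # or an integer, or a device_ids list
              capabilities: [gpu]
```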