This project is being renamed from QHub HPC to Nebari Slurm.
Nebari Slurm is an opinionated open source deployment of JupyterHub based on an HPC job scheduler. Nebari Slurm is a "distribution" of these packages, much like Debian and Ubuntu are distributions of Linux. The high-level goal of this distribution is to form a cohesive set of tools that enable:
- Environment management via conda and conda-store
- Monitoring of compute infrastructure and services
- Scalable and efficient compute via JupyterLab and Dask
- Deployment of JupyterHub on-prem without requiring deep DevOps knowledge of the Slurm/HPC and Jupyter ecosystems
- Scalable compute environment based on the Slurm workload manager to take advantage of an entire fleet of nodes
- Ansible-based provisioning of Ubuntu 18.04 and Ubuntu 20.04 nodes to deploy one master server and N workers; these workers can be pre-existing nodes in your compute environment
- Customizable themes for JupyterHub
- JupyterHub integration allowing users to select the memory, CPUs, and environment that their JupyterLab instances are launched with
- Dask Gateway integration allowing users to select the memory, CPUs, and environment that Dask schedulers and workers use (see the sketch after this list)
- Monitoring of the entire cluster via Grafana, covering the nodes, JupyterHub, Slurm, and Traefik
- Shared directories between all users for collaborative compute
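
The Dask Gateway integration can be exercised from a JupyterLab session roughly as follows. This is a minimal sketch: the option names (`worker_cores`, `worker_memory`) are assumptions, since the available options are defined by the gateway's server-side configuration; inspect `gateway.cluster_options()` on your deployment for the actual fields.

```python
# Minimal sketch of requesting per-worker resources through Dask Gateway.
# The option names below are assumptions; the server defines the real ones.
from dask_gateway import Gateway

gateway = Gateway()                     # picks up the JupyterHub-provided configuration
options = gateway.cluster_options()     # form holding the server-defined options
options.worker_cores = 2                # hypothetical option: CPUs per worker
options.worker_memory = 4               # hypothetical option: memory per worker

cluster = gateway.new_cluster(options)  # scheduler and workers run as Slurm jobs
cluster.scale(4)                        # request four Slurm-backed workers
client = cluster.get_client()           # distributed client bound to this cluster
```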
Install the Ansible dependencies:

```bash
ansible-galaxy collection install -r requirements.yaml
```
There are tests for deploying Nebari Slurm on a virtual machine provisioner and in the cloud.
Vagrant is a tool for creating and provisioning virtual machines. It has convenient integration with Ansible, which allows for easy and effective control over configuration. Currently the Vagrantfile only has support for the libvirt and virtualbox providers.
```bash
cd tests/ubuntu1804
# cd tests/ubuntu2004
vagrant up --provider=<provider-name>
# vagrant up --provider=libvirt
# vagrant up --provider=virtualbox
```
A notebook for testing functionality is provided at `tests/assets/notebook/test-dask-gateway.ipynb`.
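
In the same spirit (the notebook's exact contents may differ), a minimal smoke test is to run a trivial computation against a gateway cluster, such as the one created in the sketch above, and check the result:

```python
# Hypothetical smoke test: run a trivial Dask computation on the
# Slurm-backed workers; with a dask.distributed client active (see the
# Gateway sketch above), compute() executes on those workers.
import dask.array as da

x = da.random.random((10_000, 10_000), chunks=(1_000, 1_000))
mean = x.mean().compute()
print(mean)  # should be close to 0.5
```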
The current testing environment spins up four nodes:
- all nodes :: node_exporter for node metrics
- master node :: Slurm scheduler, munge, MySQL, JupyterHub, Grafana, Prometheus
- worker nodes :: Slurm daemon, munge
JupyterHub is accessible via `<master node ip>:8000`. You may need to port-forward, e.g. over SSH:

```bash
vagrant ssh hpc01-test -- -N -L localhost:8000:localhost:8000
```

then access http://localhost:8000/ on the host.
Grafana is accessible via `<master node ip>:3000`.
Nebari Slurm is BSD 3-Clause licensed.
Contributions are welcome!