Utilities and APIs for interfacing with the Slurm workload manager.
slurmutils is a collection of various utilities that make it easier for you and your friends to interface with the Slurm workload manager, especially if you are orchestrating deployments of new and current Slurm clusters. Gone are the days of seething over incomplete Jinja2 templates. Current utilities shipped in the slurmutils package include:
calculate_rs
: A function for calculating the ranges and strides of an iterable with unique elements. This function can be used to help convert arrays of node hostnames, device file ids, etc. into a Slurm hostname specification.
acctgatherconfig
: An editor for acct_gather.conf configuration files.
cgroupconfig
: An editor for cgroup.conf configuration files.
gresconfig
: An editor for gres.conf configuration files.
ociconfig
: An editor for oci.conf configuration files.
slurmconfig
: An editor for slurm.conf configuration files.
slurmdbdconfig
: An editor for slurmdbd.conf configuration files.
For more information on how to use or contribute to slurmutils, check out the Getting Started and Development sections below 👇
$ python3 -m pip install slurmutils
We use the Poetry packaging and dependency manager to manage this project. It must be installed on your system if installing slurmutils from source.
$ git clone https://github.com/canonical/slurmutils.git
$ cd slurmutils
$ poetry install
The top-level slurmutils module provides access to utilities that streamline common Slurm-related operations, such as calculating the ranges and strides for a Slurm hostname specification or editing configuration files in-place. Here are some example operations you can perform with these utilities:
# Calculate the ranges and strides of Slurm node hostnames.
from os.path import commonprefix
from slurmutils import calculate_rs

nodes = ["juju-abc654-1", "juju-abc654-2", "juju-abc654-4"]
prefix = commonprefix(nodes)
nums = [int(n.partition(prefix)[2]) for n in nodes]
slurm_host_spec = prefix + calculate_rs(nums)  # "juju-abc654-[1-2,4]"
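# Calculate the ranges and strides of GPU device files.
from pathlib import Path
from slurmutils import calculate_rs

# Convert each Path to str so that str.partition can be used on it.
device_files = [str(file) for file in Path("/dev").iterdir() if "nvidia" in file.name]
prefix = "/dev/nvidia"
# Keep only numbered device files such as /dev/nvidia0; skip names like /dev/nvidiactl.
nums = [int(s) for f in device_files if (s := f.partition(prefix)[2]).isdigit()]
file_spec = prefix + calculate_rs(nums)  # "/dev/nvidia[0-4]"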
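To make the idea behind the two examples above concrete, here is a minimal, standalone sketch of collapsing unique integers into a bracketed range list. The function name compress_ranges is hypothetical and this is not how calculate_rs is implemented; in particular, the sketch only merges contiguous runs and makes no attempt to detect strides.

```python
from typing import Iterable


def compress_ranges(nums: Iterable[int]) -> str:
    """Collapse unique integers into a bracketed range list such as "[1-2,4]".

    Hypothetical helper for illustration only; it handles contiguous runs,
    not strides.
    """
    ordered = sorted(set(nums))
    if not ordered:
        return ""
    parts = []
    start = prev = ordered[0]
    for n in ordered[1:]:
        if n == prev + 1:
            prev = n  # still inside the current contiguous run
            continue
        # The run ended: emit it as "a" or "a-b", then start a new run.
        parts.append(str(start) if start == prev else f"{start}-{prev}")
        start = prev = n
    parts.append(str(start) if start == prev else f"{start}-{prev}")
    return "[" + ",".join(parts) + "]"


print("juju-abc654-" + compress_ranges([1, 2, 4]))  # juju-abc654-[1-2,4]
print("/dev/nvidia" + compress_ranges(range(5)))    # /dev/nvidia[0-4]
```

Sorting and de-duplicating first keeps the loop simple, at the cost of not preserving the input order.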
# Edit an existing acct_gather.conf configuration file.
from slurmutils import acctgatherconfig

with acctgatherconfig.edit("/etc/slurm/acct_gather.conf") as config:
    config.profile_influxdb_database = "test_acct_gather_db"
    config.profile_influxdb_default = ["none"]
    config.profile_influxdb_host = "testhostname1"
    config.profile_influxdb_pass = "testpassword1"
    config.profile_influxdb_rt_policy = "testpolicy1"
    config.profile_influxdb_user = "testuser1"
    config.profile_influxdb_timeout = 20
# Edit an existing cgroup.conf configuration file.
from slurmutils import cgroupconfig

with cgroupconfig.edit("/etc/slurm/cgroup.conf") as config:
    config.constrain_cores = True
    config.constrain_devices = True
    config.constrain_ram_space = True
    config.constrain_swap_space = True
# Edit an existing gres.conf configuration file.
from slurmutils import Gres, GresList, gresconfig

with gresconfig.edit("/etc/slurm/gres.conf") as config:
    gres1 = Gres(
        name="gpu",
        type="epyc",
        file="/dev/amd4",
        cores=[0, 1],
    )
    gres2 = Gres(
        name="gpu",
        nodename="juju-abc654-[1-20]",
        type="epyc",
        file="/dev/amd[0-3]",
        count="12G",
    )
    config.auto_detect = "rsmi"
    # Register both resources under the "gpu" name.
    config.gres["gpu"] = GresList(gres1, gres2)
# Edit an existing oci.conf configuration file.
from slurmutils import ociconfig

with ociconfig.edit("/etc/slurm/oci.conf") as config:
    config.ignore_file_config_json = False
    config.env_exclude = "^(SLURM_CONF|SLURM_CONF_SERVER|SLURM_JWT)="
    config.create_env_file = "newline"
    config.std_io_debug = "debug"
    config.syslog_debug = "debug"
# Edit an existing slurm.conf configuration file.
from slurmutils import slurmconfig

with slurmconfig.edit("/etc/slurm/slurm.conf") as config:
    del config.inactive_limit  # Remove the setting from the file.
    config.max_job_count = 20000
    config.proctrack_type = "proctrack/linuxproc"
# Add a new node to an existing slurm.conf configuration file.
from slurmutils import Node, slurmconfig

with slurmconfig.edit("/etc/slurm/slurm.conf") as config:
    node = Node(
        nodename="batch-[0-25]",
        nodeaddr="12.34.56.78",
        cpus=1,
        realmemory=1000,
        tmpdisk=10000,
    )
    config.nodes[node.node_name] = node
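For orientation, a node definition in slurm.conf is a single line of space-separated Key=Value pairs. The sketch below shows how the parameters from the example above would look in that form. The helper render_node_line is hypothetical and is not part of slurmutils; the exact text slurmutils writes may differ.

```python
def render_node_line(params: dict) -> str:
    """Join key/value pairs into one space-separated slurm.conf line.

    Hypothetical helper for illustration; not part of slurmutils.
    """
    return " ".join(f"{key}={value}" for key, value in params.items())


# Same values as the Node example, using slurm.conf's own parameter names.
node_params = {
    "NodeName": "batch-[0-25]",
    "NodeAddr": "12.34.56.78",
    "CPUs": 1,
    "RealMemory": 1000,
    "TmpDisk": 10000,
}
node_line = render_node_line(node_params)
print(node_line)
# NodeName=batch-[0-25] NodeAddr=12.34.56.78 CPUs=1 RealMemory=1000 TmpDisk=10000
```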
# Edit an existing slurmdbd.conf configuration file.
from slurmutils import slurmdbdconfig

with slurmdbdconfig.edit("/etc/slurm/slurmdbd.conf") as config:
    config.archive_usage = True
    config.log_file = "/var/spool/slurmdbd.log"
    config.debug_flags = ["db_event", "db_job", "db_usage"]
    del config.auth_alt_types
    del config.auth_alt_parameters
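The editor examples above share one shape: open a file, change attributes inside a with block, and have the result saved on exit. As a rough illustration of that read-modify-write pattern, here is a self-contained sketch built on contextlib. The function edit_key_values is hypothetical and is not slurmutils' implementation; it assumes a flat Key=Value file, discards comments, and works on a throwaway file rather than anything under /etc/slurm.

```python
import os
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def edit_key_values(path):
    """Yield a dict of the file's Key=Value pairs and write it back on clean exit.

    Hypothetical sketch for illustration; not part of slurmutils.
    """
    target = Path(path)
    config = {}
    if target.exists():
        for line in target.read_text().splitlines():
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # this sketch drops blank lines and comments
            key, _, value = line.partition("=")
            config[key.strip()] = value.strip()
    yield config
    # Reached only if the with-body raised nothing, so a failed edit
    # leaves the original file untouched.
    body = "".join(f"{key}={value}\n" for key, value in config.items())
    fd, tmp = tempfile.mkstemp(dir=target.parent)
    with os.fdopen(fd, "w") as handle:
        handle.write(body)
    os.replace(tmp, target)  # rename over the original in one step


# Usage against a temporary file.
with tempfile.TemporaryDirectory() as workdir:
    conf = Path(workdir) / "example.conf"
    conf.write_text("MaxJobCount=10000\nInactiveLimit=0\n")
    with edit_key_values(conf) as config:
        config["MaxJobCount"] = "20000"
        del config["InactiveLimit"]
    result = conf.read_text()

print(result)  # MaxJobCount=20000
```

Writing to a temporary file and renaming it over the original means other processes never observe a half-written configuration, which is one reason this pattern suits files that daemons may re-read.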
If you want to learn more about all the things you can do with slurmutils, here are some further resources for you to explore:
This project uses tox as its command runner, which provides some useful commands that will help you while hacking on slurmutils:
tox run -e fmt # Apply formatting standards to code.
tox run -e lint # Check code against coding style standards.
tox run -e unit # Run unit tests.
If you're interested in contributing your work to slurmutils, take a look at our contributing guidelines for further details.
slurmutils is a project of the Ubuntu High-Performance Computing community. Interested in contributing bug fixes, new editors, documentation, or feedback? Want to join the Ubuntu HPC community? You've come to the right place 🤩
Here are some links to help you get started with joining the community:
- Ubuntu Code of Conduct
- Contributing guidelines
- Join the conversation on Matrix
- Get the latest news on Discourse
- Ask and answer questions on GitHub
slurmutils is free software, distributed under the GNU Lesser General Public License, v3.0. See the LGPL-3.0 LICENSE file for further details.