Commit a803dc4

Update configuration files for slurm executor plugin profiles
1 parent 1f3f453 commit a803dc4

6 files changed, with 222 additions and 61 deletions

config/slurm/README.md

Lines changed: 22 additions & 6 deletions
@@ -23,7 +23,14 @@ time, their compute resources can be updated in this file.
 > Note that the current configuration files were adjusted to the
 HPC clusters Rackham from UPPMAX and Dardel from PDC/KTH. Details
 on how to configure and run GenErode on Dardel are provided below.
-The configuration file for Snakemake version 7 was kept for comparison
+Compute resources are specified in three places in these configuration
+files: 1) under `set-threads` (used by Snakemake to specify threads
+in rules), 2) under `set-resources` and therein under `mem_mb`,
+specifying the memory in megabytes (the number of threads multiplied
+by the available memory per thread), and 3) under `set-resources`
+and therein under `cpus_per_task` (the same number as specified under
+`set-threads`, required for correct memory assignment on Dardel). The
+configuration file for Snakemake version 7 was kept for comparison,
 which was also written for Rackham/UPPMAX.

 3) Start GenErode as follows:
@@ -53,6 +60,10 @@ incomplete jobs and `-k` to keep going in case a job fails.
 module load PDC UPPMAX bioinfo-tools conda singularity tmux
 ```

+> Note that on Dardel, tmux is only available as a module,
+whereas the equivalent tool screen is pre-installed and does
+not need to be loaded.
+
 2) After cloning the repository, change permissions for the
 Snakefile:

@@ -73,12 +84,17 @@ to `slurm/config.yaml`. This file specifies compute resources
 for each rule or group job to be run on Dardel. Any rule or
 group job that is not listed under `set-threads` or `set-resources`
 uses default resources specified under `default-resources`. If
-any rule or group jobs fail due to too little memory or run
+any rule or group job fails due to too little memory or run
 time, their compute resources can be updated in this file.

-> Note that the current version of `config/slurm/profile/config_plugin_dardel.yaml`
-is still being tested. Threads are currently specified under
-`set-threads` and under `set-resources` as `cpus_per_task`.
+> Note that compute resources are specified in three places in
+the configuration file: 1) under `set-threads` (used by Snakemake
+to specify threads in rules), 2) under `set-resources` and therein
+under `mem_mb`, specifying the memory in megabytes (the number of
+threads multiplied by the available memory per thread),
+and 3) under `set-resources` and therein under `cpus_per_task`
+(the same number as specified under `set-threads`, required for
+correct memory assignment on Dardel).

 5) Start GenErode as follows:

@@ -96,7 +112,7 @@ conda activate generode
 - Start the dry run:

 ```
-snakemake --profile slurm -np &> YYMMDD_dry.out
+snakemake --profile slurm -n &> YYMMDD_dry.out
 ```

 - Start the main run:
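
The three places named in the note can be sketched for a single rule. This is a minimal illustration only: the rule name `my_rule` is hypothetical, and the 1000 MB per thread is an assumption read off the Dardel values in this commit (for example 16 CPUs alongside `mem_mb: 16000`).

```yaml
# Hypothetical rule `my_rule`; values are illustrative, not taken from the pipeline.
set-threads:
  my_rule: 16            # 1) threads Snakemake passes to the rule

set-resources:
  my_rule:
    mem_mb: 16000        # 2) 16 threads x 1000 MB per thread (assumed memory per thread)
    cpus_per_task: 16    # 3) same number as under set-threads
```

On a cluster with a different amount of memory per thread, only the `mem_mb` product would change; the Rackham profile in this commit, for instance, pairs 2 CPUs with `mem_mb: 12800`.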

config/slurm/profile/config_plugin_dardel.yaml

Lines changed: 2 additions & 2 deletions
@@ -112,8 +112,10 @@ set-resources:
     cpus_per_task: 16
   fastqc_historical_raw:
     mem_mb: 16000
+    cpus_per_task: 16
   fastqc_modern_raw:
     mem_mb: 16000
+    cpus_per_task: 16
   fastp_historical:
     runtime: 600
     mem_mb: 32000
@@ -225,9 +227,7 @@ set-resources:
     cpus_per_task: 32
   sort_vcfs:
     runtime: 1440
-  sort_vcfs:
     mem_mb: 16000
-  sort_vcfs:
     cpus_per_task: 16
   sorted_bcf2vcf:
     runtime: 300

config/slurm/profile/config_plugin_rackham.yaml

Lines changed: 2 additions & 0 deletions
@@ -106,8 +106,10 @@ set-resources:
     cpus_per_task: 2
   fastqc_historical_raw:
     mem_mb: 12800
+    cpus_per_task: 2
   fastqc_modern_raw:
     mem_mb: 12800
+    cpus_per_task: 2
   fastp_historical:
     runtime: 600
     mem_mb: 32000
Lines changed: 122 additions & 0 deletions
@@ -0,0 +1,122 @@
+# GenErode execution on SLURM clusters
+
+With the switch to Snakemake version 8, GenErode can be run
+on SLURM clusters as follows:
+
+1) Create the GenErode conda environment or update an earlier
+version. The latest conda environment contains the Snakemake
+executor plugin for slurm:
+
+```
+conda env create -f environment.yaml -n generode
+```
+
+2) Copy the example configuration file `slurm/profile/config_plugin_dardel.yaml`
+to `slurm/config.yaml`. This file specifies compute resources
+for each rule or group job. Any rule or group job that is
+not listed under `set-threads` or `set-resources` uses
+default resources specified under `default-resources`. If
+any rule or group job fails due to too little memory or run
+time, their compute resources can be updated in this file.
+
+> Note that the current configuration file was adjusted to the
+HPC cluster Dardel from PDC/KTH. Details on how to configure and
+run GenErode on Dardel are provided below. Compute resources are
+specified in three places in the configuration file: 1) under
+`set-threads` (used by Snakemake to specify threads in rules), 2)
+under `set-resources` and therein under `mem_mb`, specifying the
+memory in megabytes (the number of threads multiplied by the
+available memory per thread), and 3) under `set-resources` and
+therein under `cpus_per_task` (the same number as specified under
+`set-threads`, required for correct memory assignment on Dardel).
+
+3) Start GenErode as follows:
+
+- Open a tmux or screen session
+- Activate the GenErode conda environment
+- Start the dry run:
+
+```
+snakemake --profile slurm -n &> YYMMDD_dry.out
+```
+
+- Start the main run:
+
+```
+snakemake --profile slurm &> YYMMDD_main.out
+```
+
+> Useful flags for running the pipeline: `--ri` to re-run
+incomplete jobs and `-k` to keep going in case a job fails.
+
+## Specific instructions for Dardel
+
+1) Load the following modules on Dardel:
+
+```
+module load PDC UPPMAX bioinfo-tools conda singularity tmux
+```
+
+> Note that on Dardel, tmux is only available as a module,
+whereas the equivalent tool screen is pre-installed and does
+not need to be loaded.
+
+2) After cloning the repository, change permissions for the
+Snakefile:
+
+```
+chmod 755 Snakefile
+```
+
+3) Create the GenErode conda environment or update an earlier
+version. The latest conda environment contains the Snakemake
+executor plugin for slurm:
+
+```
+conda env create -f environment.yaml -n generode
+```
+
+4) Copy the configuration file `config/slurm/profile/config_plugin_dardel.yaml`
+to `slurm/config.yaml`. This file specifies compute resources
+for each rule or group job to be run on Dardel. Any rule or
+group job that is not listed under `set-threads` or `set-resources`
+uses default resources specified under `default-resources`. If
+any rule or group job fails due to too little memory or run
+time, their compute resources can be updated in this file.
+
+> Note that compute resources are specified in three places in
+the configuration file: 1) under `set-threads` (used by Snakemake
+to specify threads in rules), 2) under `set-resources` and therein
+under `mem_mb`, specifying the memory in megabytes (the number of
+threads multiplied by the available memory per thread),
+and 3) under `set-resources` and therein under `cpus_per_task`
+(the same number as specified under `set-threads`, required for
+correct memory assignment on Dardel).
+
+5) Start GenErode as follows:
+
+- Open a tmux session (alternatively, you can use screen)
+
+- Activate the GenErode conda environment (create or update
+from `environment.yaml`), replacing the path to the location
+of the conda environment:
+
+```
+export CONDA_ENVS_PATH=/cfs/klemming/home/.../
+conda activate generode
+```
+
+- Start the dry run:
+
+```
+snakemake --profile slurm -n &> YYMMDD_dry.out
+```
+
+- Start the main run:
+
+```
+snakemake --profile slurm &> YYMMDD_main.out
+```
+
+> Useful flags for running the pipeline: `--ri` to re-run
+incomplete jobs and `-k` to keep going in case a job fails.

utilities/mutational_load_snpeff/slurm/profile/config_plugin.yaml

Lines changed: 0 additions & 53 deletions
This file was deleted.
Lines changed: 74 additions & 0 deletions
@@ -0,0 +1,74 @@
+# Configuration file for slurm plugin (Snakemake >8.0.0) for Dardel cluster at PDC/KTH
+# snakemake CLI flags
+executor: slurm
+jobs: 100
+printshellcmds: true
+software-deployment-method: apptainer
+
+# slurm resources
+## default-resources: applied to all jobs, overruled by resources defined below for jobs
+default-resources:
+  slurm_account: XXX-XX-XXX # update this to your slurm account
+  slurm_partition: shared # use Dardel’s shared partition
+  runtime: 120 # default runtime in minutes
+  mem_mb: 8000
+  nodes: 1 # one node on Dardel from the shared partition
+  ntasks: 1 # number of concurrent tasks / ranks
+  cpus_per_task: 8 # number of hyperthreads per task; each corresponds to roughly 1 GB RAM
+
+## map rule names to threads
+set-threads:
+  extract_number_of_samples: 16
+  find_fixed_homozygote_alt_sites: 32
+  remove_fixed_homozygote_alt_sites_merged_vcf: 32
+  find_intron_intergenic_variants: 16
+  remove_sites_snpEff_vcf: 32
+  extract_high_impact_snps: 16
+  extract_moderate_impact_snps: 16
+  extract_low_impact_snps: 16
+  extract_synonymous_variant_snps: 16
+  total_load: 8
+  realised_load: 8
+
+## set-resources: map rule names to resources in general
+set-resources:
+  extract_number_of_samples:
+    mem_mb: 16000
+    runtime: 30
+    cpus_per_task: 16
+  find_fixed_homozygote_alt_sites:
+    mem_mb: 32000
+    runtime: 300
+    cpus_per_task: 32
+  remove_fixed_homozygote_alt_sites_merged_vcf:
+    mem_mb: 32000
+    runtime: 300
+    cpus_per_task: 32
+  find_intron_intergenic_variants:
+    mem_mb: 16000
+    runtime: 300
+    cpus_per_task: 16
+  remove_sites_snpEff_vcf:
+    mem_mb: 32000
+    runtime: 300
+    cpus_per_task: 32
+  extract_high_impact_snps:
+    mem_mb: 16000
+    cpus_per_task: 16
+  extract_moderate_impact_snps:
+    mem_mb: 16000
+    cpus_per_task: 16
+  extract_low_impact_snps:
+    mem_mb: 16000
+    cpus_per_task: 16
+  extract_synonymous_variant_snps:
+    mem_mb: 16000
+    cpus_per_task: 16
+  total_load:
+    mem_mb: 8000
+    runtime: 30
+    cpus_per_task: 8
+  realised_load:
+    mem_mb: 8000
+    runtime: 30
+    cpus_per_task: 8
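
The README changes in this commit say that a rule's compute resources can be updated in the profile if a job fails due to too little memory or run time. A hedged sketch of such an update for `total_load`, doubling the values listed for it in this file; the new numbers are assumptions chosen for illustration, not tested settings:

```yaml
# Illustrative update only: the doubled values are assumptions, not recommendations.
set-threads:
  total_load: 16         # was 8

set-resources:
  total_load:
    mem_mb: 16000        # was 8000; kept at threads x 1000 MB
    runtime: 60          # was 30 (minutes)
    cpus_per_task: 16    # was 8; kept equal to the set-threads value
```

These keys would replace the existing `total_load` entries rather than be appended as a second block, since the second hunk of the Dardel profile diff in this commit removes exactly such repeated `sort_vcfs:` keys.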
