Conversation

@vdejager commented Mar 18, 2025


name: New Config for Snellius HPC (Dutch National Supercomputer facility)
about: A new cluster config for Snellius in general and a pipeline-specific config for the 'mag' pipeline

Please follow these steps before submitting your PR:

  • If your PR is a work in progress, include [WIP] in its title
  • Your PR targets the master branch
  • You've included links to relevant issues, if any

Steps for adding a new config profile:

  • Add your custom config file to the conf/ directory
  • Add your documentation file to the docs/ directory
  • Add your custom profile to the nfcore_custom.config file in the top-level directory (see the sketch after this list)
  • Add your custom profile to the README.md file in the top-level directory
  • Add your profile name to the profile: scope in .github/workflows/main.yml
  • OPTIONAL: Add your custom profile path and GitHub user name to .github/CODEOWNERS (**/<custom-profile>** @<github-username>)
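For the nfcore_custom.config step above, a minimal sketch of what the new entry could look like, following the includeConfig pattern of the existing profiles (the exact file names and paths here are assumptions):

```nextflow
// Hypothetical entry in nfcore_custom.config enabling `-profile snellius`.
profiles {
    snellius {
        includeConfig "${params.custom_config_base}/conf/snellius.config"
    }
}
```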

time = params.max_time
// job names need to be unique
jobName = {
//n = "${task.name}" -> n.replaceAll('[\\\s:()]', '_')
@pontus (Contributor) commented Mar 18, 2025
Suggested change (remove the commented-out line):
//n = "${task.name}" -> n.replaceAll('[\\\s:()]', '_')

Seems like trial and error? (I'm not sure why you can't just do "${task.name}".replaceAll('[^a-zA-Z0-9]', '_').)

@vdejager (Author) replied:

I'll see if that works. The above was suggested by the Snellius admins.
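For reference, a minimal sketch of the simplification suggested above, assuming jobName is set in the executor scope (whether that matches the submitted config is an assumption):

```nextflow
executor {
    // Replace anything that is not alphanumeric so the generated SLURM
    // job name contains only characters the scheduler accepts.
    jobName = { "${task.name}".replaceAll('[^a-zA-Z0-9]', '_') }
}
```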

#SBATCH --error=/projects/0/<your project space>/jobs/error_%j.out
#-----------------------------Required resources-----------------------
# this defines where the nextflow process runs (as a separate job)
# 16 cores is the minimum partition with 28 GB memory
@pontus (Contributor) commented:

28 GB for the Nextflow monitoring process is really a lot. I haven't run mag, but I would be surprised if anywhere near that is needed.

@vdejager (Author) replied:

No, it's not needed per se, but this is the smallest job one can specify on the cluster.

@pontus (Contributor) replied:

OK, I read the comment as this being the smallest request that gets 28 GB of RAM. Maybe clarify that it is the minimum job size?

But that is still confusing, since smaller requests are used elsewhere: do they still work and get silently scaled up? The usual pattern of throwing more resources at a task on each attempt won't work as expected if every attempt ends up with the same allocation, so maybe those requests should be adjusted?
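A sketch illustrating the concern; the 2-core / 6 GB starting point and the process_low label are illustrative assumptions, not taken from the PR. If sub-minimum requests are silently rounded up to the 16-core / 28 GB floor, retries only add resources once the request crosses that floor:

```nextflow
process {
    // Requests scale linearly with task.attempt, but if the scheduler
    // rounds everything up to 16 cores / 28 GB the first attempts are
    // effectively identical:
    //   attempt 1:  2 cores /  6 GB  -> rounded up to 16 cores / 28 GB
    //   attempt 2:  4 cores / 12 GB  -> still 16 cores / 28 GB
    //   attempt 3:  6 cores / 18 GB  -> still 16 cores / 28 GB
    withLabel: process_low {
        cpus   = { 2 * task.attempt }
        memory = { 6.GB * task.attempt }
    }
}
```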

errorStrategy = { task.exitStatus in [1, 143, 137, 104, 134, 139, 140] ? 'retry' : 'finish' }
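// The exit codes above (e.g. 137 = SIGKILL, often out of memory; 143 = SIGTERM)
// typically indicate resource problems worth a retry; anything else triggers
// an orderly shutdown ('finish').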
cpus = { 16 * task.attempt }
memory = { 28.GB * task.attempt }
time = 5.d
@pontus (Contributor) commented:

Maybe remove time when it's not changing the default?

@vdejager (Author) replied:

Maybe. In the meantime I've been pointed at the scheduler strategy, so perhaps I should make the default short. The mag pipeline typically runs longer per task (and 5 days is sometimes too short, so restarting the pipeline is needed).

@pontus (Contributor) replied:

In general, setting anything other than the maximum allowed doesn't make sense: the occasional earlier scheduling by backfill doesn't make up for what's wasted when jobs that are running fine get killed because the time limit was too short.
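A minimal sketch of that approach, reusing the params.max_time seen earlier in the config; the 5-day value matches the excerpt above, but treating it as the partition maximum is an assumption:

```nextflow
params {
    // Assumed maximum wall-clock time allowed on the partition.
    max_time = 5.d
}

process {
    // Always request the maximum allowed instead of tuning per process.
    time = params.max_time
}
```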

withName: MAG_DEPTHS {
cpus = { 16 * task.attempt }
memory = { 28.GB * task.attempt }
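// Doubles the requested wall-clock time on each retry: 24 h, 48 h, 96 h, ...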
time = { 24.h * (2 ** (task.attempt - 1)) }
@pontus (Contributor) commented:

I naturally don't know anything about your system, but the two main reasons to play with time are either to fit within what a specific partition allows for jobs, or possibly to get squeezed in by the backfill (and, in some very specific cases, to avoid being stopped because the job as submitted would exceed the total allocation).

Are you sure these settings give any benefit, and don't just make things needlessly complicated (with a not-insignificant risk of well-functioning jobs timing out when they wouldn't need to)?


@jfy133 (Member) commented Mar 18, 2025

@nf-core-bot fix linting

@jfy133 (Member) commented Mar 19, 2025

pre-commit linting error:

```
docs/pipeline/mag/snellius.md:
  69: Wrong amount of left-padding spaces(want multiple of 2)
  70: Wrong amount of left-padding spaces(want multiple of 2)
  71: Wrong amount of left-padding spaces(want multiple of 2)
  72: Wrong amount of left-padding spaces(want multiple of 2)
  73: Wrong amount of left-padding spaces(want multiple of 2)
  74: Wrong amount of left-padding spaces(want multiple of 2)
  75: Wrong amount of left-padding spaces(want multiple of 2)
  76: Wrong amount of left-padding spaces(want multiple of 2)
```

@jfy133 (Member) commented Mar 19, 2025

And tests aren't working because you haven't got all the files for both the global and pipeline-specific configs:

  1. Global profile: https://github.com/nf-core/configs?tab=readme-ov-file#uploading-to-nf-coreconfigs
  2. Pipeline profile: https://github.com/nf-core/configs?tab=readme-ov-file#adding-a-new-pipeline-specific-config

The specific error is:

Tests don't seem to test these profiles properly. Please check whether you added the profile to the Github Actions testing YAML.

{'snellius'}
