Snellius mag #867
Conversation
added nf-core config specific for the 'mag' pipeline
    time = params.max_time
    // job names need to be unique a
    jobName = {
        //n = "${task.name}" -> n.replaceAll('[\\\s:()]', '_')
    //n = "${task.name}" -> n.replaceAll('[\\\s:()]', '_')
Seems like trial and error? (I'm not sure why you can't just do `"${task.name}".replaceAll('[^a-zA-Z0-9]', '_')`.)
I'll see if that works. The above was suggested by the Snellius admins.
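For reference, a minimal sketch of the simpler form suggested above (untested on Snellius; placing `jobName` in the `executor` scope is an assumption based on where that option normally lives in a Nextflow config):

```groovy
// Sketch of the suggested simplification: map any character that is not
// alphanumeric to '_' so the generated SLURM job name is always valid.
executor {
    jobName = { "${task.name}".replaceAll('[^a-zA-Z0-9]', '_') }
}
```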
#SBATCH --error=/projects/0/<your project space>/jobs/error_%j.out
#-----------------------------Required resources-----------------------
# this defines where the nextflow process runs (as separate job)
# 16 cores is the minimum partition with 28 GB memory
28 GB for the Nextflow monitoring process is really a lot. I haven't run mag, but I would be surprised if anywhere near that is needed.
No, it's not needed per se, but this is the smallest job one can specify on the cluster.
OK, I read the comment as meaning the smallest allocation that gets 28 GB of RAM. Maybe clarify that it's the minimum job size?
That is still confusing, though, since smaller requests are used elsewhere in the config: do those still work and simply get scaled up silently? If so, the usual `task.attempt` handling of throwing more resources at a retry won't work as expected when every attempt ends up with the same allocation, so maybe those requests should be adjusted.
    errorStrategy = { task.exitStatus in [1, 143, 137, 104, 134, 139, 140] ? 'retry' : 'finish' }
    cpus = { 16 * task.attempt }
    memory = { 28.GB * task.attempt }
    time = 5.d
Maybe remove time when it's not changing the default?
Maybe. In the meantime I've been pointed at the scheduler strategy, so perhaps I should make the default short. The mag pipeline typically runs longer per task (and 5 days is sometimes too short, so restarting the pipeline is needed).
In general, setting anything other than the maximum allowed doesn't make sense: the occasional earlier scheduling via backfill doesn't make up for the work lost when jobs that are running fine get killed because their time limit was set too short.
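As an illustration of that suggestion, a minimal sketch using the figures already present in this config (16 cores, 28 GB, 5 days; whether 120 h is actually the partition maximum on Snellius is an assumption):

```groovy
// Sketch: request the maximum walltime by default instead of a short,
// attempt-scaled value, and let task.attempt only scale CPUs and memory.
process {
    cpus   = { 16 * task.attempt }
    memory = { 28.GB * task.attempt }
    time   = 120.h   // assumed partition maximum (5 days)
}
```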
    withName: MAG_DEPTHS {
        cpus = { 16 * task.attempt }
        memory = { 28.GB * task.attempt }
        time = { 24.h * (2 ** (task.attempt - 1)) }
I naturally don't know anything about your system, but the two main reasons to play with time are either to fit within what's allowed for jobs in a specific partition, or possibly to get squeezed in by the backfill (and, in some very specific cases, to avoid the job being blocked because, as submitted, it would exceed the total allocation).
Are you sure these give any benefit and don't just make things needlessly complicated (with a not-insignificant risk of well-functioning jobs timing out when they wouldn't need to)?
Using https://www.nextflow.io/docs/latest/reference/process.html#resourcelimits is highly recommended.
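A minimal sketch of what that could look like here, assuming the 16-core / 28 GB / 5-day figures appearing in this PR are placeholders for the real cluster limits rather than the actual Snellius maxima:

```groovy
process {
    // Requests above these values are clamped by Nextflow before submission,
    // so retries scaled by task.attempt cannot exceed what the cluster allows.
    resourceLimits = [ cpus: 16, memory: 28.GB, time: 120.h ]

    cpus   = { 16 * task.attempt }
    memory = { 28.GB * task.attempt }
}
```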
@nf-core-bot fix linting
pre-commit linting error:
And tests aren't working because you haven't got all the files for both the global and the pipeline-specific configs. The specific error is:
name: New Config for Snellius HPC (Dutch National Supercomputer facility)
about: A new cluster config for both Snellius in general and one config for the 'mag' pipeline
Please follow these steps before submitting your PR:
- If your PR is a work in progress, include `[WIP]` in its title
- Your PR targets the `master` branch

Steps for adding a new config profile:

- Add your custom config file to the `conf/` directory
- Add your documentation file to the `docs/` directory
- Add your custom profile to the `nfcore_custom.config` file in the top-level directory
- Add your custom profile to the `README.md` file in the top-level directory
- Add your custom profile to the `profile:` scope in `.github/workflows/main.yml`
- Add your custom profile to `.github/CODEOWNERS` (`**/<custom-profile>** @<github-username>`)