Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add script recommendations #2888

Merged
merged 15 commits into from
Dec 18, 2024
Merged

Add script recommendations #2888

merged 15 commits into from
Dec 18, 2024

Conversation

pinin4fjords
Copy link
Member

@pinin4fjords pinin4fjords commented Dec 6, 2024

Propose some lines about module scripts.

@netlify /docs/guidelines/components/modules

@pinin4fjords pinin4fjords marked this pull request as draft December 6, 2024 16:49
@pinin4fjords pinin4fjords requested a review from jfy133 December 6, 2024 16:49
Copy link

netlify bot commented Dec 6, 2024

Deploy Preview for nf-core-main-site ready!

Name Link
🔨 Latest commit c438115
🔍 Latest deploy log https://app.netlify.com/sites/nf-core-main-site/deploys/6759db8f6566850008f6b011
😎 Deploy Preview https://deploy-preview-2888--nf-core-main-site.netlify.app/docs/guidelines/components/modules
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

Copy link

netlify bot commented Dec 6, 2024

Deploy Preview for nf-core-docs ready!

Name Link
🔨 Latest commit c438115
🔍 Latest deploy log https://app.netlify.com/sites/nf-core-docs/deploys/6759db8fdbef930008982d13
😎 Deploy Preview https://deploy-preview-2888--nf-core-docs.netlify.app/docs/guidelines/components/modules
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@pinin4fjords
Copy link
Member Author

@nf-core-bot fix linting

Copy link
Member

@jfy133 jfy133 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh, escaping in suggestions is hard... hope you undersatnd it...

I also decided to be more strict, and require a stadnardised tempalte file naming structure (Which would require an update to deseq2/differential, let me know what you think...

Copy link
Member

@jfy133 jfy133 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few further tweaks


For example, for R you can refer to the `deseq2/differential` module [here](https://github.com/nf-core/modules/blob/4c2d06a5e79abf08ba7f04c58e39c7dad75f094d/modules/nf-core/deseq2/differential/templates/deseq_de.R#L509-L534).

#### Script template stub
Copy link
Member

@jfy133 jfy133 Dec 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#### Script template stub
#### Module template stub

@pinin4fjords
Copy link
Member Author

@nf-core-bot fix linting

@jfy133 jfy133 marked this pull request as ready for review December 9, 2024 11:26
@christopher-hakkaart
Copy link
Member

Can we reference the reason why not to use module binaries? Personally I don't really like the interface to template, and I hate to see R scripts written with base R and the horribly clunky code to allow arguments in that way. I would suggest that we encourage the use of a seqera container with an appropriate interface package (e.g. argparse). We could do with more explanation as to how to actualy use template, because I don't think we have that here (and the Seqera docs are almost non-existent on the topic).

A PR is in progress to address the gap in the Nextflow docs. This doesn't mean that nf-core can't have its own standards/recommendations, but it will hopefully add something that can be easily referenced.

@SPPearce
Copy link
Contributor

Can we reference the reason why not to use module binaries? Personally I don't really like the interface to template, and I hate to see R scripts written with base R and the horribly clunky code to allow arguments in that way. I would suggest that we encourage the use of a seqera container with an appropriate interface package (e.g. argparse). We could do with more explanation as to how to actualy use template, because I don't think we have that here (and the Seqera docs are almost non-existent on the topic).

A PR is in progress to address the gap in the Nextflow docs. This doesn't mean that nf-core can't have its own standards/recommendations, but it will hopefully add something that can be easily referenced.

Great, always happy to see more documentation on the Nextflow page itself.

Co-authored-by: Christopher Hakkaart <[email protected]>
@jfy133
Copy link
Member

jfy133 commented Dec 10, 2024

Can we reference the reason why not to use module binaries?

Apparently you can only use binaries in cloud contexts if you use wave (not other containers), which I feel is overly restrictive.

Personally I don't really like the interface to template, and I hate to see R scripts written with base R and the horribly clunky code to allow arguments in that way. I would suggest that we encourage the use of a seqera container with an appropriate interface package (e.g. argparse). We could do with more explanation as to how to actualy use template, because I don't think we have that here (and the Seqera docs are almost non-existent on the topic).

Agree, more Nextflow docs/example on how to use would be nice, and again, would rather not 'force' Seqera Containers.

I much prefer the use of in-line scripts where possible, what is the justification behind saying it can only be 20 lines?

It's a rule of thumb suggested by @pinin4fjords (I've changed to 'approx' though)

Oh, and some examples pointing to other modules rather than deseq2/differential repeatedly would be good.

Suggest some then! 😝

@christopher-hakkaart
Copy link
Member

I suggest removing the repeated "approximately 20 lines" and instead describing this once: something like, "When a script extends beyond a readable length (approximately 20 lines)..."

Then just refer to a readable length for the rest of the text.

Readable length could be "Manageable size", "too verbose", "excessive length", or something like that

@jfy133
Copy link
Member

jfy133 commented Dec 10, 2024

I suggest removing the repeated "approximately 20 lines" and instead describing this once: something like, "When a script extends beyond a readable length (approximately 20 lines)..."

Then just refer to a readable length for the rest of the text.

Readable length could be "Manageable size", "too verbose", "excessive length", or something like that

Done!

@christopher-hakkaart
Copy link
Member

Much better - thanks!

@pinin4fjords
Copy link
Member Author

@SPPearce to be clear, I want module binaries to be the way to do this, with argparse baked into the container to avoid all that junk I have to put in templates, I'm no template fanboy.

But module binaries are not entirely portable (see the provisos in the linked docs), so templates are the least worst way of doing this I've found - overjoyed if someone tells me a better way.

My crappy outdated R is another issue entirely ;-)

@christopher-hakkaart
Copy link
Member

Is everyone reasonably happy with the current PR? Are there any blockers to merging?


The generated `versions.yml` MUST have the same structure as a standard nf-core module `versions.yml`.

See the [`deseq2/differential` module](https://github.com/nf-core/modules/blob/4c2d06a5e79abf08ba7f04c58e39c7dad75f094d/modules/nf-core/deseq2/differential/templates/deseq_de.R#L509-L534) for an example using R.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing is mentioned on the R_sessionInfo.log that is created in the template there.

Also an alternative suggestion for writing the R version:

paste(version[['major']],version[['minor']], sep = ".")

Think it's more easier to read then:

strsplit(version[['version.string']], ' ')[[1]][3]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing is mentioned on the R_sessionInfo.log that is created in the template there.

What do you mean? I don't follow, sorry.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the second comment @Joon-Klaps please feel free to update the moduel/example :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the mentioned example lines of code there is a small section dedicated into exporting the R.sessionInfo().

sink(paste(opt\$output_prefix, "R_sessionInfo.log", sep = '.'))
print(sessionInfo())
sink()

However, nothing is mentioned of this in these guidelines. So is it necessary or not? I'm guessing so given it was included in the referenced lines but would also explicitly mention it then in the guidelines. Although not certain about the added value of the R.session() info.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah got you. I don't think that makes sense putting 'how' to generate the yaml file in the guideline per se - we only care that it exists.

We can point to other examples though if it helps (feel free to make an additional PR). Maybe a dedicated tutorial page how to write such a template module would make sense due to the intricacies of the variable escaping? What do you think @pinin4fjords ?

@jfy133 jfy133 merged commit f272cfe into main Dec 18, 2024
12 checks passed
@jfy133 jfy133 deleted the module_scripts branch December 18, 2024 11:38
### Script inclusion

Using module templates helps distinguish between changes made to the scientific logic within the script and those affecting the workflow-specific logic in the module. This separation improves the code's clarity and maintainability.
If a module's `script:` block contains a script rather than command invocations, regardless of the language (e.g., Bash, R, Python), and the content is more than a readable length (as a rule of thumb, approximately 20 lines), it MUST be provided through a [Nextflow module template](https://www.nextflow.io/docs/latest/module.html#module-templates).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Nextflow docs discourages the use of templates for non-bash scripts: https://nextflow.io/docs/latest/process.html#template

Template scripts are recommended only for Bash scripts. Languages that do not prefix variables with $ (e.g. Python and R) can’t be executed directly as a template script.

The best practice for using a custom script is to embed it in the process definition at first and move it to a separate file with its own command line interface once the code matures.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I guess the module binaries feature requires Wave when the work directory is in object storage

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the requirement of wave is why we opted not to do so as then it's not portable :/ (and use the examples from Jon)

Copy link
Member Author

@pinin4fjords pinin4fjords Dec 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bentsherman any chance of module binaries becoming a universal solution in future, like workflow bin? My last discussion with our lord and master suggested not, but this is is the unavoidable consequence (as far as I can tell), unless we embed massive chunks of code in the process definition, which I don't think is a better way.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants