ferlab/postprocessing: Contributing Guidelines

Hi there! Many thanks for taking an interest in improving ferlab/postprocessing.

If you haven't already, we recommend creating a GitHub issue to describe your task. Please use the pre-filled template to save time.

Please do your best to follow the guidelines defined here. If you are unsure about the current standards or need support, feel free to ask for help in the #bioinfo Slack channel.

We also maintain a few Notion pages as documentation:

nf-core guidelines
Help with samplesheet and schemas
Notes for module/subworkflow building

Contribution workflow

We follow the Ferlab git flow guidelines, which are detailed in the Developer Handbook.

Please ensure that you adhere to the conventions for branch names and commit messages.

If applicable, use nf-core schema build to add new parameters to the pipeline JSON schema. This requires nf-core tools version 1.10 or higher.

Tests

When you create a pull request with changes, GitHub Actions will run automatic tests.

A pull request should only be merged when all of these tests pass.

Three types of tests are run, as described below.

Lint tests

The lint test runs the nf-core linter and checks for errors and warnings. It corresponds to the following command:

nf-core lint

The lint test is currently deactivated in GitHub Actions, but we highly recommend running it locally. Ensure that no lint test fails and that no additional warnings appear compared to the main branch.

At Ferlab, we don't enforce all linting rules. If a test should be ignored, it should be added to .nf-core.yml.

Pipeline tests

This test runs the pipeline with a minimal set of test data. It only checks that the pipeline can run successfully. Since our test setup is not fully ready yet, it runs in stub mode at the moment.

These tests are run with both the latest available version of Nextflow and the minimum required version stated in the pipeline code.

You are encouraged to test in non-stub mode locally and in integration environments. Reach out to the Ferlab bioinformatics team if you need help with this. You can find example commands in the test.config configuration file.

nf-test tests

This test runs unit tests with nf-test. For now, for performance reasons, it only runs tests applicable to the submitted changes and tests tagged with the keyword "local".

The tests are only run with the Nextflow version expected in production, also for performance reasons.
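As an illustration of the tagging described above, a minimal nf-test file carrying the "local" tag might look like the sketch below. The process name EXAMPLE_STEP, the script path and the input value are hypothetical placeholders, not names taken from this pipeline.

```nextflow
// Sketch of an nf-test file, e.g. tests/main.nf.test (hypothetical path)
nextflow_process {

    name "Test process EXAMPLE_STEP"   // EXAMPLE_STEP is a hypothetical process
    script "../main.nf"                // assumed location of the process definition
    process "EXAMPLE_STEP"

    tag "local"                        // the keyword mentioned above

    test("completes in stub mode") {

        options "-stub"                // run the stub: block instead of the real script

        when {
            process {
                """
                // assumes the process takes a single value input
                input[0] = 'sample1'
                """
            }
        }

        then {
            assert process.success
        }
    }
}
```

With such a tag in place, running nf-test test --tag local should select this test when run locally.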

Pipeline contribution conventions

To make the ferlab/postprocessing code and processing logic more understandable for new contributors and to ensure quality, we try to follow nf-core standards as much as possible.

They are described below. Try to follow them as much as possible. If you are unsure, feel free to reach out to the bioinformatics team.

Adding a new step

If you wish to contribute a new step, please use the following coding standards:

Define the input channel of your new process from the output channel of the expected previous process.
Write the process block (see below).
Define the output channel if needed (see below).
Add any new parameters to nextflow.config with a default (see below).
Add any new parameters to nextflow_schema.json with help text (via the nf-core schema build tool).
Add sanity checks and validation for all relevant parameters.
Perform local tests to validate that the new code works as expected.
If applicable, add a new test command in .github/workflows/ci.yml.
If applicable, write nf-test unit tests.
Add a description of the output files if relevant to docs/output.md.
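The sanity-check step in the list above could look roughly like the following sketch. The parameter name example_min_quality is hypothetical, and its default is set inline only to keep the snippet self-contained; in the pipeline it would live in nextflow.config.

```nextflow
// Hypothetical parameter; normally defined in nextflow.config, not in the script
params.example_min_quality = 20

workflow {
    // Fail early with a clear message if the value is not a non-negative number
    if (!(params.example_min_quality instanceof Number) || params.example_min_quality < 0) {
        error "Invalid --example_min_quality '${params.example_min_quality}': expected a non-negative number"
    }

    log.info "Using example_min_quality = ${params.example_min_quality}"
}
```

Checking parameters before any process is invoked means a bad value stops the run immediately instead of surfacing as a tool error part-way through.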

Default values

Parameters should be initialised / defined with default values in nextflow.config under the params scope.

Once they are defined there, use nf-core schema build to add them to nextflow_schema.json.
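As a small sketch of what this means in practice, defaults for a new step could be declared as follows. Both parameter names are hypothetical and exist only for illustration.

```nextflow
// nextflow.config (fragment)
params {
    // Hypothetical parameters for a new step, each with an explicit default
    example_min_quality = 20
    example_skip_step   = false
}
```

A user can then override a default on the command line, for example with --example_min_quality 30.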

Default processes resource requirements

Sensible defaults for process resource requirements (CPUs / memory / time) should be defined in conf/base.config. These should generally be specified generically with withLabel: selectors so they can be shared across multiple processes/steps of the pipeline. An nf-core standard set of labels, which should be followed where possible, can be seen in the nf-core pipeline template. The template defines the default process as a single-core process, then adds standardised labels for multi-core configurations with increasingly large memory requirements.
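A minimal sketch of such a configuration is shown below. The label names follow the nf-core template, but the numbers are illustrative assumptions rather than the values used by this pipeline.

```nextflow
// conf/base.config (sketch; resource values are illustrative only)
process {
    // Fallback for processes without a label: a single core
    cpus   = 1
    memory = 6.GB
    time   = 4.h

    // Applied to every process that declares: label 'process_low'
    withLabel: process_low {
        cpus   = 2
        memory = 12.GB
        time   = 4.h
    }

    // Applied to every process that declares: label 'process_medium'
    withLabel: process_medium {
        cpus   = 6
        memory = 36.GB
        time   = 8.h
    }
}
```

Because the selectors match on labels rather than process names, a new step only needs to declare one of the existing labels to inherit its resources.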

The process resources can be passed on to the tool dynamically within the process with the ${task.cpus} and ${task.memory} variables in the script: block.
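To make this concrete, here is a hedged sketch of a process that forwards its allocated resources to a tool. The process name EXAMPLE_SORT and the file names are hypothetical; samtools sort is used only as a familiar multi-threaded tool and is not necessarily part of this pipeline.

```nextflow
// EXAMPLE_SORT is a hypothetical process used only for illustration
process EXAMPLE_SORT {
    label 'process_medium'   // resources come from the matching withLabel: selector

    input:
    tuple val(meta), path(bam)

    output:
    tuple val(meta), path("${meta.id}.sorted.bam"), emit: bam

    script:
    // task.memory is a memory object; toGiga() returns whole gigabytes.
    // This assumes memory is set for the process (e.g. through its label).
    def mem_per_thread = Math.max(1, task.memory.toGiga().intdiv(task.cpus))
    """
    samtools sort -@ ${task.cpus} -m ${mem_per_thread}G -o ${meta.id}.sorted.bam ${bam}
    """

    stub:
    """
    touch ${meta.id}.sorted.bam
    """
}
```

Nothing in the script is hard-coded to a core count, so changing the label or the configuration is enough to rescale the step.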

Naming schemes

Please use the following naming schemes, to make it easy to understand what is going where.

initial process channel: ch_output_from_<process>
intermediate and terminal channels: ch_<previousprocess>_for_<nextprocess>
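One possible reading of this scheme is sketched below with two toy processes. FIRST_STEP and SECOND_STEP are hypothetical names, and the echo commands stand in for real tools.

```nextflow
// Two hypothetical processes that only echo text, to keep the sketch self-contained
process FIRST_STEP {
    input:
    val sample_id

    output:
    stdout

    script:
    """
    echo "${sample_id} processed"
    """
}

process SECOND_STEP {
    input:
    val line

    output:
    stdout

    script:
    """
    echo "summary: ${line}"
    """
}

workflow {
    // Channel emitted directly by a process: ch_output_from_<process>
    ch_output_from_firststep = FIRST_STEP(Channel.of('sample1', 'sample2'))

    // Channel reshaped for the next step: ch_<previousprocess>_for_<nextprocess>
    ch_firststep_for_secondstep = ch_output_from_firststep.map { line -> line.trim() }

    SECOND_STEP(ch_firststep_for_secondstep)
    SECOND_STEP.out.view()
}
```

Reading the channel name alone is then enough to tell which process produced the data and which one consumes it.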

Nextflow version bumping

If you are using a new feature from core Nextflow, you may bump the minimum required version of Nextflow in the pipeline with:

nf-core bump-version --nextflow . [min-nf-version]
