# Nextflow Modules
A set of Nextflow modules commonly used across pipelines.
## remove_intermediate_files
Module for deleting intermediate files from disk once they are no longer needed by downstream processes. Symbolic links are followed to the actual file, and both the link and its target are deleted.
Tools used: GNU `rm` and `readlink`.
Inputs:
- `file_to_remove`: path to file to be deleted
- `ready_for_deletion_signal`: `val` indicating that the file is no longer needed by any processes
Parameters:
- `output_dir`: directory for storing outputs
- `log_output_dir`: directory for storing log files
- `save_intermediate_files`: boolean indicating whether this process should run (disable when intermediate files need to be kept)
- `docker_image`: Docker image within which the process will run. The default is: `ghcr.io/uclahs-cds/pipeval:3.0.0`
- `process_label`: Nextflow process label assigned to the process to control resource allocation. For specific CPU and memory allocation, include static allocations in node-specific config files
- Add this repository as a submodule in the pipeline of interest
- Include the `remove_intermediate_files` process from the module `main.nf` with a relative path
- Use the `addParams` directive when importing to specify any params
- Call the process with the inputs where needed
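A minimal usage sketch of these steps; the submodule path and channel names are illustrative, not part of the module:

```nextflow
// Illustrative relative path; adjust to where the submodule lives in your pipeline
include { remove_intermediate_files } from './external/pipeline-Nextflow-module/modules/common/intermediate_file_removal/main.nf' addParams(
    output_dir: params.output_dir,
    log_output_dir: params.log_output_dir,
    save_intermediate_files: params.save_intermediate_files
    )

workflow {
    // file_to_remove and ready_for_deletion_signal come from upstream processes
    remove_intermediate_files(
        intermediate_file_ch,
        downstream_complete_signal_ch
        )
}
```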
## extract_GenomeIntervals
Module for extracting the genome intervals from a reference genome dictionary.
Tools used: GNU `grep`, `cut`, and `sed`.
Inputs:
- `reference_dict`: path to reference genome dictionary
Parameters:
- `output_dir`: directory for storing outputs
- `log_output_dir`: directory for storing log files
- `save_intermediate_files`: boolean indicating whether the extracted intervals should be copied to the output directory
- `docker_image`: Docker image within which the process will run. The default is: `ghcr.io/uclahs-cds/pipeval:3.0.0`
- `process_label`: Nextflow process label assigned to the process to control resource allocation. For specific CPU and memory allocation, include static allocations in node-specific config files
- Add this repository as a submodule in the pipeline of interest
- Include the `extract_GenomeIntervals` process from the module `main.nf` with a relative path
- Use the `addParams` directive when importing to specify any params
- Call the process with the inputs where needed
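For example, with the submodule checked out under an assumed `./external/` directory:

```nextflow
// Illustrative relative path; adjust to your submodule location
include { extract_GenomeIntervals } from './external/pipeline-Nextflow-module/modules/common/extract_genome_intervals/main.nf' addParams(
    save_intermediate_files: params.save_intermediate_files
    )

workflow {
    // reference_dict: path to the reference genome dictionary
    extract_GenomeIntervals(params.reference_dict)
}
```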
## generate_standard_filename
Module containing a function that takes components of a filename and combines them in a standardized format, returned as a string.
Tools used: Groovy functions.
Inputs:
- `main_tool`: string containing the name and version of the main tool used for generating the file
- `dataset_id`: string identifying the dataset the file belongs to
- `sample_id`: string identifying the sample contained in the file
- `additional_args`: Map containing additional optional arguments. Available args:
  - `additional_tools`: list of strings identifying any additional tools to include in the filename
  - `additional_information`: string containing any additional information to be included at the end of the filename
Additional functions:
- `sanitize_string`: sanitizes the input string, keeping only alphanumeric, `-`, `/`, and `.` characters and replacing `_` with `-`
  - Inputs:
    - `raw`: string to sanitize
Outputs:
- String representing the standardized filename
- Add this repository as a submodule in the pipeline of interest
- Include the `generate_standard_filename` function and any additional necessary functions from the module `main.nf` with a relative path in any Nextflow file requiring use of the function
- Call the functions as needed with the appropriate inputs and use the returned value to set file names
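A sketch of a possible call site; the submodule path and tool string are assumptions for illustration:

```nextflow
// Illustrative relative path; adjust to your submodule location
include { generate_standard_filename; sanitize_string } from './external/pipeline-Nextflow-module/modules/common/generate_standardized_filename/main.nf'

// Build a standardized filename; the additional_args map entries are optional
output_filename = generate_standard_filename(
    'BWA-MEM2-2.2.1',      // main_tool
    params.dataset_id,     // dataset_id
    params.sample_id,      // sample_id
    [
        additional_tools: ['GATK-4.2.4.1'],
        additional_information: 'recalibrated'
    ]
    )
```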
## PipeVal validate
Module for validating files and directories using PipeVal. The module contains two nearly identical processes: `run_validate_PipeVal` and `run_validate_PipeVal_with_metadata`.
Tools used: PipeVal.
Inputs:
- `run_validate_PipeVal`:
  - `file_to_validate`: path of file or directory to validate
- `run_validate_PipeVal_with_metadata`:
  - A tuple of:
    - `file_to_validate`: path of file or directory to validate
    - `metadata`: arbitrary `val` passed through to the output
Parameters:
- `log_output_dir`: directory for storing log files
- `docker_image_version`: PipeVal Docker image version within which the process will run. The default is: `4.0.0-rc.2`
- `process_label`: Nextflow process label assigned to the process to control resource allocation. For specific CPU and memory allocation, include static allocations in node-specific config files
- `main_process`: set output directory to the specified main process instead of `PipeVal-4.0.0-rc.2`
Outputs:
- `validation_result`: path of file with validation output text
- `validated_file`: `file_to_validate` or tuple of (`file_to_validate`, `metadata`)
- Add this repository as a submodule in the pipeline of interest
- Include the `run_validate_PipeVal` or `run_validate_PipeVal_with_metadata` process from the module `main.nf` with a relative path
- Use the `addParams` directive when importing to specify any params
- Call the process with the inputs where needed
- Aggregate and save the output validation files as needed
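One way these steps might be wired up, assuming an illustrative submodule path and input params:

```nextflow
// Illustrative relative path; adjust to your submodule location
include { run_validate_PipeVal } from './external/pipeline-Nextflow-module/modules/PipeVal/validate/main.nf' addParams(
    log_output_dir: params.log_output_dir
    )

workflow {
    files_to_validate_ch = Channel.fromPath([params.input_bam, params.reference_fasta])
    run_validate_PipeVal(files_to_validate_ch)

    // Aggregate the per-file validation reports into a single file
    run_validate_PipeVal.out.validation_result
        .collectFile(name: 'input_validation.txt', storeDir: params.log_output_dir)
}
```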
## generate_checksum_PipeVal
Module for generating checksums for files using PipeVal.
Tools used: PipeVal.
Inputs:
- `input_file`: path of file for which to generate a checksum
Parameters:
- `output_dir`: directory for storing checksums
- `log_output_dir`: directory for storing log files
- `docker_image_version`: PipeVal Docker image version within which the process will run. The default is: `4.0.0-rc.2`
- `process_label`: Nextflow process label assigned to the process to control resource allocation. For specific CPU and memory allocation, include static allocations in node-specific config files
- `main_process`: set output directory to the specified main process instead of `PipeVal-4.0.0-rc.2`
- `checksum_alg`: type of checksum to generate. Choices: `sha512` (default), `md5`
- Add this repository as a submodule in the pipeline of interest
- Include the `generate_checksum_PipeVal` process from the module `main.nf` with a relative path
- Use the `addParams` directive when importing to specify any params
- Call the process with the inputs where needed
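A minimal sketch, with an assumed submodule path and input param:

```nextflow
// Illustrative relative path; adjust to your submodule location
include { generate_checksum_PipeVal } from './external/pipeline-Nextflow-module/modules/PipeVal/generate_checksum/main.nf' addParams(
    output_dir: params.output_dir,
    checksum_alg: 'md5'   // override the sha512 default
    )

workflow {
    generate_checksum_PipeVal(Channel.fromPath(params.input_vcf))
}
```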
## compress_index_VCF
Module for compressing and indexing VCF/GFF files. The input should be compressed or uncompressed `*.vcf` or `*.gff` files.
Tools used: `tabix`, `bgzip`.
Inputs:
- `id`: string identifying the ID of the indexed VCF. For more than one VCF, the `id` should be unique for each sample.
- `file_to_index`: path of VCF file to compress and index.
Parameters:
- `output_dir`: directory to store compressed VCF and index files.
- `log_output_dir`: directory to store log files.
- `docker_image`: SAMtools Docker image version within which the process will run. The default is: `1.15.1`
- `process_label`: Nextflow process label assigned to the process to control resource allocation. For specific CPU and memory allocation, include static allocations in node-specific config files
- `is_output_file`: determines whether the output of this process should be saved to the `output` or `intermediate` folder. For an `intermediate` process, use `addParams` to specify `is_output_file: false`. The default is `true`.
- `save_intermediate_files`: whether the index files should be saved to the intermediate output directory.
- `unzip_and_rezip`: whether compressed files should be uncompressed and re-compressed using `bgzip`. The default is `false`.
- Add this repository as a submodule in the pipeline of interest.
- Include the `compress_index_VCF` workflow from the module `main.nf` with a relative path.
- Use the `addParams` directive when importing to specify any params.
- Call the workflow with the input channel, a tuple with `id` and `file_to_index`.
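Sketched usage, assuming an illustrative submodule path:

```nextflow
// Illustrative relative path; adjust to your submodule location
include { compress_index_VCF } from './external/pipeline-Nextflow-module/modules/common/index_VCF_tabix/main.nf' addParams(
    is_output_file: false,
    save_intermediate_files: params.save_intermediate_files
    )

workflow {
    // Input channel: tuple of (id, file_to_index)
    vcf_ch = Channel.of(['sampleA', file('/path/to/sampleA.vcf')])
    compress_index_VCF(vcf_ch)
}
```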
## indexFile
Module containing a function that returns the expected path to the index file for a given input file. NOTE: this does not check for the existence of the index file.
Inputs:
- `input_file`: currently supports BAM or VCF
Output:
- The input file path with the expected index extension appended: currently `.bai` for BAM files and `.tbi` for VCF files
- Add this repository as a submodule in the pipeline of interest.
- Include the `indexFile` function from the module `main.nf` with a relative path.
- Call the function as needed with the appropriate input and use the returned value as the index file name.
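A short sketch; the submodule path and file path are illustrative:

```nextflow
// Illustrative relative path; adjust to your submodule location
include { indexFile } from './external/pipeline-Nextflow-module/modules/common/indexFile/main.nf'

// Derive the expected index path for a BAM; per the module's behavior the
// extension is appended, so this should yield '/path/to/sample.bam.bai'.
// The function does not check that the index actually exists on disk.
input_bam_index = indexFile('/path/to/sample.bam')
```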
## run_index_SAMtools
Module for indexing SAM/BAM/CRAM alignment files.
Tools used: SAMtools.
Inputs:
- `sample`: sample identifier to organize process log files.
- `alignment_file`: `.bam`, `.cram`, or `.sam` file to be indexed.
- Input channel structure: tuple[`sample`, `alignment_file`]
Parameters:
- `output_dir`: directory to store the created index file.
- `log_output_dir`: directory to store log files.
- `main_process`: sets the output directory to the specified main process.
- `docker_image_version`: SAMtools Docker image tag.
- `docker_image`: Docker image.
- Add this repository as a submodule in the pipeline of interest.
- Include the `run_index_SAMtools` workflow from the module `main.nf` with a relative path.
- Use the `addParams` directive when importing to specify any params.
- Call the workflow with the input channel, a tuple with `sample` and `alignment_file`.
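Sketched usage with an assumed submodule path:

```nextflow
// Illustrative relative path; adjust to your submodule location
include { run_index_SAMtools } from './external/pipeline-Nextflow-module/modules/SAMtools/run_index_SAMtools/main.nf' addParams(
    output_dir: params.output_dir
    )

workflow {
    // Input channel: tuple of (sample, alignment_file)
    bam_ch = Channel.of(['sampleA', file('/path/to/sampleA.bam')])
    run_index_SAMtools(bam_ch)
}
```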
## convert_BCF2VCF_BCFtools
Module for converting a BCF to a VCF file.
Tools used: BCFtools.
Inputs:
- `sample`: sample identifier to organize process log files.
- `bcf_file`: `.bcf` file to be converted.
- `bcf_index`: `.bcf.csi` or `.bcf.tbi` index for the `.bcf` file.
- Input channel structure: tuple[`sample`, `bcf_file`, `bcf_index`]
Parameters:
- `output_dir`: directory to store the converted VCF file.
- `log_output_dir`: directory to store log files.
- `main_process`: sets the output directory to the specified main process.
- `docker_image_version`: BCFtools Docker image tag.
- `docker_image`: Docker image.
- Add this repository as a submodule in the pipeline of interest.
- Include the `convert_BCF2VCF_BCFtools` workflow from the module `main.nf` with a relative path.
- Use the `addParams` directive when importing to specify any params.
- Call the workflow with the input channel, a tuple with `sample`, `bcf_file` and `bcf_index`.
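Sketched usage with an assumed submodule path:

```nextflow
// Illustrative relative path; adjust to your submodule location
include { convert_BCF2VCF_BCFtools } from './external/pipeline-Nextflow-module/modules/BCFtools/convert_BCF2VCF_BCFtools/main.nf' addParams(
    output_dir: params.output_dir
    )

workflow {
    // Input channel: tuple of (sample, bcf_file, bcf_index)
    bcf_ch = Channel.of(
        ['sampleA', file('/path/to/sampleA.bcf'), file('/path/to/sampleA.bcf.csi')]
        )
    convert_BCF2VCF_BCFtools(bcf_ch)
}
```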
Author: Yash Patel ([email protected])
pipeline-Nextflow-module is licensed under the GNU General Public License version 2. See the file LICENSE for the terms of the GNU GPL license.
pipeline-Nextflow-module comprises a set of commonly used Nextflow modules.
Copyright (C) 2021 University of California Los Angeles ("Boutros Lab") All rights reserved.
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.