# Nextflow Modules
A set of Nextflow modules commonly used across pipelines.
## remove_intermediate_files
Module for deleting intermediate files from disk once they are no longer needed by downstream processes. Symbolic links are followed to the actual file, and both the link and its target are deleted.
Tools used: GNU `rm` and `readlink`.
Inputs:
- `file_to_remove`: path to file to be deleted
- `ready_for_deletion_signal`: `val` indicating that the file is no longer needed by any processes
Parameters:
- `output_dir`: directory for storing outputs
- `log_output_dir`: directory for storing log files
- `save_intermediate_files`: boolean indicating whether this process should run (disable when intermediate files need to be kept)
- `docker_image`: Docker image within which the process will run. The default is: `ghcr.io/uclahs-cds/pipeval:3.0.0`
- `process_label`: Nextflow process label assigned to the process to control resource allocation. For specific CPU and memory allocation, include static allocations in node-specific config files
- Add this repository as a submodule in the pipeline of interest
- Include the `remove_intermediate_files` process from the module `main.nf` with a relative path
- Use the `addParams` directive when importing to specify any params
- Call the process with the inputs where needed
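A minimal usage sketch of these steps; the submodule path and channel names are illustrative, not part of the module:

```nextflow
// Illustrative relative path; adjust to where the submodule lives in your pipeline
include { remove_intermediate_files } from './external/pipeline-Nextflow-module/modules/common/intermediate_file_removal/main.nf' addParams(
    output_dir: params.output_dir,
    log_output_dir: params.log_output_dir,
    save_intermediate_files: params.save_intermediate_files
    )

workflow {
    // file_to_remove and ready_for_deletion_signal come from upstream processes
    remove_intermediate_files(
        intermediate_file_ch,
        downstream_complete_signal_ch
        )
}
```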
## extract_GenomeIntervals
Module for extracting the genome intervals from a reference genome dictionary.
Tools used: GNU `grep`, `cut`, and `sed`.
Inputs:
- `reference_dict`: path to reference genome dictionary
Parameters:
- `output_dir`: directory for storing outputs
- `log_output_dir`: directory for storing log files
- `save_intermediate_files`: boolean indicating whether the extracted intervals should be copied to the output directory
- `docker_image`: Docker image within which the process will run. The default is: `ghcr.io/uclahs-cds/pipeval:3.0.0`
- `process_label`: Nextflow process label assigned to the process to control resource allocation. For specific CPU and memory allocation, include static allocations in node-specific config files
- Add this repository as a submodule in the pipeline of interest
- Include the `extract_GenomeIntervals` process from the module `main.nf` with a relative path
- Use the `addParams` directive when importing to specify any params
- Call the process with the inputs where needed
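For example, with the submodule checked out under an assumed `./external/` directory:

```nextflow
// Illustrative relative path; adjust to your submodule location
include { extract_GenomeIntervals } from './external/pipeline-Nextflow-module/modules/common/extract_genome_intervals/main.nf' addParams(
    save_intermediate_files: params.save_intermediate_files
    )

workflow {
    // reference_dict: path to the reference genome dictionary
    extract_GenomeIntervals(params.reference_dict)
}
```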
## generate_standard_filename
Module containing a function that takes components of a filename and combines them in a standardized format, returned as a string.
Tools used: Groovy functions.
Inputs:
- `main_tool`: string containing the name and version of the main tool used for generating the file
- `dataset_id`: string identifying the dataset the file belongs to
- `sample_id`: string identifying the sample contained in the file
- `additional_args`: Map containing additional optional arguments. Available args:
  - `additional_tools`: list of strings identifying any additional tools to include in the filename
  - `additional_information`: string containing any additional information to be included at the end of the filename
Additional functions:
- `sanitize_string`: sanitizes the input string, keeping only alphanumeric, `-`, `/`, and `.` characters and replacing `_` with `-`
  - Inputs:
    - `raw`: string to sanitize
Outputs:
- String representing the standardized filename
- Add this repository as a submodule in the pipeline of interest
- Include the `generate_standard_filename` function and any additional necessary functions from the module `main.nf` with a relative path in any Nextflow file requiring use of the function
- Call the functions as needed with the appropriate inputs and use the returned value to set file names
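A sketch of a possible call site; the submodule path and tool string are assumptions for illustration:

```nextflow
// Illustrative relative path; adjust to your submodule location
include { generate_standard_filename; sanitize_string } from './external/pipeline-Nextflow-module/modules/common/generate_standardized_filename/main.nf'

// Build a standardized filename; the additional_args map entries are optional
output_filename = generate_standard_filename(
    'BWA-MEM2-2.2.1',      // main_tool
    params.dataset_id,     // dataset_id
    params.sample_id,      // sample_id
    [
        additional_tools: ['GATK-4.2.4.1'],
        additional_information: 'recalibrated'
    ]
    )
```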
## PipeVal validate
Module for validating files and directories using PipeVal. The module contains two nearly identical processes: `run_validate_PipeVal` and `run_validate_PipeVal_with_metadata`.
Tools used: PipeVal.
Inputs:
- `run_validate_PipeVal`:
  - `file_to_validate`: path of file or directory to validate
- `run_validate_PipeVal_with_metadata`:
  - A tuple of:
    - `file_to_validate`: path of file or directory to validate
    - `metadata`: arbitrary `val` passed through to the output
Parameters:
- `log_output_dir`: directory for storing log files
- `docker_image_version`: PipeVal Docker image version within which the process will run. The default is: `4.0.0-rc.2`
- `process_label`: Nextflow process label assigned to the process to control resource allocation. For specific CPU and memory allocation, include static allocations in node-specific config files
- `main_process`: set output directory to the specified main process instead of `PipeVal-4.0.0-rc.2`
Outputs:
- `validation_result`: path of file with validation output text
- `validated_file`: `file_to_validate` or tuple of (`file_to_validate`, `metadata`)
- Add this repository as a submodule in the pipeline of interest
- Include the `run_validate_PipeVal` or `run_validate_PipeVal_with_metadata` process from the module `main.nf` with a relative path
- Use the `addParams` directive when importing to specify any params
- Call the process with the inputs where needed
- Aggregate and save the output validation files as needed
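One way these steps might be wired up, assuming an illustrative submodule path and input params:

```nextflow
// Illustrative relative path; adjust to your submodule location
include { run_validate_PipeVal } from './external/pipeline-Nextflow-module/modules/PipeVal/validate/main.nf' addParams(
    log_output_dir: params.log_output_dir
    )

workflow {
    files_to_validate_ch = Channel.fromPath([params.input_bam, params.reference_fasta])
    run_validate_PipeVal(files_to_validate_ch)

    // Aggregate the per-file validation reports into a single file
    run_validate_PipeVal.out.validation_result
        .collectFile(name: 'input_validation.txt', storeDir: params.log_output_dir)
}
```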
## generate_checksum_PipeVal
Module for generating checksums for files using PipeVal.
Tools used: PipeVal.
Inputs:
- `input_file`: path of file for which to generate a checksum
Parameters:
- `output_dir`: directory for storing checksums
- `log_output_dir`: directory for storing log files
- `docker_image_version`: PipeVal Docker image version within which the process will run. The default is: `4.0.0-rc.2`
- `process_label`: Nextflow process label assigned to the process to control resource allocation. For specific CPU and memory allocation, include static allocations in node-specific config files
- `main_process`: set output directory to the specified main process instead of `PipeVal-4.0.0-rc.2`
- `checksum_alg`: type of checksum to generate. Choices: `sha512` (default), `md5`
- Add this repository as a submodule in the pipeline of interest
- Include the `generate_checksum_PipeVal` process from the module `main.nf` with a relative path
- Use the `addParams` directive when importing to specify any params
- Call the process with the inputs where needed
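A minimal sketch, with an assumed submodule path and input param:

```nextflow
// Illustrative relative path; adjust to your submodule location
include { generate_checksum_PipeVal } from './external/pipeline-Nextflow-module/modules/PipeVal/generate_checksum/main.nf' addParams(
    output_dir: params.output_dir,
    checksum_alg: 'md5'   // override the sha512 default
    )

workflow {
    generate_checksum_PipeVal(Channel.fromPath(params.input_vcf))
}
```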
## compress_index_VCF
Module for compressing and indexing VCF/GFF files. The input should be compressed or uncompressed `*.vcf` or `*.gff` files.
Tools used: `tabix`, `bgzip`.
Inputs:
- `id`: string identifying the ID of the indexed VCF. For more than one VCF, the `id` should be unique for each sample.
- `file_to_index`: path of VCF file to compress and index.
Parameters:
- `output_dir`: directory to store compressed VCF and index files.
- `log_output_dir`: directory to store log files.
- `docker_image`: SAMtools Docker image version within which the process will run. The default is: `1.15.1`
- `process_label`: Nextflow process label assigned to the process to control resource allocation. For specific CPU and memory allocation, include static allocations in node-specific config files
- `is_output_file`: determines whether the output of this process should be saved to the `output` or `intermediate` folder. For an `intermediate` process, use `addParams` to specify `is_output_file: false`. The default is `true`.
- `save_intermediate_files`: whether the index files should be saved to the intermediate output directory.
- `unzip_and_rezip`: whether compressed files should be uncompressed and re-compressed using `bgzip`. The default is `false`.
- Add this repository as a submodule in the pipeline of interest.
- Include the `compress_index_VCF` workflow from the module `main.nf` with a relative path.
- Use the `addParams` directive when importing to specify any params.
- Call the workflow with the input channel, a tuple with `id` and `file_to_index`.
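Sketched usage, assuming an illustrative submodule path:

```nextflow
// Illustrative relative path; adjust to your submodule location
include { compress_index_VCF } from './external/pipeline-Nextflow-module/modules/common/index_VCF_tabix/main.nf' addParams(
    is_output_file: false,
    save_intermediate_files: params.save_intermediate_files
    )

workflow {
    // Input channel: tuple of (id, file_to_index)
    vcf_ch = Channel.of(['sampleA', file('/path/to/sampleA.vcf')])
    compress_index_VCF(vcf_ch)
}
```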
## indexFile
Module containing a function that returns the expected path to the index file for a given input file. NOTE: this does not check for the existence of the index file.
Inputs:
- `input_file`: currently supports BAM or VCF
Output:
- The input file path with the expected index extension appended: currently `.bai` for BAM files and `.tbi` for VCF files
- Add this repository as a submodule in the pipeline of interest.
- Include the `indexFile` function from the module `main.nf` with a relative path.
- Call the function as needed with the appropriate input and use the returned value as the index file name.
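A short sketch; the submodule path and file path are illustrative:

```nextflow
// Illustrative relative path; adjust to your submodule location
include { indexFile } from './external/pipeline-Nextflow-module/modules/common/indexFile/main.nf'

// Derive the expected index path for a BAM; per the module's behavior the
// extension is appended, so this should yield '/path/to/sample.bam.bai'.
// The function does not check that the index actually exists on disk.
input_bam_index = indexFile('/path/to/sample.bam')
```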
## run_index_SAMtools
Module for indexing SAM/BAM/CRAM alignment files.
Tools used: SAMtools.
Inputs:
- `sample`: sample identifier to organize process log files.
- `alignment_file`: `.bam`, `.cram`, or `.sam` file to be indexed.
- Input channel structure: tuple[`sample`, `alignment_file`]
Parameters:
- `output_dir`: directory to store the created index file.
- `log_output_dir`: directory to store log files.
- `main_process`: sets the output directory to the specified main process.
- `docker_image_version`: SAMtools Docker image tag.
- `docker_image`: Docker image.
- Add this repository as a submodule in the pipeline of interest.
- Include the `run_index_SAMtools` workflow from the module `main.nf` with a relative path.
- Use the `addParams` directive when importing to specify any params.
- Call the workflow with the input channel, a tuple with `sample` and `alignment_file`.
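Sketched usage with an assumed submodule path:

```nextflow
// Illustrative relative path; adjust to your submodule location
include { run_index_SAMtools } from './external/pipeline-Nextflow-module/modules/SAMtools/run_index_SAMtools/main.nf' addParams(
    output_dir: params.output_dir
    )

workflow {
    // Input channel: tuple of (sample, alignment_file)
    bam_ch = Channel.of(['sampleA', file('/path/to/sampleA.bam')])
    run_index_SAMtools(bam_ch)
}
```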
## convert_BCF2VCF_BCFtools
Module for converting a BCF to a VCF file.
Tools used: BCFtools.
Inputs:
- `sample`: sample identifier to organize process log files.
- `bcf_file`: `.bcf` file to be converted.
- `bcf_index`: `.bcf.csi` or `.bcf.tbi` index for the `.bcf` file.
- Input channel structure: tuple[`sample`, `bcf_file`, `bcf_index`]
Parameters:
- `output_dir`: directory to store the converted VCF file.
- `log_output_dir`: directory to store log files.
- `main_process`: sets the output directory to the specified main process.
- `docker_image_version`: BCFtools Docker image tag.
- `docker_image`: Docker image.
- Add this repository as a submodule in the pipeline of interest.
- Include the `convert_BCF2VCF_BCFtools` workflow from the module `main.nf` with a relative path.
- Use the `addParams` directive when importing to specify any params.
- Call the workflow with the input channel, a tuple with `sample`, `bcf_file` and `bcf_index`.
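Sketched usage with an assumed submodule path:

```nextflow
// Illustrative relative path; adjust to your submodule location
include { convert_BCF2VCF_BCFtools } from './external/pipeline-Nextflow-module/modules/BCFtools/convert_BCF2VCF_BCFtools/main.nf' addParams(
    output_dir: params.output_dir
    )

workflow {
    // Input channel: tuple of (sample, bcf_file, bcf_index)
    bcf_ch = Channel.of(
        ['sampleA', file('/path/to/sampleA.bcf'), file('/path/to/sampleA.bcf.csi')]
        )
    convert_BCF2VCF_BCFtools(bcf_ch)
}
```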
Author: Yash Patel ([email protected])
pipeline-Nextflow-module is licensed under the GNU General Public License version 2. See the file LICENSE for the terms of the GNU GPL license.
pipeline-Nextflow-module comprises a set of commonly used Nextflow modules.
Copyright (C) 2021 University of California Los Angeles ("Boutros Lab") All rights reserved.
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.