This repository maintains the source code of the ICGC ARGO DNA Seq Processing Pipeline. The pipeline is written in the Nextflow workflow language using DSL2, with modules imported from other ICGC ARGO GitHub repositories. The following repositories maintain the various tools/modules:
- https://github.com/icgc-argo-workflows/dna-seq-processing-tools
- https://github.com/icgc-argo-workflows/data-processing-utility-tools
- https://github.com/icgc-argo-workflows/nextflow-dna-seq-processing-tools
- https://github.com/icgc-argo-workflows/data-qc-tools-and-wfs
Each Nextflow module (including its associated container image, which is registered on Quay.io) is strictly version controlled and released independently. To ensure reproducibility, the pipeline explicitly declares which version of each module is imported.
The pipeline performs the following major tasks:
- download input sequencing metadata/data from SONG/SCORE
- preprocess input sequencing reads (in FASTQ or BAM) into lane level (aka read group level) BAM
- collect CollectQualityYieldMetrics using the Picard tool for each read group
- perform BWA-MEM alignment against the GRCh38 reference genome in parallel for each lane BAM
- merge and mark duplicates in the aligned lane BAMs, produce coordinate-sorted CRAM/CRAI and duplicates_metrics
- collect alignment QC metrics using samtools stats for aligned seq
- collect CollectOxoGMetrics using GATK for aligned seq and calculate the OxoQ score
- generate SONG metadata for aligned seq and upload them to SONG/SCORE
- generate SONG metadata for all collected qc_metrics and upload them to SONG/SCORE
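The fan-out/fan-in shape of the alignment tasks listed above (one alignment per lane BAM in parallel, then a single merge) can be illustrated with a small Python sketch. This is only a toy model of the data flow, not the pipeline's Nextflow code: the function names, file names, and return values are all hypothetical placeholders, and no real tool is invoked.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

# NOTE: every name below is a hypothetical placeholder, not a pipeline module.


def align_lane(lane_bam: str) -> str:
    """Stand-in for the per-lane BWA-MEM alignment step."""
    return lane_bam.replace(".lane.bam", ".aligned.bam")


def merge_and_markdup(aligned_bams: List[str], sample: str) -> Dict[str, object]:
    """Stand-in for merging lanes, marking duplicates and writing CRAM/CRAI."""
    return {
        "cram": f"{sample}.cram",
        "crai": f"{sample}.cram.crai",
        "merged_from": sorted(aligned_bams),
    }


def run_sketch(sample: str, lane_bams: List[str]) -> Dict[str, object]:
    # Lanes are independent, so alignment can fan out in parallel.
    with ThreadPoolExecutor() as pool:
        aligned = list(pool.map(align_lane, lane_bams))
    # Merging is the fan-in point: it needs every aligned lane.
    return merge_and_markdup(aligned, sample)


result = run_sketch("sample1", ["rg1.lane.bam", "rg2.lane.bam"])
print(result["cram"], result["merged_from"])
```

In the real pipeline this scheduling is handled by Nextflow; the sketch only shows why the merge step must wait for all lane-level alignments.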
To run the pipeline, first install Nextflow (version 20.01.0 or higher) by following the Nextflow installation instructions.
Run version 1.9.1 of the pipeline:
nextflow run icgc-argo-workflows/dna-seq-processing-wfs -r 1.9.1 -params-file <your_params_file.json>
You may need to run nextflow pull icgc-argo-workflows/dna-seq-processing-wfs first if version 1.9.1 was released after the last time you ran the pipeline.
Please note that the SONG/SCORE services must be available and you must have an appropriate API token.
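When the pipeline is launched from a script, the same commands can be assembled programmatically. The sketch below is one possible way to do that in Python; the helper name build_commands and the file name my_params.json are hypothetical, and only the nextflow pull and nextflow run invocations with -r and -params-file come from the commands shown above.

```python
import shlex
from typing import List

REPO = "icgc-argo-workflows/dna-seq-processing-wfs"


def build_commands(version: str, params_file: str,
                   pull_first: bool = False) -> List[List[str]]:
    """Return the argv lists needed to launch a given pipeline version.

    Hypothetical helper: it only builds the commands, it does not run them.
    """
    commands = []
    if pull_first:
        # Refresh the local copy so a newly released version can be resolved.
        commands.append(["nextflow", "pull", REPO])
    commands.append(
        ["nextflow", "run", REPO, "-r", version, "-params-file", params_file]
    )
    return commands


for argv in build_commands("1.9.1", "my_params.json", pull_first=True):
    # Each argv could be passed to subprocess.run(argv, check=True).
    print(shlex.join(argv))
```

Building argv lists rather than a single shell string avoids quoting problems if the params file path contains spaces.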
Automated Travis CI testing has been set up. However, tests that rely on SONG/SCORE are skipped when CI is triggered on a Travis server where the SONG/SCORE services are not available. When running tests locally (where SONG/SCORE services may be available), please use the following commands from the root directory of this Git repository:
# perform all tests when SONG/SCORE is available
export api_token=<your_api_token>
pytest -v
# or perform tests that do not need SONG/SCORE
TRAVIS=true pytest -v