All notable changes to the pipeline-call-gSNP pipeline.
The format is based on Keep a Changelog.
This project adheres to Semantic Versioning.
- NFTest test case
- Add workflow for genotyping from GVCFs
- Standardize description
- Update GATK to 4.5.0.0
- Replace workflow diagram with PlantUML version
- Update PlantUML action to v1.0.1
- Resolve intervals before splitting to allow for index discovery
- Add workflow to generate SVG images from PlantUML
- Add workflow to build and push documentation to GitHub Pages
- Add workflow to run Nextflow configuration regression tests
- Add CODEOWNERS file
- Resource updater to allow update for all processes
- Local resource-related function definitions
- Validation with PipeVal
- Custom resource allocation updates through configuration parameters
- Save tumor segmentation QC output from CalculateContamination
- Set default compression for GATK IndelRealignment to 1
- Make pipeline germline variant calling only
- BAM processing steps
- Option to delete input data files for metapipeline disk usage optimization
- Standardize output file names
- Remove duplicated records based only on the 11 required fields of each record
- F32.config for resource allocation
- Use external resource allocation module
- Parameterize Docker registry
- Use ghcr.io/uclahs-cds as default registry
- Option to emit all confident sites in GVCFs
- IndelRealignment compression parameter
- Param validation
- Parse CSV inputs using modularized csv_parser
- Delete merged but un-deduplicated BAMs earlier for more efficient disk usage
- Bug with M64 and F2 detection
- Bug with improper output directory due to CSV parsing error
- Option for YAML input
- Record deduplication workflow
- Config for F16 node
- Reorganize repo with pipeline entrypoint at root of repo and singular directory names
- Bug with records being duplicated through the parallelized processing of BAMs (#79)
- BETA: Support for paired inputs with a single normal sample and multiple tumour samples
- Switch to SAMtools for indexing BAMs
- Use sample ID and intervals as identifiers for log output directories
- Standardize config structure
- Partially revert BQSR parallelization and group ApplyBQSR by interval
- Parallelize BaseRecalibrator per sample
- Save VQSR output for QC
- Save SNP+INDEL VQSRed VCF to output
- Parallelize BQSR
- Update .gitignore
- Update GATK to 4.2.4.1 to address Log4j vulnerabilities (https://github.com/advisories/GHSA-8489-44mv-ggj8, https://github.com/advisories/GHSA-p6xc-xr62-6r2g)
- Update Picard version to 2.26.10 to address Log4j vulnerabilities (https://github.com/advisories/GHSA-8489-44mv-ggj8)
- Enable threading for MergeSamFiles
- Parallelize reheadering and indexing processes
- Update reheadering to use -c option
- Modularize workflows for different modes (single vs. paired, WGS vs. targeted)
- Update GATK to 4.2.4.0 to address Log4j critical vulnerability (https://github.com/advisories/GHSA-jfh8-c2jp-5v3q)
- Update Picard to 2.26.8 to address Log4j critical vulnerability (https://github.com/advisories/GHSA-jfh8-c2jp-5v3q)
- Parallelize IR and BQSR in WXS/WES mode
- Fix targeted, single sample mode bugs
- Update call-gSNP to DSL2
- Add GPL2 license
- Parallelize MergeVcfs
- Parallelize MergeSamFiles
- Standardize output and log directories
- Add process to remove intermediate files when save_intermediate_files is disabled
- Parallelize GetPileupSummaries, CalculateContamination, and DepthOfCoverage processes
- Split HaplotypeCaller process into separate processes for VCF and GVCF modes
- Parallelize GVCF HC process
- Extract genome intervals from reference dictionary
- Adjust static resource allocation to be more efficient
- Auto-detect reference fasta dictionary
- Rename ".bai" output files to ".bam.bai"
- Auto-detect when in targeted mode and when in WGS mode