
[2.2.0] - Ulm - 2020-10-21

Released by @jfy133 on 21 Oct 2020, 08:14 (commit b1ae5ad)

Added

  • Major: Automated cloud tests with large-scale data on AWS
  • Major: Rewrote the input logic to accept a TSV 'map' file in addition to direct paths to FASTQ files
  • Major: Added a JSON Schema, enabling a web GUI for configuring the pipeline, available here
  • Major: Lane and library merging implemented
    • When using TSV input, multiple lanes of the same library will be merged together before mapping
      • Strip FASTQ will also produce a lane-merged 'raw' but 'stripped' FASTQ file
    • When using TSV input, multiple libraries of one sample with the same treatment will be merged together
    • Important: this functionality is not available for direct FASTQ paths; TSV input is required.
  • #40 - Added the pileupCaller genotyper from sequenceTools
  • Added a validation check and a clearer error message when --fasta_index is provided and the file path does not end in .fai.
  • Improved error messages
  • Automated emails sent via mailutils can now also include the MultiQC report
  • General documentation additions, cleaning, and updated figures with CC-BY license
  • Added large 'full size' dataset test profiles for ancient fish and human contexts
  • #257 - Added the bowtie2 aligner as an option for mapping, following Poullet and Orlando 2020 (doi: 10.3389/fevo.2020.00105)
  • #451 - Adds ANGSD genotype likelihood calculations as an alternative to typical 'genotypers'
  • #566 - Added tutorials on how to set up nf-core/eager for different contexts
  • Nuclear contamination results are now shown in the MultiQC report
  • Tutorial on how to use profiles for reproducible science (i.e. parameter sharing between different groups)
  • #522 - Added post-mapping length filter to assist in more realistic endogenous DNA calculations
  • #512 - Added flexible trimming of BAMs by library type. 'half' and 'none' UDG libraries can now be trimmed differentially within a single eager run.
  • Added a .dockstore.yml config file for automatic workflow registration with dockstore.org
  • Updated template to nf-core/tools 1.10.2
  • #544 - Added a script to perform BAM filtering on fragment length
  • #456 - Bumped the base (default) runtime of all processes to 4 hours, and set shorter time limits (1 hour) for test profiles
  • #552 - Added optional creation of MALT SAM files alongside RMA6 files
  • Added EIGENSTRAT SNP coverage statistics to the MultiQC report. Process results are published in genotyping/*_eigenstrat_coverage.txt.
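The TSV-based lane and library merging listed above boils down to a grouping step. The following is a minimal Python sketch, not the pipeline's Nextflow implementation: the column names (Sample_Name, Library_ID, Lane, UDG_Treatment, R1) and file names are illustrative assumptions, and "same treatment" is assumed here to mean the same UDG treatment.

```python
import csv
import io
from collections import defaultdict

# Hypothetical TSV 'map' file; column names and values are illustrative assumptions.
TSV = (
    "Sample_Name\tLibrary_ID\tLane\tUDG_Treatment\tR1\n"
    "S1\tS1_LIB1\t1\thalf\tS1_LIB1_L001_R1.fq.gz\n"
    "S1\tS1_LIB1\t2\thalf\tS1_LIB1_L002_R1.fq.gz\n"
    "S1\tS1_LIB2\t1\thalf\tS1_LIB2_L001_R1.fq.gz\n"
    "S1\tS1_LIB3\t1\tnone\tS1_LIB3_L001_R1.fq.gz\n"
)


def group_lanes(rows):
    """Collect the FASTQ files of each library (one per lane) to be merged before mapping."""
    lanes = defaultdict(list)
    for row in rows:
        lanes[(row["Sample_Name"], row["Library_ID"])].append(row["R1"])
    return dict(lanes)


def group_libraries(rows):
    """Collect the libraries of each sample that share the same (assumed UDG) treatment."""
    libraries = defaultdict(set)
    for row in rows:
        libraries[(row["Sample_Name"], row["UDG_Treatment"])].add(row["Library_ID"])
    return {key: sorted(libs) for key, libs in libraries.items()}


rows = list(csv.DictReader(io.StringIO(TSV), delimiter="\t"))
print(group_lanes(rows))
print(group_libraries(rows))
```

Here the two lanes of S1_LIB1 form one lane-merging group, while S1_LIB1 and S1_LIB2 (both 'half') form one sample-level group and S1_LIB3 ('none') stays separate.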

Fixed

  • #368 - Fixed the profile test to contain a parameter for --paired_end
  • Minor bugfix for a typo in lines 1260 and 1261
  • #374 - Fixed output documentation rendering without images
  • #379 - Fixed insufficient memory requirements for a FastQC edge case
  • #390 - Renamed clipped/merged output directory to be more descriptive
  • #398 - Stopped incompatible FASTA indexes from being accepted
  • #400 - Set correct recommended bwa mapping parameters from Schubert et al. 2012
  • #410 - Fixed nf-core/configs not being loaded properly
  • #473 - Fixed bug in sexdet_process on AWS
  • #444 - Added an option for preserving the realigned BAM and its index
  • Fixed deduplication output logic. If duplicate removal is not skipped, only the post-rmdup BAMs are now passed along, instead of both the post-rmdup and pre-rmdup BAMs
  • #497 - Reduced the number of parameters required to run BAM filtering
  • #501 - Added additional validation checks for MALT/MaltExtract database input files
  • #508 - Made MarkDuplicates the default deduplication tool, due to the narrower context specificity of DeDup
  • #516 - bedtools no longer reports an out-of-memory exit code when warning about inconsistent FASTA/BED entry names
  • #504 - Removed uninformative sexdeterrmine-snps plot from MultiQC report.
  • Nuclear contamination is now reported with the correct library names.
  • #531 - Renamed 'FASTQ stripping' to 'host removal'
  • Merged all tutorials and FAQs into usage.md for display on nf-co.re
  • Corrected header of nuclear contamination table (nuclear_contamination.txt).
  • Fixed a bug in the nSNPs definition in print_x_contamination.py. The number of SNPs is now reported correctly
  • print_x_contamination.py now correctly converts all NA values to "N/A"
  • Increased the default amount of memory used by MultiQC, to account for very large nf-core/eager runs (e.g. >1000 samples)
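The FASTA index checks above (#398, plus the .fai validation listed under Added) amount to rejecting any --fasta_index value that does not end in .fai. A minimal Python sketch assuming only that rule; the function name and error wording are hypothetical and are not the pipeline's actual message:

```python
def validate_fasta_index(fasta_index: str) -> str:
    """Return the index path unchanged, or raise if it does not end in .fai (hypothetical helper)."""
    if not fasta_index.endswith(".fai"):
        # Illustrative error text only; not nf-core/eager's real message.
        raise ValueError(
            f"--fasta_index '{fasta_index}' does not end in .fai; "
            "expected an index such as 'genome.fa.fai'"
        )
    return fasta_index


print(validate_fasta_index("refs/genome.fa.fai"))
try:
    validate_fasta_index("refs/genome.fa")
except ValueError as err:
    print(err)
```

Failing early on the file name is cheap and gives a clear message before any process tries to read a mismatched index.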

Dependencies

  • Added sequenceTools (1.4.0.6) that adds the ability to do genotyping with the 'pileupCaller'
  • Latest version of DeDup (0.12.6) which now reports mapped reads after deduplication
  • #560 Latest version of DeDup (0.12.7), which now correctly reports deduplication statistics based on mapped reads only (the previous denominator was the total number of reads in the BAM file)
  • Latest version of ANGSD (0.933), which no longer segfaults when running contamination estimation on BAMs with insufficient reads
  • Latest version of MultiQC (1.9) with support for lots of extra tools in the pipeline (MALT, SexDetERRmine, DamageProfiler, MultiVCFAnalyzer)
  • Latest versions of Pygments (2.6.1), Pymdown-Extensions (7.1) and Markdown (3.2.2) for documentation output
  • Latest version of Picard (2.22.9)
  • Latest version of GATK4 (4.1.7.0)
  • Latest version of sequenceTools (1.4.0.6)
  • Latest version of fastp (0.20.1)
  • Latest version of Kraken2 (2.0.9beta)
  • Latest version of FreeBayes (1.3.2)
  • Latest version of xopen (0.9.0)
  • Added Bowtie 2 (2.4.1)
  • Latest version of Sex.DetERRmine (1.1.2)
  • Latest version of endorS.py (0.4)
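The DeDup 0.12.7 entry changes only the denominator of the deduplication statistics. As a rough illustration (not DeDup's actual code; the function names and read counts are hypothetical), dividing the same number of removed duplicates by mapped reads instead of by all reads in the BAM can change the reported rate substantially when most reads are unmapped:

```python
def duplicate_rate_of_mapped(mapped_before: int, mapped_after: int) -> float:
    """Duplicates removed as a fraction of mapped reads (the corrected denominator)."""
    if mapped_before <= 0:
        raise ValueError("mapped_before must be positive")
    return (mapped_before - mapped_after) / mapped_before


def duplicate_rate_of_total(total_reads: int, mapped_before: int, mapped_after: int) -> float:
    """Same duplicate count divided by all reads in the BAM (the earlier denominator)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return (mapped_before - mapped_after) / total_reads


# Hypothetical counts: 1,000,000 reads in the BAM, 200,000 mapped, 150,000 left after deduplication.
total, mapped, mapped_dedup = 1_000_000, 200_000, 150_000
print(f"{duplicate_rate_of_mapped(mapped, mapped_dedup):.2f}")  # 0.25
print(f"{duplicate_rate_of_total(total, mapped, mapped_dedup):.2f}")  # 0.05
```

With these counts, 25% of mapped reads are duplicates, whereas the old denominator would report only 5%.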