Release 2.0.0

@nh13 released this 04 Apr 22:09
· 73 commits to main since this release

Overview

This is the second major release of fgbio. A lot has changed in this release, including a significant number of backward-incompatible changes to tools.

A major theme of this release is the performance of the UMI-related tools. The consensus callers can now be parallelized with the --threads option and also include some internal optimizations. Sorting of data has been eliminated in many places (more on this below). A new tool, ZipperBams, has been added as a much lighter-weight, and therefore faster, alternative to picard MergeBamAlignment.

A best practices document has been drafted to show the recommended way to go from FASTQ files through to sorted and filtered consensus BAMs.

Major Changes

  • Major performance improvements in CallMolecularConsensusReads and CallDuplexConsensusReads by i) adding an optimized path for creating a "consensus" from a single read and ii) enabling efficient parallelization in #776 and #790
  • New tool ZipperBams, which is a replacement for picard's MergeBamAlignment by @tfenne in #778. ZipperBams handles any query-grouped BAM files and does not require sorting of the input or output.
  • Make GroupReadsByUmi more permissive in the alignments it accepts by @tfenne in #768. Starting with this release, GroupReadsByUmi accepts inter-chromosomal read-pairs by default, the default for the --min-map-q parameter has changed from 30 to 1, and read-pairs with one mapped and one unmapped read are also accepted.
  • GroupReadsByUmi can be run with no internal sorting if the input is already in TemplateCoordinate order by @nh13 in #794. This can be achieved using either fgbio SortBam or a template-coordinate sort in a forthcoming release of samtools.
  • New tool CallOverlappingConsensusBases to consensus call overlapping bases in paired-end reads by @nh13 in #805. Direct support for this has also been added to the consensus calling tools (CallMolecularConsensusReads and CallDuplexConsensusReads).
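
As a rough illustration of the GroupReadsByUmi change above, the following Python sketch inspects the @HD line of a SAM header to guess whether a file is already in template-coordinate order. It assumes the order is recorded as an SS sub-sort value ending in template-coordinate; that tag layout and the function names are assumptions made for illustration, not something stated in these notes, and fgbio's own check may differ.

```python
def parse_hd(header_text):
    """Return the tag -> value map of the first @HD line, or {} if there is none."""
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            fields = line.split("\t")[1:]
            # Split on the first colon only, since values such as SS may contain colons.
            return dict(f.split(":", 1) for f in fields if ":" in f)
    return {}


def looks_template_coordinate(header_text):
    """Guess whether the header declares a template-coordinate sub-sort (assumed SS tag)."""
    sub_sort = parse_hd(header_text).get("SS", "")
    # Accept both "template-coordinate" and "unsorted:template-coordinate".
    return sub_sort.split(":")[-1] == "template-coordinate"
```

The header text could come from, for example, the output of samtools view -H. If the function returns False, sorting with fgbio SortBam first is the route these notes describe.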

Backward Incompatibilities

  • Change default sort orders of consensus callers by @nh13 in #781. Now, by default, consensus callers will emit reads in the same order they are read in and perform no sorting. Sorting of the output is available, but is opt-in.
  • Specify an output sort order in FilterConsensusReads by @nh13 in #782. Previously FilterConsensusReads would always sort its output into coordinate order. The new behaviour is to emit reads in the same order as the input, with sorting being opt-in via the --sort-order option.
  • Require template sort orders in ClipBam and FilterConsensusReads by @nh13 in #807. Previously ClipBam and FilterConsensusReads would sort their input if it was neither queryname-sorted nor query-grouped. This behaviour was surprising to many users and led to extended runtimes. The tools now require the input BAM to be either queryname-sorted or query-grouped and will fail fast if it is not. Output sorting is still available, but the default is to emit reads in the same order as the input.
  • Both ClipBam and FilterConsensusReads now require the reference to be fully loaded into memory, rather than iterating contig-by-contig as before, by @nh13 in #807. This is required because both tools modify the bases and alignment and so need to update the NM/UQ/MD SAM tags. ClipBam also needs to update mate information (SAM flag) depending on whether reads are fully clipped. The JVM heap size may therefore need to be increased to fit the full reference in memory (e.g. -Xmx8g for a human genome).
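
Because ClipBam and FilterConsensusReads now fail fast on input that is neither queryname-sorted nor query-grouped, a pipeline may want to check the header before launching a long job. The Python sketch below mimics that requirement under the assumption that it corresponds to SO:queryname or GO:query on the @HD line. The function names are hypothetical and the real check inside fgbio may be stricter or looser.

```python
def hd_tags(header_text):
    """Return the tag -> value map of the first @HD line, or {} if there is none."""
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            return dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
    return {}


def require_query_ordered(header_text):
    """Raise ValueError unless the header declares queryname sort or query grouping."""
    tags = hd_tags(header_text)
    if tags.get("SO") == "queryname" or tags.get("GO") == "query":
        return tags
    raise ValueError(
        "input must be queryname-sorted or query-grouped; "
        f"found SO={tags.get('SO')!r}, GO={tags.get('GO')!r}"
    )
```

A coordinate-sorted header such as "@HD\tVN:1.6\tSO:coordinate" raises ValueError here, which is the cue to re-sort or re-group the input before running either tool.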

Minor Changes

  • Add a tool to copy the UMI from the read name by @nh13 in #775
  • Add the --annotate-all option to AssignPrimers by @nh13 in #669
  • Added ability for FastqToBam to also extract UMIs from read names by @tfenne in #800
  • Bugfix for "ConsensusCallingIterator could fail when no consensus reads are called" by @tfenne in #780
  • Change default validation stringency to SILENT and make common option… by @tfenne in #793
  • Do not return zero-length alignments by @nh13 in #552
  • More ergonomic methods for converting between HTSJDK and fgbio SequenceDictionary objects by @tfenne in #767
  • Reduce memory usage by GroupReadsByUmi in a corner case by @tfenne in #774
  • Support for clipping reads that extend past their mate by @nh13 in #761
  • Updates version of snappy to support Apple Silicon by @tfenne in #772
  • Fixes a bug where VcfWriter was not writing VCF index files by @clintval in #816
  • Improved documentation of LogProbability methods by @wmchad in #817
  • Make SamWriter stop checking sort order when emitting pre-sorted records by @tfenne in #820

Full Changelog: 1.5.1...2.0.0