Releases: fulcrumgenomics/fgbio
2.4.0
What's Changed
- feat: SequenceMetadata can have name and length looked up by key by @nh13 in #1002
- Fix snapshot releases by @nh13 in #1004
- feat: raise exception if CollectDuplexSeqMetrics run on consensus BAM by @znorgaard in #1003
- Ensure `--min-reads` is a hard filter in CallDuplexConsensusReads by @clintval in #1010
New Contributors
- @znorgaard made their first contribution in #1003
Full Changelog: 2.3.0...2.4.0
Release 2.3.0
What's Changed
- Add a Zenodo DOI to the README by @nh13 in #955
- PileupBuilder should not report insertions when checking the final mapped base before soft-clipping by @jrm5100 in #956
- fix: ensure that the GroupReadsByUmiTests for marking duplicates are by @nh13 in #962
- fix: mapped header records should overwrite unmapped in ZipperBams by @nh13 in #963
- Fix typo in ExtractUmisFromBam.scala by @nh13 in #966
- ZipperBams to produce mate score ("ms") for samtools markdup by @nh13 in #952
- Make `PileupBuilder.includeMapPositionsOutsideFrInsert` intuitively correct by @clintval in #981
- doc: update description of consensus tags for duplex by @nh13 in #983
- Update UpdateGffContigNames.scala (typo in docs) by @yfarjoun in #985
- Add conda install instructions to README by @clintval in #988
- Add TemplateCoordinate sort order to the usage of SortBam by @nh13 in #993
- Create CODEOWNERS by @nh13 in #999
- feat: add --umi-prefix to CopyUmiFromReadName by @msto in #958
- Validate IO in SortBam to provide nicer exceptions by @nh13 in #994
- Improve the list of tools in the README.md by @nh13 in #991
- doc: fix duplicate "the" in sequence dictionary docstrings by @nh13 in #1000
New Contributors
Full Changelog: 2.2.1...2.3.0
Release 2.2.1
What's Changed
Full Changelog: 2.2.0...2.2.1
Release 2.2.0
What's Changed
New Features
- Duplicate marking in GroupReadsByUmi by @tfenne in #940
  - `GroupReadsByUmi` can now optionally also set the `pcr_duplicate` flag field on all reads while duplicate marking. If duplicate marking mode is engaged then by default secondary, supplementary and mapq=0 reads are passed through to the output BAM.
- Addition of threading in GroupReadsByUmi and some other performance optimizations by @tfenne in #950
  - Threading is designed to help in the specific case where there are very large numbers of UMIs present on reads with the same coordinates (e.g. multiplex PCR with UMIs).
- Add optional validation of kept read ratio to CorrectUmis by @mjhipp in #917
- feat: allow TrimFastq to specify a length per input FASTQ by @nh13 in #928
- feat: add an option to store sample base qualities in the QT for FastqToBam by @nh13 in #933
- adds a barcode option to FastqToBam by @bwlang in #936
Bug Fixes
- Ensure FilterSomaticVcf handles PASS variants correctly by @clintval in #909
- Fix complement of W and S iupac codes. by @tfenne in #912
- Fix pass-QC in output FASTQ read names by @nh13 in #923
- bugfix: ZipperBams should consume any remaining mapped reads by @nh13 in #929
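
To illustrate the IUPAC fix above (#912): `W` (A or T) and `S` (C or G) are each their own complement, because complementing every base in the set gives back the same set. The following is a minimal Python sketch of a complement table, not fgbio's Scala implementation; the names `IUPAC_COMPLEMENT`, `complement` and `reverse_complement` are hypothetical and exist only for this example.

```python
# Illustrative only: a complement table for uppercase IUPAC nucleotide codes.
# This is not fgbio's code; all names here are made up for the example.
IUPAC_COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A",
    "R": "Y", "Y": "R",  # R = A/G, Y = C/T
    "S": "S", "W": "W",  # S = C/G and W = A/T are their own complements
    "K": "M", "M": "K",  # K = G/T, M = A/C
    "B": "V", "V": "B",  # B = C/G/T, V = A/C/G
    "D": "H", "H": "D",  # D = A/G/T, H = A/C/T
    "N": "N",
}


def complement(code: str) -> str:
    """Return the complement of a single IUPAC code (case-insensitive input)."""
    return IUPAC_COMPLEMENT[code.upper()]


def reverse_complement(seq: str) -> str:
    """Reverse the sequence and complement each code."""
    return "".join(complement(base) for base in reversed(seq))
```

A table that mapped `W` to `S` (or vice versa) would silently change which bases an ambiguity code stands for after reverse-complementing.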
Other Changes
- Update ZipperBams to state the sort is checked in the SAM header by @nh13 in #894
- Improve docs for consensus reads being unaligned by @nh13 in #897
- Fix typos in alignment by @nh13 in #914
- Fix Alignment test by @jacarey in #913
- Update intel gkl to 0.8.10 by @nh13 in #918
- Update broad snapshot artifactory url in build.sbt by @mjhipp in #925
- Fix link in DemuxFastqs.scala by @PeteHaitch in #938
- Suggest fqtk in DemuxFastqs by @nh13 in #939
- Fix reference to transient MI tag in DuplexConsensusCaller by @clintval in #946
New Contributors
- @bwlang made their first contribution in #936
- @PeteHaitch made their first contribution in #938
Full Changelog: 2.1.0...2.2.0
Release 2.1.0
Minor release with mostly bug-fixes and one new tool.
New Tools
- `DownsampleAndNormalizeBam`: performs semi-random downsampling to reduce coverage towards a specified target coverage while retaining reads in low-coverage areas. See #893
Bug Fixes
- Fix overly aggressive overlap-clipping regressions in ClipBam by @jrm5100 in #850
- ReviewConsensusVariants should not require grouped raw reads to by @nh13 in #860
- Fix a bug where consensus reads are produced with zero depth by @nh13 in #859
- Fix for nasty corner case described in issue #858 in CallDuplexConsensusReads by @nh13 in #864
- Tweak the size of caches for parallel consensus calling down to reduce memory usage by @tfenne in #881
Minor Changes
Other
Full Changelog: 2.0.2...2.1.0
Release 2.0.2
Minor release with bug fixes and minor changes. If you use the 2.x version of `ClipBam`, `CallMolecularConsensusReads` or `CallDuplexConsensusReads`, please upgrade to this version.
Bug fixes
- `SamRecordClipper.clipOverlappingReads` now accounts for soft-clipped bases starting before the ends (#842) by @jrm5100. This affects `ClipBam` and consensus calling tools (`CallMolecularConsensusReads` and `CallDuplexConsensusReads`). This bug was introduced in #761 and in the 2.0 release.
Minor Changes
- Add a missing param to constructor of `StreamingPileupBuilder` via `apply()` (#845) by @clintval.
- Update scala-xml to a much more recent version and drop the collections-compat requirement we no longer need (#838) by @tfenne.
- Ensure `SamWriter` always logs how many it wrote before `close()` (#829) by @clintval.
Release 2.0.1
Minor release with bug fixes.
Please upgrade in particular if you use either `CallMolecularConsensusReads` or `CallDuplexConsensusReads`.
Minor Changes
Bug fixes in the `OverlappingBasesConsensusCaller` (introduced in 2.0.0), which apply to the tools `CallMolecularConsensusReads`, `CallDuplexConsensusReads`, and `CallOverlappingConsensusBases`. Fixes:
- A case when the alignments for a read and its mate overlap but share no _mapped_ bases by @nh13 in #824.
- Logging the number of bases examined and corrected for overlapping bases in the overlapping consensus caller by @nh13 in #825.
See issue #821 for more discussion on the above.
Thank you to @blackbeerd for providing the initial report and test cases to debug!
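
The first fix above concerns a read and its mate whose alignments overlap on the reference yet have no mapped base in common, for example when one read spans a deletion that the other read falls entirely inside. The Python sketch below shows how such a pair can arise from CIGAR strings alone. It is an illustration under assumptions, not fgbio's implementation; the helper names and the example coordinates are hypothetical.

```python
import re

# One CIGAR element: a length followed by an operator from the SAM specification.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def mapped_positions(start: int, cigar: str) -> set:
    """1-based reference positions that have a read base aligned to them."""
    positions = set()
    ref = start
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "M=X":      # consumes both query and reference
            positions.update(range(ref, ref + n))
            ref += n
        elif op in "DN":     # consumes reference only (deletion or skip)
            ref += n
        # I, S, H and P consume no reference bases
    return positions


def reference_span(start: int, cigar: str) -> tuple:
    """Inclusive (start, end) of the alignment on the reference."""
    ref_len = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MDN=X")
    return start, start + ref_len - 1


def spans_overlap(a: tuple, b: tuple) -> bool:
    """True if two inclusive intervals share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical pair: the read spans a 20 bp deletion, the mate lies inside it.
read = (100, "10M20D10M")   # maps 100-109 and 130-139
mate = (110, "20M")         # maps 110-129
```

For this pair, `spans_overlap(reference_span(*read), reference_span(*mate))` is true while `mapped_positions(*read) & mapped_positions(*mate)` is empty, so code that assumes overlapping spans always yield at least one shared base needs to handle the empty case.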
Full Changelog: 2.0.0...2.0.1
Release 2.0.0
Overview
This is the second major release of fgbio. A lot has changed in this release, including a significant number of backward incompatible changes to tools.
A major theme of this release is performance of the UMI-related tools. The consensus callers can now be parallelized using `--threads` options and also include some internal optimizations. Sorting of data has been eliminated in many places (more on this below). And a new tool (`ZipperBams`) has been added as a much lighter-weight and therefore faster alternative to picard `MergeBamAlignment`.
A best practices document has been drafted to show the recommended way to go from FASTQ files through to sorted and filtered consensus BAMs.
Major Changes
- Major performance improvements in `CallMolecularConsensusReads` and `CallDuplexConsensusReads` by i) adding an optimized path for creating a "consensus" from a single read and ii) enabling efficient parallelization in #776 and #790
- New tool `ZipperBams`, which is a replacement for picard's MergeBamAlignment by @tfenne in #778. `ZipperBams` handles any query-grouped BAM files and does not require sorting of the input or output.
- Make `GroupReadsByUmi` more permissive in the alignments it accepts by @tfenne in #768. Starting with this release `GroupReadsByUmi` will accept inter-chromosomal read-pairs by default, the `--min-map-q` parameter has had its default changed from 30 to 1, and read-pairs with one mapped and one unmapped read are also accepted.
- `GroupReadsByUmi` can be run with no internal sorting if the input is already in `TemplateCoordinate` order by @nh13 in #794. This can be achieved using either `fgbio SortBam` or a template-coordinate sort in a forthcoming release of `samtools`.
- New tool `CallOverlappingConsensusBases` to consensus call overlapping bases in paired end reads. Adds direct support in the consensus calling tools (`CallMolecularConsensusReads` and `CallDuplexConsensusReads`) too. By @nh13 #805
Backward Incompatibilities
- Change default sort orders of consensus callers by @nh13 in #781. Now, by default, consensus callers will emit reads in the same order they are read in and perform no sorting. Sorting of the output is available, but is opt-in.
- Specify an output sort order in `FilterConsensusReads` by @nh13 in #782. Previously `FilterConsensusReads` would always sort its output into `coordinate` order. The new behaviour is to emit reads in the same order as the input, with sorting being opt-in via the `--sort-order` option.
- Require template sort orders in `ClipBam` and `FilterConsensusReads` by @nh13 in #807. Previously `ClipBam` and `FilterConsensusReads` would sort their input if it was neither queryname sorted nor query-grouped. This behaviour was surprising to many users and led to extended runtimes. The tools now require the input BAM to be either queryname-sorted or query-grouped and will fail fast if it is not. Output sorting is still available, but the default is to emit reads in the same order as the input.
- Both `ClipBam` and `FilterConsensusReads` require the reference to be fully loaded into memory, versus previously iterating contig-by-contig by @nh13 in #807. This is required as both tools modify the bases and alignment and so need to update SAM tags (e.g. NM/UQ/MD). `ClipBam` also needs to update mate information (SAM flag) depending on if reads are fully clipped. Therefore the JVM heap size may need to be increased to fit the full reference in memory (e.g. `-Xmx8g` for a human genome).
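
For readers unsure what "query-grouped" means in the sort-order requirement above: all records with the same read name must appear consecutively, but the groups themselves may come in any order (queryname-sorted input is a special case). The Python sketch below checks that property on a plain sequence of read names. It is only an illustration: the function name is hypothetical, and fgbio itself may rely on the sort order declared in the SAM header rather than scanning records this way.

```python
from typing import Iterable


def is_query_grouped(read_names: Iterable[str]) -> bool:
    """True if every read name occurs in a single contiguous run.

    Illustrative helper only; not part of fgbio.
    """
    seen = set()        # names whose run has already started
    previous = None     # name of the record just before the current one
    for name in read_names:
        if name != previous:
            if name in seen:
                # The name reappears after a different name intervened.
                return False
            seen.add(name)
            previous = name
    return True
```

Under this definition `["q2", "q2", "q1", "q1"]` is query-grouped even though it is not sorted by name, while `["q1", "q2", "q1"]` is not.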
Minor Changes
- Add a tool to copy the UMI from the read name by @nh13 in #775
- Add the --annotate-all option to `AssignPrimers` by @nh13 in #669
- Added ability for `FastqToBam` to also extract UMIs from read names. by @tfenne in #800
- Bugfix for "ConsensusCallingIterator could fail when no consensus reads are called" by @tfenne in #780
- Change default validation stringency to SILENT and make common option… by @tfenne in #793
- Do not return zero-length alignments by @nh13 in #552
- More ergonomic methods for converting between HTSJDK and fgbio `SequenceDictionary` objects by @tfenne in #767
- Reduce memory usage by `GroupReadsByUmi` in a corner case by @tfenne in #774
- Support for clipping reads that extend past their mate by @nh13 in #761
- Updates version of snappy to support Apple Silicon by @tfenne in #772
- Fixes a bug where `VcfWriter` was not writing VCF index files by @clintval #816
- Improved documentation of `LogProbability` methods by @wmchad #817
methods by @wmchad #817 - Make SamWriter stop checking sort order when emitting pre-sorted records by @tfenne #820
Full Changelog: 1.5.1...2.0.0
v2.0.0-beta1
Overview
This is the first beta for the fgbio 2.0.0 release. A lot has changed in this release, including a significant number of backward incompatible changes to tools. This release is not being pushed to maven central (for use as a library) or to bioconda, and is only available as a download here, or by building from source.
A major theme of this release is performance of the UMI-related tools. The consensus callers can now be parallelized using `--threads` options and also include some internal optimizations. Sorting of data has been eliminated in many places (more on this below). And a new tool (`ZipperBams`) has been added as a much lighter-weight and therefore faster alternative to picard `MergeBamAlignment`.
A best practices document has been drafted to show the recommended way to go from FASTQ files through to sorted and filtered consensus BAMs.
Major Changes
- Major performance improvements in `CallMolecularConsensusReads` and `CallDuplexConsensusReads` by i) adding an optimized path for creating a "consensus" from a single read and ii) enabling efficient parallelization in #776 and #790
- New tool `ZipperBams`, which is a replacement for picard's MergeBamAlignment by @tfenne in #778. `ZipperBams` handles any query-grouped BAM files and does not require sorting of the input or output.
- Make GroupReadsByUmi more permissive in the alignments it accepts by @tfenne in #768. Starting with this release `GroupReadsByUmi` will accept inter-chromosomal read-pairs by default, the `--min-map-q` parameter has had its default changed from 30 to 1, and read-pairs with one mapped and one unmapped read are also accepted.
- GroupReadsByUmi can be run with no internal sorting if the input is already in `TemplateCoordinate` order by @nh13 in #794. This can be achieved using either `fgbio SortBam` or a template-coordinate sort in a forthcoming release of `samtools`.
Backward Incompatibilities
- Change default sort orders of consensus callers by @nh13 in #781. Now, by default, consensus callers will emit reads in the same order they are read in and perform no sorting. Sorting of the output is available, but is opt-in.
- Specify an output sort order in `FilterConsensusReads` by @nh13 in #782. Previously `FilterConsensusReads` would always sort its output into `coordinate` order. The new behaviour is to emit reads in the same order as the input, with sorting being opt-in via the `--sort-order` option.
- Require template sort orders in ClipBam and FilterConsensusReads by @nh13 in #807. Previously `ClipBam` and `FilterConsensusReads` would sort their input if it was neither queryname sorted nor query-grouped. This behaviour was surprising to many users and led to extended runtimes. The tools now require the input BAM to be either queryname-sorted or query-grouped and will fail fast if it is not. Output sorting is still available, but the default is to emit reads in the same order as the input.
- Both ClipBam and FilterConsensusReads require the reference to be fully loaded into memory, versus previously iterating contig-by-contig by @nh13 in #807. This is required as both tools modify the bases and alignment and so need to update SAM tags (e.g. NM/UQ/MD). ClipBam also needs to update mate information (SAM flag) depending on if reads are fully clipped. Therefore the JVM heap size may need to be increased to fit the full reference in memory (e.g. `-Xmx8g` for a human genome).
Minor Changes
- Add a tool to copy the UMI from the read name by @nh13 in #775
- Add the --annotate-all option to `AssignPrimers` by @nh13 in #669
- Added ability for `FastqToBam` to also extract UMIs from read names. by @tfenne in #800
- Bugfix for "ConsensusCallingIterator could fail when no consensus reads are called" by @tfenne in #780
- Change default validation stringency to SILENT and make common option… by @tfenne in #793
- Do not return zero-length alignments by @nh13 in #552
- More ergonomic methods for converting between HTSJDK and fgbio `SequenceDictionary` objects. by @tfenne in #767
- Reduce memory usage by `GroupReadsByUmi` in a corner case by @tfenne in #774
- Support for clipping reads that extend past their mate by @nh13 in #761
- Updates version of snappy to support Apple Silicon by @tfenne in #772
Full Changelog: 1.5.1...2.0.0-beta1
Release 1.5.1
Minor release.
New tools in this release:
- Added a `SortSequenceDictionary` tool to re-sort a sequence dictionary #769. This is useful for tools that perform contig renaming.
Updates to tools in this release:
- Speed up `FilterSomaticVcf` by using a fast coordinate streaming pileup builder #763
Updates to the docs:
- Improve the description of the number of values in command line args #766