
Releases: fulcrumgenomics/fgbio

Release 0.6.0

05 Apr 18:56

Release 0.6.0 introduces the following changes to existing tools:

  • ReviewConsensusVariants: output PASS when there are no filters on the variant; fix format of bases output
  • MaskPrimers: improved usage documentation to make primer file format clearer

The following API changes were also introduced:

  • Added constants to SamRecord for SAM/BAM related constant values
  • NeedlemanWunchAligner renamed to Aligner (old name deprecated but still works)
    • Implemented Glocal (or semi-global) alignment mode
    • Implemented Local alignment mode
    • Fixed affine gap implementation
    • Fixed Alignment.subByQuery/subByTarget to correctly handle adjacent deletions
  • In metrics files, 0.0 is now always formatted as 0 rather than 0E0
  • Updated how Rscript finds resources in the classpath to support local paths and absolute paths with and without leading slashes
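To illustrate what the new Local mode of Aligner does, here is a minimal Smith-Waterman sketch in Python that returns only the best local alignment score. It is not fgbio's Aligner API: the function name, the linear (rather than affine) gap penalty, and the default scores are assumptions chosen for illustration.

```python
def local_alignment_score(query: str, target: str, match: int = 1,
                          mismatch: int = -3, gap: int = -5) -> int:
    """Best Smith-Waterman local alignment score using a linear gap penalty."""
    rows, cols = len(query) + 1, len(target) + 1
    # h[i][j] = best score of a local alignment ending at query[i-1] and target[j-1]
    h = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = h[i - 1][j - 1] + (match if query[i - 1] == target[j - 1] else mismatch)
            up = h[i - 1][j] + gap     # query base aligned to a gap
            left = h[i][j - 1] + gap   # target base aligned to a gap
            # Flooring at zero lets an alignment start fresh anywhere: the defining trait of local mode
            h[i][j] = max(0, diag, up, left)
            best = max(best, h[i][j])
    return best
```

With these defaults, `local_alignment_score("ACGT", "TTACGTTT")` is 4: the query matches a substring of the target exactly, and the unaligned flanks on either sequence cost nothing.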

Release 0.5.1

27 Feb 22:11

Release 0.5.1 is a minor bug-fix release and introduces the following changes:

  • ExtractUmisFromBam
    • Improved error messaging
    • Fixed bug that prevented it from working when only one read per pair contained a UMI
  • GroupReadsByUmi now adds the sub-sort SS tag to the header of BAMs produced
  • CallMolecularConsensusReads and CallDuplexConsensusReads attempt to detect the sort order of input data and will fail if the sort order is incompatible
  • DemuxFastqs changed some output metrics from 32-bit Int to 64-bit Long to avoid overflows on NovaSeq data
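The sub-sort SS tag and the sort-order checks above both work from the @HD line of the SAM header. A minimal Python sketch of reading those fields and failing on an unexpected order follows; the function names, the example header, and the `accepted` set are hypothetical, and the real checks in CallMolecularConsensusReads and CallDuplexConsensusReads may differ.

```python
def parse_hd(line: str) -> dict:
    """Split a SAM @HD header line into a {tag: value} dict."""
    fields = line.rstrip("\n").split("\t")
    if fields[0] != "@HD":
        raise ValueError("not an @HD header line")
    # Split on the first colon only: SS values such as "unsorted:template-coordinate" contain colons
    return dict(field.split(":", 1) for field in fields[1:])


def check_order(hd: dict, accepted: set) -> str:
    """Return the declared order (SS if present, else SO), or raise if it is not accepted.

    The acceptance policy is a hypothetical stand-in for a tool's real requirements.
    """
    declared = hd.get("SS", hd.get("SO", "unknown"))
    if declared not in accepted:
        raise ValueError(f"incompatible sort order: {declared}")
    return declared
```

Failing early on the header is cheaper than discovering mid-run that reads are not grouped the way a consensus caller expects.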

Release 0.5.0

11 Feb 15:31

Release 0.5.0 introduces the following changes to existing tools:

  • CallDuplexConsensusReads: Fixed a rare bug where the consensus base quality could be zero or one if the two strands' base qualities differ by two or less.
  • FilterConsensusReads: Fixed a bug where duplex reads formed from raw reads from only a single strand could be incorrectly filtered.
  • CorrectUmis: Now stores the original UMI sequences in the OX tag upon correction.
  • DemuxFastqs: Bug fix to correct quality scores in output BAM files
  • ClipOverlappingReads: Removed previously deprecated tool. Use ClipBam instead.
  • ClipBam:
    • Now optionally outputs metrics about clipping present in reads before and after execution.
    • New option to "upgrade" clipping, e.g. replace existing soft-clipping with hard-clipping
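The new ClipBam option to "upgrade" clipping can be pictured at the level of a single read. The Python sketch below converts soft-clips (S) to hard-clips (H) in a CIGAR string and drops the clipped bases and qualities; the function names and the string-based read representation are assumptions for illustration, not ClipBam's implementation.

```python
import re


def _end_soft_clip(ops):
    """Length of the soft-clip at the end of ops, looking past one hard-clip if present."""
    if ops and ops[-1][1] == "H":
        ops = ops[:-1]
    return ops[-1][0] if ops and ops[-1][1] == "S" else 0


def upgrade_soft_clips(cigar: str, bases: str, quals: str):
    """Convert soft-clips to hard-clips, removing the clipped bases and qualities.

    Returns (new_cigar, bases, quals). The alignment start does not change,
    because neither S nor H operations consume reference bases.
    """
    ops = [(int(n), op) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)]
    leading = _end_soft_clip(ops[::-1])
    trailing = _end_soft_clip(ops)
    upgraded = []
    for n, op in ops:
        op = "H" if op == "S" else op
        # Merge with an adjacent hard-clip, e.g. 2H3S becomes 5H
        if upgraded and upgraded[-1][1] == "H" and op == "H":
            upgraded[-1] = (upgraded[-1][0] + n, "H")
        else:
            upgraded.append((n, op))
    new_cigar = "".join(f"{n}{op}" for n, op in upgraded)
    end = len(bases) - trailing
    return new_cigar, bases[leading:end], quals[leading:end]
```

For example, a read with CIGAR `2H3S5M2S` and ten stored bases becomes `5H5M2H` with only the five aligned bases kept.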

Changes to APIs were as follows:

  • Various deprecated methods were removed this release.
  • Metric formatting now prints smaller Doubles in scientific notation, and the formatting is generally more efficient.
  • NeedlemanWunchAligner gained a Glocal alignment mode for aligning all of a query sequence to a sub-region of a target sequence
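The Glocal mode described above aligns the whole query but lets the alignment start and end anywhere in the target. A minimal Python sketch of the scoring recurrence follows; it is not fgbio's aligner, and the function name, linear gap penalty, and default scores are illustrative assumptions.

```python
def glocal_alignment_score(query: str, target: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Best score aligning all of query to any contiguous sub-region of target (linear gaps)."""
    # Row 0 is all zeros: skipping any prefix of the target costs nothing
    prev = [0] * (len(target) + 1)
    for i in range(1, len(query) + 1):
        # Column 0: query bases aligned before the target starts are ordinary gaps
        cur = [i * gap] + [0] * len(target)
        for j in range(1, len(target) + 1):
            score = match if query[i - 1] == target[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + score,  # match or mismatch
                         prev[j] + gap,        # query base against a gap
                         cur[j - 1] + gap)     # target base against a gap
        prev = cur
    # Max over the last row: skipping any suffix of the target also costs nothing
    return max(prev)
```

Compared with global (Needleman-Wunsch) alignment, only two things change: the first row is zero instead of accumulating gap penalties, and the result is the maximum of the last row rather than its final cell.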

Release 0.4.0

15 Nov 17:41
a9445b4

Release 0.4.0 introduces the following changes to existing tools:

  • CallDuplexConsensusReads
    • The single strand consensus bases and quals for each duplex consensus read are output into tags on the duplex consensus read
    • Added option to output consensus reads that are formed from only a single strand
  • FilterConsensusReads
    • New option to filter out reads with low mean base quality
    • New option to filter out reads whose minimum depth is too low
    • New option to filter duplex consensus reads where the single strand consensuses disagree
    • New optional tags will store the single-strand consensus bases and qualities for duplex consensus reads.
  • DemuxFastqs
    • Will no longer output /1 and /2 on read names when running in Illumina standards mode
    • Fixed a bug causing an exception when the sample barcode is found in multiple reads (e.g. i5 and i7)
  • ErrorRateByReadPosition - fixed a bug that resulted in C>G errors being counted as A>G errors
  • GroupReadsByUmi
    • Reads with UMIs with Ns in them are now rejected
    • Added log messages with counts of reads filtered out, by reason
    • Memory usage improvements when grouping reads at very, very high depth.
    • Supports enforcing a minimum UMI length and partial UMIs except for the paired strategy (duplex sequencing).
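The new GroupReadsByUmi filtering and per-reason logging can be sketched as a single pass that keeps acceptable UMIs and tallies rejections. This Python sketch is a simplification under stated assumptions: the function name and reason labels are hypothetical, and it only discards UMIs shorter than the minimum rather than modeling how fgbio treats UMIs of differing lengths.

```python
from collections import Counter


def filter_umis(umis, min_length=None):
    """Split UMIs into those kept and a count of rejections by (hypothetical) reason label."""
    kept, rejected = [], Counter()
    for umi in umis:
        if "N" in umi.upper():
            rejected["contains N"] += 1
        elif min_length is not None and len(umi) < min_length:
            rejected["too short"] += 1
        else:
            kept.append(umi)
    return kept, rejected
```

Since the notes say the minimum-length option does not apply to the paired (duplex) strategy, a caller following this sketch would pass `min_length=None` in that case.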

Finally, changes to various APIs were as follows:

  • Added a method in Bams to sort records by tag, or by a function applied to a tag
  • Improved the speed of Metric.read when loading large numbers of rows from metrics files
  • Changed SamSource to extend IterableView instead of Iterable so that map(), filter(), etc. return lazy views
  • Fixed a bug where the specified temporary directory was not being used for sorting.
  • Added a BinomialDistribution class implemented using unlimited precision decimal math which is slower, but allows computation of cumulative probabilities where other implementations overflow or underflow
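The motivation for an unlimited-precision BinomialDistribution is easy to demonstrate: with 64-bit doubles, 0.5 raised to the 2000th power underflows to zero. The Python sketch below swaps in exact rational arithmetic (`fractions.Fraction`) for the decimal math the note describes; it illustrates the idea and is not fgbio's class or API.

```python
from fractions import Fraction
from math import comb


def binomial_cdf(k: int, n: int, p: Fraction) -> Fraction:
    """P(X <= k) for X ~ Binomial(n, p), computed exactly with rational arithmetic.

    Every term is an exact Fraction, so nothing overflows or underflows; the
    price is speed, since numerators and denominators grow very large.
    """
    p = Fraction(p)
    q = 1 - p
    return sum((comb(n, i) * p**i * q ** (n - i) for i in range(k + 1)), Fraction(0))
```

For instance, `binomial_cdf(0, 2000, Fraction(1, 2))` is exactly `Fraction(1, 2**2000)`, a strictly positive value, whereas `0.5 ** 2000` evaluates to `0.0` in floating point.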

Release 0.3.0

05 Oct 19:22

Release 0.3.0 introduces the following changes to existing tools:

  • ClipBam - The --overlapping-reads option was not being used internally, which caused overlapping reads to always be clipped; it is now deprecated in favor of --clip-overlapping-reads.
  • CollectDuplexSeqMetrics - Added the optional output of duplex-umi frequencies with DuplexUmiMetrics.
  • DemuxFastqs - The default output sort order is changed from Unsorted to Queryname. Added an option --illumina-standards to output file names using Illumina naming conventions. Tuned the amount of memory used, especially for a large number of samples (>96).
  • CallDuplexConsensusReads - No longer throws an exception when potential collisions are found in duplex molecules; instead, no consensus read is generated.
  • FilterBam - Added a few more filters.
  • Added a global parameter for log-level.

In addition, the following new tools were added:

  • CollectErccMetrics - Collects metrics for analyzing ERCC spike-ins in RNA-Seq experiments for dose response but not fold-change response.
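For dose-response metrics, a common summary is a linear fit of log2 observed abundance against log2 expected spike-in concentration, where a slope near 1 and a high R² indicate that measured expression tracks input. The Python sketch below is that generic fit; whether CollectErccMetrics computes exactly these quantities is an assumption, and the function name is hypothetical.

```python
import math


def dose_response_fit(expected, observed):
    """Least-squares fit of log2(observed) on log2(expected).

    Returns (slope, intercept, r_squared). Pairs with a non-positive value are
    skipped because their logarithm is undefined.
    """
    points = [(math.log2(e), math.log2(o))
              for e, o in zip(expected, observed) if e > 0 and o > 0]
    if len(points) < 2:
        raise ValueError("need at least two positive (expected, observed) pairs")
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy * sxy / (sxx * syy)
    return slope, intercept, r_squared
```

Working in log space makes a perfectly proportional response a line of slope 1, regardless of the overall scale of the counts.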

Finally, changes to various APIs were as follows:

  • ReferenceSetBuilder - Moved to the testing packages for use in projects that extend fgbio.
  • Alignment - Added subByQuery() and subByTarget() methods to Alignment.

Release 0.2.0

22 Jun 21:11

Release 0.2.0 introduces the following changes to existing tools:

  • added global arguments, available to all tools, which are given before the tool name:
    • --tmp-dir: directory to use for temporary files.
    • --compression: default GZIP compression level, BAM compression level.
    • --async-io: use asynchronous I/O where possible, e.g. for SAM and BAM files.
  • numerous changes to the tool documentation to support output in Markdown format.
  • DuplexConsensusCaller:
    • added logging of statistics for DuplexConsensusCaller.
    • added quality trimming.
    • improved the method for finding the set of "compatible" cigars, which determines the reads from which a consensus is built
  • DemuxFastqs:
    • the output directory is now created if it does not exist
    • a change to the new quality format detector caused the detected encoding not to be printed
  • ClipOverlappingReads is deprecated in favor of ClipBam.
  • SampleSheet and ExtractBasecallingParamsForPicard
    • if the library identifier (Library_Id column) does not exist, it will default to the sample identifier (Sample_Id column); previously it defaulted to the sample name (Sample_Name column).
  • HapCutToVcf: updated to support newer HapCut2 output.
    • the full FORMAT field in the VCF is printed, including trailing missing values.

In addition, the following new tools were added:

  • FastqToBam: generates an unmapped BAM (or SAM or CRAM) file from fastq files.
  • BuildToolDocs: generates the suite of per-tool Markdown documents.
  • SplitBam: splits a BAM into multiple BAMs, one per-read group (or library).
  • ClipBam: clips reads from the same template; replaces ClipOverlappingReads.
  • CollectDuplexSeqMetrics: generates metrics for duplex sequencing quality control.

Next, a new API for reading and writing SAM/BAM files, built around Scala idioms, was added:

  • SamRecord: a replacement for htsjdk's SAMRecord with more Scala-esque fields and methods.
  • SamSource: a class for reading SAM/BAM/CRAM files and for querying them.
  • SamWriter: a class for writing SAM/BAM/CRAM files and sorting them.
  • SamOrder: a trait for specifying SAM/BAM orderings; in addition to coordinate and queryname sort orders, includes useful and novel sorts such as:
    • random: generates a random order over all the reads.
    • randomquery: generates a random order with queryname grouping.
    • templatecoordinate: the sort order used by GroupReadsByUmi; sorts reads by the lower unclipped 5' coordinate of the read pair, followed by the higher unclipped 5' coordinate of the read pair.
    • unsorted: the official "unsorted" ordering.
    • unknown: the official "unknown" ordering.
  • Bams: methods for manipulating sequences of SamRecords and other useful utility methods.
    • contains disk-backed sorting methods that sort SAM/BAM files faster than htsjdk's.
  • SamBuilder: a class for building SAM/BAM files and records; useful for generating test-cases for unit tests.
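The templatecoordinate order hinges on each read's unclipped 5' coordinate. A simplified Python sketch of that key follows; the `Read` tuple and the function names are hypothetical, and a real implementation also has to account for fields omitted here, such as the reference sequence, so treat this as an illustration of the ordering idea only.

```python
import re
from typing import NamedTuple


class Read(NamedTuple):
    name: str
    pos: int        # 1-based leftmost aligned reference position
    cigar: str
    reverse: bool   # True if the read maps to the reverse strand


def unclipped_five_prime(read: Read) -> int:
    """1-based reference coordinate of the read's 5' end, counting clipped bases."""
    ops = [(int(n), op) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", read.cigar)]
    if not read.reverse:
        # Forward strand: step back from the alignment start over leading S/H clips
        lead = 0
        for n, op in ops:
            if op not in ("S", "H"):
                break
            lead += n
        return read.pos - lead
    # Reverse strand: the 5' end is the alignment end plus any trailing S/H clips
    ref_len = sum(n for n, op in ops if op in ("M", "D", "N", "=", "X"))
    trail = 0
    for n, op in reversed(ops):
        if op not in ("S", "H"):
            break
        trail += n
    return read.pos + ref_len - 1 + trail


def template_coordinate_key(r1: Read, r2: Read) -> tuple:
    """Order pairs by the lower 5' coordinate, then the higher one, then read name."""
    a, b = unclipped_five_prime(r1), unclipped_five_prime(r2)
    return (min(a, b), max(a, b), r1.name)
```

Using unclipped positions means two reads from the same molecule sort together even when one of them was soft-clipped differently by the aligner.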

Finally the following other changes were made:

  • support for Scala 2.12.2; we use this version by default.
  • support for cross-building and publishing of Scala 2.11.8 and 2.12.2
  • uses 0.2.0 release of sopt and commons.

Release 0.1.4

08 May 16:14

Release 0.1.4 introduces the following changes to existing tools:

  • CallMolecularConsensusReads
    • Added the ability to filter the maximum number of reads going into a consensus read
  • CallMolecularConsensusReads and FilterConsensusReads
    • No longer have default values for their --min-reads and --min-consensus-base-quality/--min-base-quality parameters. The correct values for these parameters are highly library/coverage dependent and are best set by the user.
  • CallMolecularConsensusReads and CallDuplexConsensusReads
    • Raw reads are end-trimmed for Ns after low-quality masking, prior to consensus calling
    • Raw reads that are FR pairs with read length > insert size are trimmed to the insert size prior to consensus calling
  • ErrorRateByReadPosition
    • Fixed a bug whereby the cumulative error plot produced in the PDF incorrectly started the R2 error count at the cumulative sum of the R1 error count.
    • Added the count of errors (in addition to error rate) to the output file
  • FilterSomaticVcf
    • Now gracefully handles reads whose insert size and mapping information disagree. Warnings will be logged for all such reads, but the tool will not stop/exit upon finding such reads. This should reduce the frequency of "genomicPosition is outside of template" error messages
    • Works with VCFs that do not contain #contig lines in the header
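The preprocessing described above for the consensus callers can be sketched for a single raw read. This Python function is a hypothetical illustration, not fgbio code: it masks bases below a quality threshold to N, trims Ns from the 3' end (assumed here to be the end that is trimmed), and, for FR pairs, truncates the read to the absolute insert size.

```python
def prepare_for_consensus(bases: str, quals: list, min_qual: int, insert_size=None):
    """Mask low-quality bases, trim trailing Ns, then trim to the insert size.

    Returns the trimmed (bases, quals). insert_size should be supplied only for
    FR pairs; None means no insert-size trimming.
    """
    # 1. Mask low-quality bases to N
    masked = "".join(b if q >= min_qual else "N" for b, q in zip(bases, quals))
    # 2. Trim Ns off the 3' end
    keep = len(masked.rstrip("N"))
    # 3. A read longer than its insert has read past its mate into adapter; cap its length
    if insert_size is not None:
        keep = min(keep, abs(insert_size))
    return masked[:keep], list(quals[:keep])
```

Masking before trimming means a run of low-quality bases at the end of a read is removed entirely instead of contributing noisy calls to the consensus.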

In addition the following new tools were added:

  • DemuxFastqs: Performs sample demultiplexing on FASTQs
  • CorrectUmis: Corrects UMI sequences in BAM files when a set of fixed UMIs (not randomers) is used
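The idea behind CorrectUmis can be sketched as nearest-neighbor matching against the known UMI set: accept the closest fixed UMI only if it is within a mismatch limit and clearly closer than the runner-up. In this Python sketch the function name, the `max_mismatches`/`min_distance` parameters, and their defaults are assumptions for illustration; consult the tool's usage for its actual options.

```python
def correct_umi(observed: str, fixed_umis: list, max_mismatches: int = 1, min_distance: int = 2):
    """Return the fixed UMI that observed should be corrected to, or None if ambiguous or too distant."""

    def mismatches(a: str, b: str) -> int:
        return sum(x != y for x, y in zip(a, b))

    # Only same-length fixed UMIs are candidates; sort them by mismatch count
    scored = sorted((mismatches(observed, u), u) for u in fixed_umis if len(u) == len(observed))
    if not scored:
        return None
    best_mm, best = scored[0]
    second_mm = scored[1][0] if len(scored) > 1 else float("inf")
    # Require a close match that is also unambiguous relative to the next-best candidate
    if best_mm <= max_mismatches and second_mm - best_mm >= min_distance:
        return best
    return None
```

The runner-up check matters because a sequencing error can leave an observed UMI equally close to two fixed UMIs, and guessing between them would merge unrelated molecules.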

Miscellaneous:

  • Added support for cross-building Scala 2.11 and 2.12
  • Tools that invoke R scripts will now produce less noisy output

Release 0.1.3

22 Feb 02:58

Release 0.1.3 introduces the following changes to existing tools:

  • CallMolecularConsensusReads now produces detailed information about consensus reads in new optional tags
  • MakeTwoSampleMixtureVcf now propagates the ID field from the source VCF into the mixture VCF
  • ErrorRateByReadPosition now masks out known variants, provides per-substitution type error rates and produces summary plots
  • ReviewConsensusVariants now generates a detailed output file with a row per variant-supporting-read

In addition the following new tools were added:

  • ClipOverlappingReads: clips alignments from read pairs whose alignments overlap
  • FilterConsensusReads: filters consensus reads generated by CallMolecularConsensusReads
  • EstimatePoolingFractions: estimates the fractional contribution of individual samples with known genotypes to a pooled sample
  • EstimateRnaSeqInsertSize: estimates insert size distributions of RNA sequencing experiments in the presence of splicing
  • CallDuplexConsensusReads: generates consensus reads from duplex-sequencing protocols that embed a UMI at the start of each read in a pair
  • MakeMixtureVcf: generates a VCF for a mixture sample created from many individual samples
  • FilterSomaticVcf: applies filters to VCFs of somatic variants
  • RemoveSamTags: strips out optional tags/attributes from a SAM/BAM file to reduce size
  • ExtractBasecallingParamsForPicard: parses an Illumina Experiment Manager sample sheet and generates the files needed to run Picard's basecalling tools
  • ExtractIlluminaRunInfo: extracts information from Illumina's RunInfo.xml file into a simple tab-delimited table

fgbio release version 0.1.2

07 Jan 20:39

This release of fgbio contains the following tools:

  1. ErrorRateByReadPosition: Calculates the error rate by read position on mapped BAMs.
  2. ReviewConsensusVariants: Extracts data to make reviewing of variant calls from consensus reads easier.
  3. PickIlluminaIndices: Picks a set of molecular indices that should work well together.
  4. AssessPhasing: Assesses the accuracy of phasing for a set of variants.
  5. AutoGenerateReadGroupsByName: Adds read groups to a BAM file for a single sample by parsing the read names.
  6. MakeTwoSampleMixtureVcf: Tool to make a VCF with genotypes constructed by mixing the genotypes of two other samples.

Numerous bug fixes, performance improvements, and changes have been made to existing tools and classes. Refer to the commit history for such changes.

fgbio release version 0.1.1

18 Jul 17:29

This release of fgbio contains the following tools:

  1. HardMaskFasta: Converts soft-masked sequence to hard-masked in a FASTA file.
  2. TrimFastq: Trims reads in one or more line-matched fastq files to a specific read length.
  3. ExtractUmisFromBam: Extracts unique molecular indexes from reads in a BAM file into tags.
  4. FindTechnicalReads: Finds reads that are from technical or synthetic sequences in a BAM file.
  5. RandomizeBam: Randomizes the order of reads in a SAM or BAM file.
  6. SetMateInformation: Adds and/or fixes mate information on paired-end reads.
  7. UpdateReadGroups: Updates one or more read groups and their identifiers.
  8. CallMolecularConsensusReads: Calls consensus sequences from reads with the same unique molecular tag.
  9. GroupReadsByUmi: Groups reads together that appear to have come from the same original molecule.
  10. HapCutToVcf: Converts the output of HapCut to a VCF.
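HardMaskFasta's conversion relies on the convention that soft-masked bases are written in lowercase, while hard-masked bases are replaced by N. A minimal Python sketch of that transformation over FASTA lines follows; the function names are hypothetical, and the real tool may differ in details such as output line wrapping.

```python
from typing import Iterable, Iterator


def hard_mask(sequence: str) -> str:
    """Replace soft-masked (lowercase) bases with N; uppercase bases are unchanged."""
    return "".join("N" if base.islower() else base for base in sequence)


def hard_mask_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines with sequence lines hard-masked and header lines untouched."""
    for line in lines:
        yield line if line.startswith(">") else hard_mask(line)
```

Header lines are passed through unchanged so that lowercase characters in sequence names or descriptions are not masked.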