
Releases: broadinstitute/gatk

4.beta.3

26 Jul 15:45


Pre-release

This release contains a number of bug fixes and improvements. Highlights include a fix for intermittent failures/timeouts when accessing data in Google Cloud Storage (GCS), new and improved active-region detection for Mutect2, and a new VariantRecalibrator argument to allow the tool to scale better. See the full list of changes below. Most of the major known issues listed in the release notes for 4.beta.1 still apply, with the exception of the "intermittent GCS failures/timeouts" issue, which is now resolved.

A docker image for this release can be found in the broadinstitute/gatk repository on dockerhub. Within the image, cd into /gatk then run gatk-launch commands as usual.
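As a quick sketch of that workflow (the `4.beta.3` tag name is an assumption; check the broadinstitute/gatk dockerhub page for the exact tag available for this release):

```shell
# Pull this release's image from dockerhub (tag name assumed; verify on dockerhub)
docker pull broadinstitute/gatk:4.beta.3

# Start an interactive shell inside the container
docker run -it broadinstitute/gatk:4.beta.3

# Inside the container: move to the GATK directory and run commands as usual
cd /gatk
./gatk-launch --help
```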

Note: Due to our current dependency on a snapshot of google-cloud-java, this release cannot be published to maven central.

Changes in this release:

  • GATK engine: Move to google-cloud-java snapshot with more robust retries, and set number of retries/reopens globally. This fixes the intermittent "all retries/reopens failed" error when accessing data on GCS (Google Cloud Storage). See issue #2749
  • Mutect2: Implemented a new algorithm for active-region detection, reducing spurious active regions by almost 50%
  • Mutect2: Filter artifacts that arise from apparent-duplicate reads
  • Mutect2 WDL: The case and control sample names are now passed to Oncotator explicitly in the WDL. Oncotator's code for inferring them could yield incorrect answers in some cases. See issue #3343
  • FilterByOrientationBias: We discovered that it is impossible to guarantee an FDR threshold across all variants when one artifact mode has a high oxoQ score and the other a low one. We have changed the tool to guarantee the FDR threshold within each artifact mode, rather than across all variants. For more details, see issue #3344
  • FilterByOrientationBias: Summary table was not being populated properly. That has been fixed. See issue #3309
  • VariantRecalibrator: Added an argument to pre-sample data for VQSR model building (and also recalibration), reducing memory usage in the production pipeline. See issue #3230
  • Fix a stack overflow issue at high depths in the strand artifact annotation. See issue #3317
  • GenomicsDBImport: added a --readerThreads argument for multi-threaded VCF pre-loading, which improves performance of the tool by ~30% in our tests.
  • ValidateVariants: ported the GVCF validation option from GATK3
  • Polish up PathSeq and add pipeline tool
  • Fix error message describing how to set the GATK_STACKTRACE_ON_USER_EXCEPTION property
  • Mutect2FilteringEngine: correct MEDIAN_BASE_QUALITY_DIFFERENCE_FILTER and MEDIAN_MAPPING_QUALITY_DIFFERENCE_FILTER filter names
  • Mutect2 WDL: gave ProcessOptionalArguments a leaner docker
  • GATK4 Docker Image: changed the landing directory for the docker image to be /gatk instead of /root
  • Travis CI: fixed test report not being uploaded to GCS
  • Travis CI: removed non-docker unit and integration tests, which were redundant
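Regarding the GATK_STACKTRACE_ON_USER_EXCEPTION fix above, the property is a JVM system property, so it can be passed through to the launcher. A hypothetical invocation might look like the following (the tool name and I/O arguments here are placeholders, and the --javaOptions syntax is an assumption based on gatk-launch conventions in this release):

```shell
# Hypothetical invocation: enable full stack traces for user exceptions
# by setting the system property via --javaOptions (syntax assumed)
./gatk-launch PrintReads \
    --javaOptions '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true' \
    -I input.bam \
    -O output.bam
```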

4.beta.2

20 Jul 13:32


Pre-release

This is a bug fix release primarily aimed at fixing some issues in the Mutect2 WDL. The major known issues listed in the release notes for 4.beta.1 still apply.

A docker image for this release can be found in the broadinstitute/gatk repository on dockerhub. Within the image, cd into /gatk then run gatk-launch commands as usual.

Changes in this release:

  • Mutect2 WDL: corrected the ordering of FilterMutectCalls relative to FilterByOrientationBias. FilterByOrientationBias should always be run after all other filters, since (by design) it is trying to keep a bound on the FDR rate. See issue #3288
  • Mutect2 WDL: added automated extraction of bam sample names from the input bam files, using samtools. This should be viewed as a temporary fix until named parameters are in place. See issue #3265
  • FilterByOrientationBias: fixed to no longer throw IllegalStateExceptions when running on a large number of variants. This was due to a hashing collision in a sorted map. See issue #3291.
  • FilterByOrientationBias: non-diploid warnings have been downgraded to debug severity, which should reduce the volume of stdout output. As a side effect, this should also address/attenuate a comment in issue #3291.
  • VcfToIntervalList: added ability to generate interval list on all variants, not just the ones that passed filtering. Please note that this change may need to be ported to Picard. Added an automated test that should fail if this mechanism is broken in the GATK. See PR #3250
  • CollectAllelicCounts: now inherits from LocusWalker rather than using a custom traversal, which reduced the amount of code. See issue #2968 (and PR #3203 for some other changes)
  • Added experimental (and unsupported) tool CalculatePulldownPhasePosteriors at a user's request. See issue #3296
  • Implement PathSeqScoreSpark and PathSeqBwaSpark tools, and update PathSeqFilterSpark and PathSeqBuildKmers tools
  • Many changes to Mutect2 Hapmap validation WDL
  • GatherVcfs: support block copy mode with GCS inputs
  • GatherVcfs: fix crash when gathering files with no variants
  • AlleleSubsettingUtils: if null likelihoods, don't add to likelihoods sums (fixes #3210)
  • SV tools: add small indel evidence
  • SV tools: several FASTQ-related fixes (#3131, #2754, #3214)
  • SV tools: always use upstream read when looking at template lengths
  • SV tools: fix bugs in the SV pipeline's cross-contig ignore logic regarding non-primary contigs
  • SV tools: switch to dataproc image 1.1 in create_cluster.sh
  • SV tools: FindBreakEvidenceSpark can now produce a coordinate sorted Assemblies bam
  • TargetCoverageSexGenotyper: added bait-count bias correction
  • CountFalsePositives: fixed so that it (a) no longer returns garbage for target territory and (b) returns a proper fraction for the false-positive rate
  • Specify UTF-8 encoding in implementations of GATKRead.getAttributeAsByteArray()
  • GATK engine: fix sort order when reading multiple bams
  • Fix GATKSAMRecordToGATKReadAdapter.getAttributeAsString() for byte[] attributes
  • Fix various issues that were causing Travis CI test suite runs to fail intermittently
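The sample-name extraction mentioned above relies on reading the BAM header (via `samtools view -H`) and pulling the SM tag from the @RG line. As a rough sketch of the idea using only standard shell tools (samtools itself is not invoked here, and the header line below is a made-up example):

```shell
# Example @RG header line of the kind `samtools view -H sample.bam` prints
# (the read-group ID and sample name here are made up for illustration)
rg_line="$(printf '@RG\tID:lane1\tSM:TUMOR_SAMPLE\tPL:ILLUMINA')"

# Split the tab-separated fields onto separate lines, then keep only
# the value of the SM: (sample name) tag
sample=$(printf '%s\n' "$rg_line" | tr '\t' '\n' | sed -n 's/^SM://p')

echo "$sample"
```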

4.beta.1

28 Jun 19:07


Pre-release

This release brings together most of the tools we intend to include in the final GATK 4.0 release. Some tools are stable and ready for production use, while others are still in a beta or experimental stage of development. You can see which tools are marked as beta/experimental by running gatk-launch --list.

A docker image for this release can be found in the broadinstitute/gatk repository on dockerhub. Within the image, cd into /gatk then run gatk-launch commands as usual.

Major Known Issues

  • GCS (Google Cloud Storage) inputs/outputs are only supported by a subset of the tools. For the 4.0 general release, we intend to extend support to all tools.

    • In particular, GCS support in most of the Spark tools is currently very limited when not running on Google Cloud Dataproc.
    • Writing BAMs to a GCS bucket on Spark is broken in some tools due to #2793
  • HaplotypeCaller and HaplotypeCallerSpark are still in development and not ready for production use. Their output does not currently match the output of the GATK3 version of the tool in all respects.

  • Picard tools bundled with the GATK are currently based off of an older release of Picard. For the 4.0 general release we plan to update to the latest version.

  • CRAM reading can fail with an MD5 mismatch when the reference or reads contain ambiguity codes (#3154)

  • The IndexFeatureFile tool is currently disabled due to serious Tabix-index-related bugs in htsjdk (#2801)

  • The GenomicsDBImport tool (the GATK4 replacement for CombineGVCFs) experiences transient GCS failures/timeouts when run at massive scale (#2685)

  • CNV workflows have been evaluated for use on whole-exome sequencing data, but evaluations for use on whole-genome sequencing data are ongoing. Additional tuning of various parameters (for example, those for PerformSegmentation or AllelicCNV in the somatic workflow) may improve performance or decrease runtime on WGS.

  • Creation of a panel of normals with GermlineCNVCaller typically requires a Spark cluster.

  • The SV tools pipeline is under active development and is missing many major features which are planned for its public release. The current pipeline produces deletion, insertion, and inversion calls for a single sample based on local assembly of breakpoints. Known issues and missing features include but are not limited to:

    • Inversions and breakpoints due to complex events are not properly filtered and annotated in some cases. Some inversion calls produced by the pipeline are due to uncharacterized complex events such as inverted and dispersed duplications. We plan to implement an overhauled, more complete detection system for complex SVs in future releases.
    • The SV pipeline does not incorporate read depth based information. We plan to provide integration with read-depth based detection methods in the future, which will increase the number of variants detectable, and assist in the characterization of complex SVs.
    • The SV pipeline does not yet genotype variants or provide genotype likelihoods.
    • The SV pipeline has only been tested on Spark clusters with a limited set of configurations in Google Cloud Dataproc. We have provided scripts in the test directory for creating and running the pipeline. Running in other configurations may cause problems.