Bug Report
Affected tool(s)
TargetMetricsCollector, HsMetricCollector, WgsMetrics
Affected version(s)
Latest public release version [3.20]
Latest development/master branch as of [7/30/24]
Description
There are two potential bugs that I have noted, both of which revolve around the FOLD_80_BASE_PENALTY metric.
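The FOLD_80_BASE_PENALTY metric is defined as: the fold over-coverage necessary to raise 80% of bases in "non-zero-cvg" targets to the mean coverage level in those targets.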
However, the calculation for FOLD_80_BASE_PENALTY does not actually filter out the zero-coverage targets from the histogram that it passes to getPercentile(). You can see this in the code for the metric:
picard/src/main/java/picard/analysis/directed/TargetMetricsCollector.java
Lines 728 to 762 in d8d87c9
Issue 1 -- Base Coverage Percentile Function:
At line 731, if an interval is found to have 0 coverage across all its bases, bin '0' of the histogram that is later used to calculate the 20th percentile is incremented by the length of the interval: highQualityDepthHistogram.increment(0, c.interval.length()). Thus, the highQualityDepthHistogram includes zero-coverage target bases. Once all bases in the given intervals have been counted towards the highQualityDepthHistogram, the histogram is used at line 762 to calculate Fold80: metrics.FOLD_80_BASE_PENALTY = metrics.MEAN_TARGET_COVERAGE / highQualityDepthHistogram.getPercentile(0.2). As a result, the FOLD_80_BASE_PENALTY metric does not calculate the fold over-coverage necessary to raise 80% of the non-zero-coverage target bases to the mean coverage, but rather the fold over-coverage necessary to raise 80% of all target bases to the mean coverage.
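To illustrate the effect described above, here is a minimal sketch in plain Java. It does not use htsjdk; the percentile function is a simplified stand-in (smallest depth whose cumulative fraction reaches p) and is only assumed to behave like Histogram.getPercentile(). The class name, helper names, and depth values are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

public class Fold80PercentileSketch {

    // Build a depth -> base-count histogram; optionally skip zero-depth bases.
    static TreeMap<Integer, Long> histogramOf(int[] depths, boolean includeZero) {
        TreeMap<Integer, Long> hist = new TreeMap<>();
        for (int d : depths) {
            if (d == 0 && !includeZero) continue;
            hist.merge(d, 1L, Long::sum);
        }
        return hist;
    }

    // Simplified percentile (an assumption, not htsjdk's code):
    // the smallest depth whose cumulative fraction of bases reaches p.
    static int percentile(TreeMap<Integer, Long> hist, double p) {
        long total = 0;
        for (long c : hist.values()) total += c;
        long soFar = 0;
        for (Map.Entry<Integer, Long> e : hist.entrySet()) {
            soFar += e.getValue();
            if ((double) soFar / total >= p) return e.getKey();
        }
        throw new IllegalStateException("empty histogram");
    }

    public static void main(String[] args) {
        // Hypothetical per-base depths: 3 of 10 target bases have zero coverage.
        int[] depths = {0, 0, 0, 4, 5, 5, 6, 6, 7, 7};
        System.out.println(percentile(histogramOf(depths, true), 0.2));  // prints 0
        System.out.println(percentile(histogramOf(depths, false), 0.2)); // prints 5
    }
}
```

With the zero bin included, the 20th percentile falls to 0 as soon as at least 20% of the target bases are uncovered, so any ratio that divides by it has no finite value.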
Issue 2 -- Mean Coverage Calculation:
At line 762, FOLD_80_BASE_PENALTY is defined to be metrics.MEAN_TARGET_COVERAGE / highQualityDepthHistogram.getPercentile(0.2). The metrics.MEAN_TARGET_COVERAGE value, which for this calculation should be the mean coverage of the non-zero-coverage target bases, is calculated at line 754: metrics.MEAN_TARGET_COVERAGE = (double) totalCoverage / metrics.TARGET_TERRITORY. totalCoverage is the sum of the depths at each target base, while metrics.TARGET_TERRITORY is defined at line 407: metrics.TARGET_TERRITORY = targetTerritory. targetTerritory, in turn, is defined at line 301 as: this.targetTerritory = Interval.countBases(uniqueTargets). In summary, metrics.MEAN_TARGET_COVERAGE is the mean coverage of all target bases, while the FOLD_80_BASE_PENALTY metric is defined as requiring the mean coverage of non-zero-coverage target bases. Although the output MEAN_TARGET_COVERAGE metric should independently be calculated including zero-coverage target bases, it should not be the value used in the calculation of the FOLD_80_BASE_PENALTY metric.
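The difference between the two denominators can be sketched as follows. This is an illustration in plain Java under stated assumptions, not Picard code: the class name, method names, depth values, and the 20th-percentile depth of 5 are all hypothetical.

```java
public class MeanCoverageSketch {

    // Mean depth over every target base; zero-coverage bases count in the
    // denominator, as in totalCoverage / TARGET_TERRITORY.
    static double meanAllBases(int[] depths) {
        long total = 0;
        for (int d : depths) total += d;
        return (double) total / depths.length;
    }

    // Mean depth over bases with depth > 0 only.
    static double meanNonZeroBases(int[] depths) {
        long total = 0;
        int covered = 0;
        for (int d : depths) {
            if (d > 0) {
                total += d;
                covered++;
            }
        }
        return covered == 0 ? 0.0 : (double) total / covered;
    }

    public static void main(String[] args) {
        // Hypothetical per-base depths: total coverage 40 over 10 bases, 7 of them covered.
        int[] depths = {0, 0, 0, 4, 5, 5, 6, 6, 7, 7};
        double p20 = 5.0; // assumed 20th-percentile depth of the non-zero bases

        System.out.println(meanAllBases(depths));           // 40 / 10
        System.out.println(meanNonZeroBases(depths));       // 40 / 7
        System.out.println(meanAllBases(depths) / p20);     // ratio using the all-bases mean
        System.out.println(meanNonZeroBases(depths) / p20); // ratio using the non-zero mean
    }
}
```

For the same percentile value, the all-bases mean gives a smaller ratio than the non-zero mean, because uncovered bases add to the denominator without adding coverage.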
Steps to reproduce
Run CollectHsMetrics on any exome BAM file that has any number of zero-coverage target bases. The output FOLD_80_BASE_PENALTY will not match the value you get by calculating FOLD_80_BASE_PENALTY manually.
Minimal test case:
- Download a sample exome .bam file, open it in IGV, and find a small interval (I would recommend ~30 bases) where some of the bases are at 0 coverage.
- Copy the sample's reference exome target and bait interval list files, delete all of their intervals, and add the ~30-base interval that you found.
- Download the respective FASTA/FAI reference files.
- Run the gatk 4.5.0.0 or 4.6.0.0 Docker image:
docker run -it broadinstitute/gatk:4.6.0.0
- Run CollectHsMetrics with given inputs:
gatk CollectHsMetrics -I <input/bam/path> -R <reference/fasta/path> -BI <modified/bait/intervals/path> -TI <modified/target/intervals/path> -O <output/file/path>
- Check the output FOLD_80_BASE_PENALTY value in the HsMetrics output file.
- For the example inputs, it should be: FLOAT
- Calculate the FOLD_80_BASE_PENALTY by hand for the interval.
- Note that the two FOLD_80_BASE_PENALTY values are not equivalent.
Example for test case:
- Found the following 28-base interval: chr7:142,560,458-142,560,485.
This is the depth profile over that region shown in IGV:

- Modifying the hg19 Target/Bait Interval Lists:
- Before Modification:

- After Modification:

- Running gatk CollectHsMetrics via terminal:

- CollectHsMetrics output FOLD_80_BASE_PENALTY value:
FOLD_80_BASE_PENALTY = undefined
- Calculated expected FOLD_80_BASE_PENALTY output value:
- Excluding Zero-Coverage-Target-Bases:

- Including Zero-Coverage-Target-Bases:

- Conclusion:
Since the two FOLD_80_BASE_PENALTY output values (1.136, undefined) are not equal, there must be an error in the FOLD_80_BASE_PENALTY calculation regarding zero-coverage target bases.
Expected behavior
The metrics collector returns the FOLD_80_BASE_PENALTY value required to raise 80% of the bases in the non-zero coverage target region to the mean coverage of that non-zero coverage target region.
Actual behavior
The metrics collector returns the FOLD_80_BASE_PENALTY value required to raise 80% of the bases in the target region to the mean coverage of that target region.