Description
After some testing, we found a ~10x speedup by applying --READ_LENGTH 1000000 and --USE_FAST_ALGORITHM true to CollectRawWgsMetrics. The resulting metrics were identical down to the fifth decimal place (1e-05) compared to the output produced without USE_FAST_ALGORITHM and with the default READ_LENGTH (150 bp).
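For reference, the minimal change relative to a default invocation is just those two flags. A stripped-down sketch (file names here are placeholders, not our actual inputs):
# Hypothetical minimal invocation; only the last two options differ from
# Picard's defaults (USE_FAST_ALGORITHM=false, READ_LENGTH=150).
java -jar picard.jar CollectRawWgsMetrics \
    --INPUT sample.bam \
    --OUTPUT wgs_metrics.txt \
    --REFERENCE_SEQUENCE ref.fa \
    --USE_FAST_ALGORITHM true \
    --READ_LENGTH 1000000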
So we applied the same settings on an Intel-based CentOS machine, but saw no speedup at all.
Apple M-chip:
Configuration:
[Fri May 30 14:57:43 EDT 2025] Executing as ___________ on Mac OS X 15.4.1 aarch64; OpenJDK 64-Bit Server VM 22.0.1+8; Deflater: Jdk; Inflater: Jdk; Provider GCS is available; Picard version: Version:3.3.0
CollectRawWgsMetrics call:
CollectRawWgsMetrics --INPUT 22-284-01352B.bam --OUTPUT CollectRawWgsMetrics_output_txt.txt --INCLUDE_BQ_HISTOGRAM true --USE_FAST_ALGORITHM true --READ_LENGTH 1000000 --REFERENCE_SEQUENCE ./picard/GRCh38_full_analysis_set_plus_decoy_hla.fa --MINIMUM_MAPPING_QUALITY 0 --MINIMUM_BASE_QUALITY 3 --COVERAGE_CAP 100000 --LOCUS_ACCUMULATION_CAP 200000 --STOP_AFTER -1 --COUNT_UNPAIRED false --SAMPLE_SIZE 10000 --ALLELE_FRACTION 0.001 --ALLELE_FRACTION 0.005 --ALLELE_FRACTION 0.01 --ALLELE_FRACTION 0.02 --ALLELE_FRACTION 0.05 --ALLELE_FRACTION 0.1 --ALLELE_FRACTION 0.2 --ALLELE_FRACTION 0.3 --ALLELE_FRACTION 0.5 --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 5 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
Memory utilization during processing:
30-32 GB
CentOS Intel-chip:
Configuration:
[Fri May 30 15:12:46 EDT 2025] Executing as ____________ on Linux 4.18.0-513.18.1.el8_9.x86_64 amd64; OpenJDK 64-Bit Server VM 17.0.8.1+1; Deflater: Jdk; Inflater: Jdk; Provider GCS is available; Picard version: 3.4.0
CollectRawWgsMetrics call:
CollectRawWgsMetrics --INPUT 22-284-01352B.bam --OUTPUT CollectRawWgsMetrics_output_txt.txt --INCLUDE_BQ_HISTOGRAM true --USE_FAST_ALGORITHM true --READ_LENGTH 1000000 --REFERENCE_SEQUENCE ./picard/GRCh38_full_analysis_set_plus_decoy_hla.fa --MINIMUM_MAPPING_QUALITY 0 --MINIMUM_BASE_QUALITY 3 --COVERAGE_CAP 100000 --LOCUS_ACCUMULATION_CAP 200000 --STOP_AFTER -1 --COUNT_UNPAIRED false --SAMPLE_SIZE 10000 --ALLELE_FRACTION 0.001 --ALLELE_FRACTION 0.005 --ALLELE_FRACTION 0.01 --ALLELE_FRACTION 0.02 --ALLELE_FRACTION 0.05 --ALLELE_FRACTION 0.1 --ALLELE_FRACTION 0.2 --ALLELE_FRACTION 0.3 --ALLELE_FRACTION 0.5 --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 5 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false --TMP_DIR /scratch/moleculardiagnosticlab/tmp
(NOTE: the CentOS machine is part of an HPC cluster, so we use the SSD scratch space as the TMP_DIR.)
Memory utilization during processing:
31.22 GB (max)
Thoughts:
The obvious answer could be "Wow, those M-chips really are fast!" But that doesn't explain the difference, since running with default parameters and without USE_FAST_ALGORITHM reverts to the long processing time on the M-chip as well; if raw hardware speed were the cause, the M-chip would be fast in both configurations.
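In case it helps anyone reproduce the comparison, a minimal timing sketch (placeholder paths; /usr/bin/time -v is GNU time on Linux, use /usr/bin/time -l on macOS):
# Fast path: the two non-default flags we tested.
/usr/bin/time -v java -jar picard.jar CollectRawWgsMetrics \
    --INPUT sample.bam \
    --OUTPUT metrics_fast.txt \
    --REFERENCE_SEQUENCE ref.fa \
    --USE_FAST_ALGORITHM true \
    --READ_LENGTH 1000000

# Baseline: same inputs with the default algorithm and READ_LENGTH (150).
/usr/bin/time -v java -jar picard.jar CollectRawWgsMetrics \
    --INPUT sample.bam \
    --OUTPUT metrics_default.txt \
    --REFERENCE_SEQUENCE ref.fa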