
CollectRawWgsMetrics runs much slower on non-Apple chip machine #2013

Open
@CNHAustin

Description


After some testing, we found a ~10x speedup by applying --READ_LENGTH 1000000 and --USE_FAST_ALGORITHM true to CollectRawWgsMetrics, and the metrics were identical to within 1e-05 compared to the output without USE_FAST_ALGORITHM and with the default READ_LENGTH (150 bp).
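(For anyone wanting to reproduce the comparison: a minimal sketch, assuming the two outputs were saved as fast_metrics.txt and default_metrics.txt (hypothetical names). It pulls the data row two lines below the "## METRICS CLASS" header of each Picard metrics file and diffs it field by field with awk; if the two runs really agree to within 1e-05, it prints nothing.

paste <(grep -A2 '^## METRICS' fast_metrics.txt | tail -1 | tr '\t' '\n') \
      <(grep -A2 '^## METRICS' default_metrics.txt | tail -1 | tr '\t' '\n') \
  | awk '{ d = $1 - $2; if (d < 0) d = -d;   # absolute per-field difference
           if (d > 1e-05) print "field " NR ": " $1 " vs " $2 }')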

So we applied the same settings on an Intel-based CentOS 7 machine, but saw no speedup at all.

Apple M-chip:

Configuration:

[Fri May 30 14:57:43 EDT 2025] Executing as ___________ on Mac OS X 15.4.1 aarch64; OpenJDK 64-Bit Server VM 22.0.1+8; Deflater: Jdk; Inflater: Jdk; Provider GCS is available; Picard version: Version:3.3.0

CollectRawWgsMetrics call:

CollectRawWgsMetrics --INPUT 22-284-01352B.bam --OUTPUT CollectRawWgsMetrics_output_txt.txt --INCLUDE_BQ_HISTOGRAM true --USE_FAST_ALGORITHM true --READ_LENGTH 1000000 --REFERENCE_SEQUENCE ./picard/GRCh38_full_analysis_set_plus_decoy_hla.fa --MINIMUM_MAPPING_QUALITY 0 --MINIMUM_BASE_QUALITY 3 --COVERAGE_CAP 100000 --LOCUS_ACCUMULATION_CAP 200000 --STOP_AFTER -1 --COUNT_UNPAIRED false --SAMPLE_SIZE 10000 --ALLELE_FRACTION 0.001 --ALLELE_FRACTION 0.005 --ALLELE_FRACTION 0.01 --ALLELE_FRACTION 0.02 --ALLELE_FRACTION 0.05 --ALLELE_FRACTION 0.1 --ALLELE_FRACTION 0.2 --ALLELE_FRACTION 0.3 --ALLELE_FRACTION 0.5 --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 5 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false

Memory utilization during processing:

30-32 GB

CentOS Intel-chip:

Configuration:

[Fri May 30 15:12:46 EDT 2025] Executing as ____________ on Linux 4.18.0-513.18.1.el8_9.x86_64 amd64; OpenJDK 64-Bit Server VM 17.0.8.1+1; Deflater: Jdk; Inflater: Jdk; Provider GCS is available; Picard version: 3.4.0

CollectRawWgsMetrics call:

CollectRawWgsMetrics --INPUT 22-284-01352B.bam --OUTPUT CollectRawWgsMetrics_output_txt.txt --INCLUDE_BQ_HISTOGRAM true --USE_FAST_ALGORITHM true --READ_LENGTH 1000000 --REFERENCE_SEQUENCE ./picard/GRCh38_full_analysis_set_plus_decoy_hla.fa --MINIMUM_MAPPING_QUALITY 0 --MINIMUM_BASE_QUALITY 3 --COVERAGE_CAP 100000 --LOCUS_ACCUMULATION_CAP 200000 --STOP_AFTER -1 --COUNT_UNPAIRED false --SAMPLE_SIZE 10000 --ALLELE_FRACTION 0.001 --ALLELE_FRACTION 0.005 --ALLELE_FRACTION 0.01 --ALLELE_FRACTION 0.02 --ALLELE_FRACTION 0.05 --ALLELE_FRACTION 0.1 --ALLELE_FRACTION 0.2 --ALLELE_FRACTION 0.3 --ALLELE_FRACTION 0.5 --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 5 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false --TMP_DIR /scratch/moleculardiagnosticlab/tmp

(NOTE: the CentOS machine is part of an HPC cluster, so we use the SSD scratch space as the TMP_DIR.)
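To rule out storage throughput on the HPC node (assuming the BAM and reference sit on shared storage rather than scratch), a rough sketch that reads 4 GB sequentially from each input and reports the throughput dd prints on its last line; only meaningful on a cold page cache:

dd if=22-284-01352B.bam of=/dev/null bs=1M count=4096 2>&1 | tail -1
dd if=./picard/GRCh38_full_analysis_set_plus_decoy_hla.fa of=/dev/null bs=1M count=4096 2>&1 | tail -1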

Memory utilization during processing:

31.22 GB (max)

Thoughts:

The obvious answer could be "Wow, those M-chips really are fast!" I'm not sure that explains the difference, though, since using default parameters without USE_FAST_ALGORITHM reverts to the long processing time on the M-chip as well.
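One confound visible in the logs above: the two runs differ in more than the chip: Picard 3.3.0 on OpenJDK 22 (Mac) vs Picard 3.4.0 on OpenJDK 17 (Linux). A minimal sketch of a like-for-like re-test under GNU time (the picard.jar path is a placeholder; /usr/bin/time -v is Linux-only, macOS's BSD time uses -l instead):

/usr/bin/time -v java -jar picard.jar CollectRawWgsMetrics \
    --INPUT 22-284-01352B.bam \
    --OUTPUT CollectRawWgsMetrics_output_txt.txt \
    --REFERENCE_SEQUENCE ./picard/GRCh38_full_analysis_set_plus_decoy_hla.fa \
    --USE_FAST_ALGORITHM true \
    --READ_LENGTH 1000000

Pinning both hosts to the same Picard release and JDK major version would at least remove the software variables before crediting the hardware.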
