Description
Dear all,
I am testing MarkDuplicates (Picard version 3.3.0).
When I try with the provided test data it works fine.
However, when I create a single-end test file (attached; I use a .sam extension), no optical duplicate cluster is detected. The output is as follows:
`11:12:55.818 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/usrname/miniconda3/envs/latest_picard_env/share/picard-3.3.0-0/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Thu Mar 27 11:12:55 CET 2025] MarkDuplicates TAGGING_POLICY=All INPUT=[/path/to/bam/optical_dupes.sam] OUTPUT=/path/to/output/marked_duplicates.bam METRICS_FILE=/path/to/output/marked_dup_metrics.txt ASSUME_SORT_ORDER=coordinate READ_NAME_REGEX=(?:.:)?([0-9]+)[^:]:(\d*):([0-9]+):.* OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 TAG_DUPLICATE_SET_MEMBERS=false REMOVE_SEQUENCING_DUPLICATES=false CLEAR_DT=true DUPLEX_UMI=false FLOW_MODE=false FLOW_DUP_STRATEGY=FLOW_QUALITY_SUM_STRATEGY FLOW_USE_END_IN_UNPAIRED_READS=false FLOW_USE_UNPAIRED_CLIPPED_END=false FLOW_UNPAIRED_END_UNCERTAINTY=0 FLOW_UNPAIRED_START_UNCERTAINTY=0 FLOW_SKIP_FIRST_N_FLOWS=0 FLOW_Q_IS_KNOWN_END=false FLOW_EFFECTIVE_QUALITY_THRESHOLD=15 ADD_PG_TAG_TO_READS=true REMOVE_DUPLICATES=false ASSUME_SORTED=false DUPLICATE_SCORING_STRATEGY=SUM_OF_BASE_QUALITIES PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates MAX_OPTICAL_DUPLICATE_SET_SIZE=300000 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Thu Mar 27 11:12:55 CET 2025] Executing as usr@testsrv on Linux 5.15.0-84-generic amd64; OpenJDK 64-Bit Server VM 21.0.6+9-b895.97; Deflater: Intel; Inflater: Intel; Provider GCS is available; Picard version: 3.3.0
INFO 2025-03-27 11:12:56 MarkDuplicates Start of doWork freeMemory: 324136856; totalMemory: 335544320; maxMemory: 31675383808
INFO 2025-03-27 11:12:56 MarkDuplicates Reading input file and constructing read end information.
INFO 2025-03-27 11:12:56 MarkDuplicates Will retain up to 114765883 data points before spilling to disk.
INFO 2025-03-27 11:12:56 MarkDuplicates Read 4 records. 0 pairs never matched.
INFO 2025-03-27 11:12:56 MarkDuplicates After buildSortedReadEndLists freeMemory: 844555976; totalMemory: 1795162112; maxMemory: 31675383808
INFO 2025-03-27 11:12:58 MarkDuplicates Will retain up to 494927872 duplicate indices before spilling to disk.
INFO 2025-03-27 11:13:01 MarkDuplicates Traversing read pair information and detecting duplicates.
INFO 2025-03-27 11:13:01 MarkDuplicates Traversing fragment information and detecting duplicates.
INFO 2025-03-27 11:13:01 MarkDuplicates Sorting list of duplicate records.
INFO 2025-03-27 11:13:01 MarkDuplicates After generateDuplicateIndexes freeMemory: 5525398280; totalMemory: 13488881664; maxMemory: 31675383808
INFO 2025-03-27 11:13:01 MarkDuplicates Marking 3 records as duplicates.
INFO 2025-03-27 11:13:01 MarkDuplicates Found 0 optical duplicate clusters.
INFO 2025-03-27 11:13:01 MarkDuplicates Reads are assumed to be ordered by: coordinate
INFO 2025-03-27 11:13:01 MarkDuplicates Writing complete. Closing input iterator.
INFO 2025-03-27 11:13:01 MarkDuplicates Duplicate Index cleanup.
INFO 2025-03-27 11:13:01 MarkDuplicates Getting Memory Stats.
INFO 2025-03-27 11:13:01 MarkDuplicates Before output close freeMemory: 9494083328; totalMemory: 13488881664; maxMemory: 31675383808
INFO 2025-03-27 11:13:01 MarkDuplicates Closed outputs. Getting more Memory Stats.
INFO 2025-03-27 11:13:01 MarkDuplicates After output close freeMemory: 9494084392; totalMemory: 13488881664; maxMemory: 31675383808
[Thu Mar 27 11:13:01 CET 2025] picard.sam.markduplicates.MarkDuplicates done. Elapsed time: 0.09 minutes.
Runtime.totalMemory()=13488881664`
- Do you have an explanation for why the reads in my example are not detected as an optical duplicate cluster, even though they are reasonably close?
- Is there a parameter that I am overlooking to specify that the reads are single-end (SE)?
- Does the code behave differently for paired-end (PE) and single-end (SE) reads?
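
To make "reasonably close" concrete: the log above reports 3 of the 4 records marked as duplicates but 0 optical duplicate clusters, with OPTICAL_DUPLICATE_PIXEL_DISTANCE=100. Below is a minimal Python sketch of the check I would expect to apply. It is my own illustration, not Picard's code. It assumes Illumina-style read names whose last three colon-separated fields are tile, x and y; the read names and the helper functions `parse_location` and `within_pixel_distance` are hypothetical, and the read names are not the ones in the attached file. Whether MarkDuplicates applies such a check to unpaired reads at all is exactly what I am unsure about.

```python
def parse_location(read_name):
    """Return (tile, x, y) from the last three colon-separated fields.

    Assumption: Illumina-style names ending in ...:tile:x:y.
    Returns None when the name does not fit that layout.
    """
    fields = read_name.split(":")
    if len(fields) < 3:
        return None
    try:
        tile, x, y = (int(f) for f in fields[-3:])
    except ValueError:
        return None
    return tile, x, y


def within_pixel_distance(name_a, name_b, distance=100):
    """True if both reads sit on the same tile and differ by at most
    `distance` pixels in x and in y (the behaviour I would expect)."""
    loc_a = parse_location(name_a)
    loc_b = parse_location(name_b)
    if loc_a is None or loc_b is None:
        return False
    same_tile = loc_a[0] == loc_b[0]
    close_x = abs(loc_a[1] - loc_b[1]) <= distance
    close_y = abs(loc_a[2] - loc_b[2]) <= distance
    return same_tile and close_x and close_y


# Hypothetical read names, for illustration only.
read_1 = "RUN:1:FLOWCELL:1:1101:1000:2000"
read_2 = "RUN:1:FLOWCELL:1:1101:1050:2040"
print(within_pixel_distance(read_1, read_2))
```

By this criterion the reads in my test file should qualify, which is why the "Found 0 optical duplicate clusters" line surprises me.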
Any help would be appreciated.
Thank you in advance.