No optical clusters detected with single end reads

Dear all, 

I am testing MarkDuplicates (Picard version 3.3.0).
With the provided [test data](https://github.com/broadinstitute/picard/blob/master/testdata/picard/sam/MarkDuplicates/optical_dupes.sam), it works fine.
However, when I create a single-end test file (attached, with a .sam extension), no optical duplicate cluster is detected. The output is as follows:

```
11:12:55.818 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/usrname/miniconda3/envs/latest_picard_env/share/picard-3.3.0-0/picard.jar!/com/intel/gkl/native/libgkl_compression.so

[Thu Mar 27 11:12:55 CET 2025] MarkDuplicates TAGGING_POLICY=All INPUT=[/path/to/bam/optical_dupes.sam] OUTPUT=/path/to/output/marked_duplicates.bam METRICS_FILE=/path/to/output/marked_dup_metrics.txt ASSUME_SORT_ORDER=coordinate READ_NAME_REGEX=(?:.*:)?([0-9]+)[^:]*:(\d*):([0-9]+):.* OPTICAL_DUPLICATE_PIXEL_DISTANCE=100    MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 TAG_DUPLICATE_SET_MEMBERS=false REMOVE_SEQUENCING_DUPLICATES=false CLEAR_DT=true DUPLEX_UMI=false FLOW_MODE=false FLOW_DUP_STRATEGY=FLOW_QUALITY_SUM_STRATEGY FLOW_USE_END_IN_UNPAIRED_READS=false FLOW_USE_UNPAIRED_CLIPPED_END=false FLOW_UNPAIRED_END_UNCERTAINTY=0 FLOW_UNPAIRED_START_UNCERTAINTY=0 FLOW_SKIP_FIRST_N_FLOWS=0 FLOW_Q_IS_KNOWN_END=false FLOW_EFFECTIVE_QUALITY_THRESHOLD=15 ADD_PG_TAG_TO_READS=true REMOVE_DUPLICATES=false ASSUME_SORTED=false DUPLICATE_SCORING_STRATEGY=SUM_OF_BASE_QUALITIES PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates MAX_OPTICAL_DUPLICATE_SET_SIZE=300000 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false USE_JDK_DEFLATER=false USE_JDK_INFLATER=false

[Thu Mar 27 11:12:55 CET 2025] Executing as usr@testsrv on Linux 5.15.0-84-generic amd64; OpenJDK 64-Bit Server VM 21.0.6+9-b895.97; Deflater: Intel; Inflater: Intel; Provider GCS is available; Picard version: 3.3.0
INFO    2025-03-27 11:12:56     MarkDuplicates  Start of doWork freeMemory: 324136856; totalMemory: 335544320; maxMemory: 31675383808
INFO    2025-03-27 11:12:56     MarkDuplicates  Reading input file and constructing read end information.
INFO    2025-03-27 11:12:56     MarkDuplicates  Will retain up to 114765883 data points before spilling to disk.
INFO    2025-03-27 11:12:56     MarkDuplicates  Read 4 records. 0 pairs never matched.
INFO    2025-03-27 11:12:56     MarkDuplicates  After buildSortedReadEndLists freeMemory: 844555976; totalMemory: 1795162112; maxMemory: 31675383808
INFO    2025-03-27 11:12:58     MarkDuplicates  Will retain up to 494927872 duplicate indices before spilling to disk.
INFO    2025-03-27 11:13:01     MarkDuplicates  Traversing read pair information and detecting duplicates.
INFO    2025-03-27 11:13:01     MarkDuplicates  Traversing fragment information and detecting duplicates.
INFO    2025-03-27 11:13:01     MarkDuplicates  Sorting list of duplicate records.
INFO    2025-03-27 11:13:01     MarkDuplicates  After generateDuplicateIndexes freeMemory: 5525398280; totalMemory: 13488881664; maxMemory: 31675383808
INFO    2025-03-27 11:13:01     MarkDuplicates  Marking 3 records as duplicates.
INFO    2025-03-27 11:13:01     MarkDuplicates  Found 0 optical duplicate clusters.
INFO    2025-03-27 11:13:01     MarkDuplicates  Reads are assumed to be ordered by: coordinate
INFO    2025-03-27 11:13:01     MarkDuplicates  Writing complete. Closing input iterator.
INFO    2025-03-27 11:13:01     MarkDuplicates  Duplicate Index cleanup.
INFO    2025-03-27 11:13:01     MarkDuplicates  Getting Memory Stats.
INFO    2025-03-27 11:13:01     MarkDuplicates  Before output close freeMemory: 9494083328; totalMemory: 13488881664; maxMemory: 31675383808
INFO    2025-03-27 11:13:01     MarkDuplicates  Closed outputs. Getting more Memory Stats.
INFO    2025-03-27 11:13:01     MarkDuplicates  After output close freeMemory: 9494084392; totalMemory: 13488881664; maxMemory: 31675383808

[Thu Mar 27 11:13:01 CET 2025] picard.sam.markduplicates.MarkDuplicates done. Elapsed time: 0.09 minutes.
Runtime.totalMemory()=13488881664
```


- Is there an explanation for why the reads in my example are not detected as an optical cluster even though they are reasonably close?
- Is there a parameter I am overlooking that specifies that the reads are single-end (SE)?
- Does the code behave differently for paired-end (PE) and single-end reads?
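
To make "reasonably close" concrete, here is a minimal Python sketch of a check that can be run outside Picard. It applies the `READ_NAME_REGEX` and `OPTICAL_DUPLICATE_PIXEL_DISTANCE` values from the log above to a read name and compares two reads. It is not Picard's implementation: the helper names `parse_location` and `within_pixel_distance` and the read names in the usage note are hypothetical, Python's `re` is not identical to Java's regex engine, and the closeness rule (same tile, with both the x and y differences within the pixel distance) is an assumption about how the comparison works.

```python
import re

# READ_NAME_REGEX copied from the MarkDuplicates command line in the log above.
READ_NAME_REGEX = re.compile(r"(?:.*:)?([0-9]+)[^:]*:(\d*):([0-9]+):.*")
# OPTICAL_DUPLICATE_PIXEL_DISTANCE from the same command line.
PIXEL_DISTANCE = 100


def parse_location(read_name):
    """Return the (tile, x, y) triple captured by the regex, or None.

    fullmatch() is used on the assumption that the whole read name
    must match the pattern.
    """
    match = READ_NAME_REGEX.fullmatch(read_name)
    if match is None or match.group(2) == "":
        # No match, or the middle group (\d*) captured an empty string.
        return None
    return tuple(int(group) for group in match.groups())


def within_pixel_distance(loc_a, loc_b, distance=PIXEL_DISTANCE):
    """Assumed closeness rule: same tile, |dx| and |dy| both <= distance."""
    if loc_a is None or loc_b is None:
        return False
    tile_a, x_a, y_a = loc_a
    tile_b, x_b, y_b = loc_b
    return (
        tile_a == tile_b
        and abs(x_a - x_b) <= distance
        and abs(y_a - y_b) <= distance
    )
```

For example, `parse_location("RUN:1:1101:1000:2000:extra")` (a made-up read name) returns `(1101, 1000, 2000)`. Printing the captured triple for the read names in the attached file shows which fields the regex actually picks up as tile, x, and y.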

Any help would be appreciated. 
Thank you in advance.

[optical_dupes.txt](https://github.com/user-attachments/files/19484403/optical_dupes.txt)


No optical clusters detected with single end reads #2004
