Skip to content

Duplicate reads in BAM #30

@JenniferShelton

Description

@JenniferShelton

Hi,

This tool is very useful in my workflow but I have a question about the preprocessing step.

The chr6 coordinates overlap causing the final alignment to have duplicate primary alignments for the same read. The number of primary alignments matches the number of times the filter step used "samtools view chr6:START1-END1...chr6:STARTn-ENDn" .

For example, if I use this process to filter reads it duplicats reads in the first samtools view and then duplicats them a second time for the second samtools extraction. These four reads then have primary alignments to different contigs in the panel (each of which has a flag of 83 or 163)
https://github.com/Kingsford-Group/kourami/blob/master/scripts/alignAndExtract_hs38DH.sh

A00117:192:HFGFGDMXX:1:1339:16486:16830	83	DPB1*266:01	5085	0	76M	=	5042	-119		NM:i:0	MD:Z:76	AS:i:76	XS:i:76
A00117:192:HFGFGDMXX:1:1339:16486:16830	163	DPB1*266:01	5042	0	76M	=	5085	119		NM:i:0	MD:Z:76	AS:i:76	XS:i:76
A00117:192:HFGFGDMXX:1:1339:16486:16830	83	DPB1*379:01	5085	0	76M	=	5042	-119		NM:i:0	MD:Z:76	AS:i:76	XS:i:76
A00117:192:HFGFGDMXX:1:1339:16486:16830	163	DPB1*379:01	5042	0	76M	=	5085	119		NM:i:0	MD:Z:76	AS:i:76	XS:i:76
A00117:192:HFGFGDMXX:1:1339:16486:16830	83	DPB1*351:01	5085	0	76M	=	5042	-119		NM:i:0	MD:Z:76	AS:i:76	XS:i:76
A00117:192:HFGFGDMXX:1:1339:16486:16830	163	DPB1*351:01	5042	0	76M	=	5085	119		NM:i:0	MD:Z:76	AS:i:76	XS:i:76
A00117:192:HFGFGDMXX:1:1339:16486:16830	83	DPB1*20:01:02	5085	0	76M	=	5042	-119		NM:i:0	MD:Z:76	AS:i:76	XS:i:76
A00117:192:HFGFGDMXX:1:1339:16486:16830	163	DPB1*20:01:02	5042	0	76M	=	5085	119		NM:i:0	MD:Z:76	AS:i:76	XS:i:76

The number of alignments is arbitrarily large depending on how many times the read was included in the FASTQ. They can also align multiple times to the same contig. Should I actually be creating a BAM with all possible alignments or with one primary alignment only?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions