Calling Duplex Consensus Reads

Overview

Calling consensus reads from duplex data takes reads that were generated in a way that allows post-hoc identification of which sequencing reads derive from each of the paired strands of an original duplex (double-stranded) source DNA molecule, and combines them into consensus reads. One example is the protocol outlined by Kennedy et al., which attaches UMIs to each end of a source molecule.

Mathematically the process is very similar to the one outlined for calling consensus reads from single-UMI data, though the mechanics are somewhat different.

The process outlined below is implemented in the CallDuplexConsensusReads program in fgbio, which is run after first grouping reads with GroupReadsByUmi --strategy=paired.

Process

The high-level process starts with a group of reads identified as originating from the same double-stranded source molecule. The two strands of the original molecule are labeled, arbitrarily, as A and B, and each read is known to have originated from either the A strand or the B strand. The process proceeds through the following steps:

1. Reads are split into four sub-groups:
   - Strand A and read 1 (A1s)
   - Strand A and read 2 (A2s)
   - Strand B and read 1 (B1s)
   - Strand B and read 2 (B2s)
2. Reads are unmapped and, if necessary, reverted to sequencing order
3. Reads are quality trimmed, if enabled
4. Remaining low-quality bases are masked (i.e. converted to Ns)
5. Reads are further trimmed to the length of the insert if the insert is shorter than the read length
6. Reads are filtered based on their CIGAR (alignment structure) to ensure reads are always in phase
7. Four single-strand consensus reads are generated, one each for A1s, A2s, B1s, and B2s
8. Two duplex consensus reads are generated by combining the A1 and B2 consensus reads, and the A2 and B1 consensus reads
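
The sub-grouping, masking, and pairing steps above can be sketched in a few lines of Python. This is only an illustration and not fgbio's implementation: the `Read` type, the function names, and the quality threshold passed to `mask_low_quality` are all hypothetical, and the consensus calculation itself is left out. The sketch pairs A1 with B2 and A2 with B1, so that each single-strand consensus is used exactly once.

```python
from typing import Dict, List, NamedTuple, Tuple


class Read(NamedTuple):
    """Hypothetical minimal read record (not an fgbio or htsjdk type)."""
    name: str
    strand: str             # "A" or "B", assigned arbitrarily per source molecule
    read_number: int        # 1 or 2
    bases: str
    quals: Tuple[int, ...]  # one Phred score per base


def split_into_subgroups(reads: List[Read]) -> Dict[str, List[Read]]:
    """Split one source molecule's reads into the A1, A2, B1 and B2 sub-groups."""
    groups: Dict[str, List[Read]] = {"A1": [], "A2": [], "B1": [], "B2": []}
    for read in reads:
        groups[f"{read.strand}{read.read_number}"].append(read)
    return groups


def mask_low_quality(read: Read, min_quality: int) -> Read:
    """Convert bases whose quality is below min_quality (an assumed threshold) to N."""
    masked = "".join(
        base if qual >= min_quality else "N"
        for base, qual in zip(read.bases, read.quals)
    )
    return read._replace(bases=masked)


# Sub-groups whose single-strand consensus reads are combined into duplex reads.
DUPLEX_PAIRINGS = [("A1", "B2"), ("A2", "B1")]


def duplex_inputs(groups: Dict[str, List[Read]]) -> List[Tuple[List[Read], List[Read]]]:
    """Return, for each duplex consensus read, the two sub-groups that feed it."""
    return [(groups[left], groups[right]) for left, right in DUPLEX_PAIRINGS]
```

A caller would mask each read, split the masked reads into sub-groups, call a single-strand consensus per sub-group, and then combine the results according to `DUPLEX_PAIRINGS`.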
