PolyA tail in dorado Basecalled reads #1246

baibhav-bioinfo · 2025-02-08T19:02:27Z

Hello,
As we know, nanopore sequencing reads full-length molecules, including their polyA tails, so the pod5 signal files contain the signal for the tail too.
I wanted to know what happens to the tail when the signals are basecalled into FASTQ reads (in a default run).
As far as I can see, the reads do not have a well-defined polyA tail. The tail region looks random, with no common pattern, and it also contains other bases, especially at the end.
So, does dorado try to basecall the tail and fail because homopolymer regions are hard to detect, or does it try to trim the tail while writing bases?

malton-ont · 2025-02-10T09:21:27Z

@baibhav-bioinfo,

Basecallers have issues with long homopolymers, as it is very hard to tell the difference between several bases of the same type and a stall on a single base. For this reason, dorado provides the --estimate-poly-a option, which performs a signal-based analysis of the polyA length and writes the result to the pt:i tag in the BAM files. See here. Note that using this feature will not adjust the actual called sequence.
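
For anyone who wants to pull the estimate back out, here is a minimal sketch of reading the pt:i tag from a SAM-formatted record. It assumes the BAM has already been converted to text (for example with `samtools view`); the function name `poly_a_length` and the example record are hypothetical and are not part of dorado.

```python
from typing import Optional


def poly_a_length(sam_line: str) -> Optional[int]:
    """Return the pt:i value from one SAM record line, or None if absent."""
    fields = sam_line.rstrip("\n").split("\t")
    # The first 11 columns are the mandatory SAM fields; optional tags follow.
    for tag in fields[11:]:
        parts = tag.split(":", 2)  # TAG:TYPE:VALUE
        if len(parts) == 3 and parts[0] == "pt" and parts[1] == "i":
            return int(parts[2])
    return None


# Hypothetical unaligned record, made up for illustration only.
example = "\t".join([
    "read1", "4", "*", "0", "0", "*", "*", "0", "0",
    "ACGTAAAAAAAC", "IIIIIIIIIIII", "pt:i:87",
])

print(poly_a_length(example))  # prints 87
```

A record without a pt:i tag returns `None`, so reads with no estimate can be told apart from reads with one.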
