PolyA tail in dorado Basecalled reads #1246

Open
baibhav-bioinfo opened this issue Feb 8, 2025 · 1 comment
Comments

@baibhav-bioinfo

Hello,
As we know, nanopore sequencing reads full-length molecules, including their polyA tails, so the pod5 signal files contain the signal for the tail too.
I wanted to know what happens to the tail when the signals are basecalled into FASTQ reads (in a default run).
As far as I can see, the reads do not have a well-defined polyA tail. The tail region looks random, with no common pattern, and it also contains other bases, especially at the end.
So, does dorado try to basecall the tail and fail because homopolymer regions are hard to detect, or does it try to trim the tail when writing the bases?

@malton-ont
Collaborator

@baibhav-bioinfo,

Basecallers have issues with long homopolymers, as it is very hard to tell the difference between several bases of the same type and a stall on a single base. For this reason, dorado provides the --estimate-poly-a option, which performs a signal-based analysis of the polyA length and writes the result to the pt:i tag in the BAM output. See here. Note that using this feature will not adjust the actual called sequence.
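Since the estimate lands in a tag rather than in the sequence, one way to get at it is to read the pt:i tag back out of the records. The following is only a minimal sketch, assuming the BAM has first been converted to SAM text (for example with `samtools view`). The function names and the example records are made up for illustration and are not part of dorado.

```python
from typing import Dict, Iterable, Optional


def poly_a_length(sam_line: str) -> Optional[int]:
    """Return the pt:i value of one SAM record, or None if the tag is absent."""
    fields = sam_line.rstrip("\n").split("\t")
    # Optional tags start at the 12th column of a SAM record.
    for tag in fields[11:]:
        if tag.startswith("pt:i:"):
            return int(tag[len("pt:i:"):])
    return None


def collect_poly_a(sam_lines: Iterable[str]) -> Dict[str, Optional[int]]:
    """Map each read name to its pt:i value, skipping headers and blank lines."""
    lengths: Dict[str, Optional[int]] = {}
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # SAM header lines begin with '@'
        read_name = line.split("\t", 1)[0]
        lengths[read_name] = poly_a_length(line)
    return lengths


# Made-up unaligned records, for illustration only (not real dorado output).
EXAMPLE_RECORDS = [
    "@HD\tVN:1.6\tSO:unknown",
    "read_a\t4\t*\t0\t0\t*\t*\t0\t0\tACGTAAAA\t!!!!!!!!\tqs:f:12.5\tpt:i:87",
    "read_b\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGT\t!!!!!!!!\tqs:f:10.1",
]

if __name__ == "__main__":
    for name, length in collect_poly_a(EXAMPLE_RECORDS).items():
        print(name, length)
```

A read without a pt:i tag maps to `None` here, so reads with no estimate stay distinguishable from reads with a numeric value. To apply this to real output, the records could be piped in from something like `samtools view calls.bam`, where `calls.bam` is a placeholder file name.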
