dorado v0.9.0 does not remove adapter sequences from Ultra-long data #1219

Open
is01 opened this issue Jan 16, 2025 · 8 comments
Labels
bug Something isn't working trim Issues related to adapter/primer trimming

Comments

@is01

is01 commented Jan 16, 2025

Hello,
I basecalled ultra-long data with dorado v0.9.0, and it seems that ~90 bases of adapter sequence are left on the reads.
Adapter trimming worked in v0.8.2.
Is an additional option required to trim the adapter sequence of the ultra-long kit?

Run environment:

  • Dorado version: v0.9.0
  • Model: [email protected]
  • Dorado command: dorado basecaller -r --reference $fasta --modified-bases 5mC_5hmC $model $pod5 > out.bam
@malton-ont
Collaborator

Hi @is01,

Thanks for reporting this. dorado 0.9.0 is supposed to select the appropriate adapters and primers to trim based on the sequencing kit detected from the pod5 file. It looks like the ULK kits were missed in the look-up - we'll get a fix out for this asap.

@malton-ont malton-ont added bug Something isn't working trim Issues related to adapter/primer trimming labels Jan 16, 2025
@malton-ont
Collaborator

Hi @is01,

This issue should be resolved in dorado 0.9.1. If it persists, please re-open this ticket.

@is01
Author

is01 commented Jan 21, 2025

@malton-ont Thank you for your quick response. I will try the new version.
Thanks.

@is01
Author

is01 commented Jan 22, 2025

I basecalled the ultra-long data with dorado 0.9.1, but the reads still have adapters/primers.
It seems that the RAD primers (transposase sequence) were not removed.
Thanks for your help.

@malton-ont malton-ont reopened this Jan 23, 2025
@malton-ont
Collaborator

malton-ont commented Jan 23, 2025

@is01,

Are you able to share a pod5 file containing a few example reads?

@is01
Author

is01 commented Jan 27, 2025

@malton-ont,
I'm sorry I can't share the pod5 file, but I found that adapter trimming works fine by specifying a custom primer file.
I ran the following command (dorado 0.9.1):
dorado basecaller -r --reference $fasta --modified-bases 5mC_5hmC --primer-sequences primer.fa $model $pod5 > out.bam

Custom primer file (primer.fa):

>RAD_front type=primer kits=SQK-ULK114
GCTTGGGTGTTTAACCGTTTTCGCATTTATCGTGAAACGCTTTCGCGTTTTTCGTGCGCCGCTTCA

Is it possible that the above sequence is not searched for in the ULK kit?

@malton-ont
Collaborator

malton-ont commented Jan 27, 2025

@is01,

Apologies! dorado 0.9.1 added this kit to the adapter search, but it is still missing from the primer search. We'll address that in the next release. In the meantime, using custom primer trimming as you have is the appropriate solution. The other option is to add a dorado trim step after basecalling and tell it to use the RAD114 kit (which has the same primer):

dorado trim --sequencing-kit SQK-RAD114 out.bam > trimmed.bam
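
Either workaround can be spot-checked by counting reads that still carry the end of the RAD_front primer near their start. The following is only a rough sketch: `count_primer_reads` is a hypothetical helper (not part of dorado), the 20-base probe and 150-base window are arbitrary assumptions, and it uses exact matching, so reads with basecalling errors in the probe are missed. Because the BAM above is aligned, reverse-strand records store the reverse-complemented sequence and are not caught either, so the count is a lower bound.

```shell
# Hypothetical helper: reads one read sequence per line on stdin and reports
# how many contain the last 20 bases of the RAD_front primer (taken from the
# primer.fa in this thread) within their first 150 bases.
PRIMER="GCTTGGGTGTTTAACCGTTTTCGCATTTATCGTGAAACGCTTTCGCGTTTTTCGTGCGCCGCTTCA"

count_primer_reads() {
    awk -v primer="$PRIMER" -v window=150 '
        BEGIN { probe = substr(primer, length(primer) - 19) }  # last 20 bases
        {
            total++
            # Exact substring search restricted to the start of the read.
            if (index(substr($0, 1, window), probe) > 0) hits++
        }
        END { printf "%d of %d reads contain the probe\n", hits, total }
    '
}

# Assumed usage, if samtools is installed (column 10 of a SAM record is the
# read sequence):
#   samtools view trimmed.bam | cut -f10 | count_primer_reads
```

If trimming worked, the reported count on the trimmed output should be much lower than on the untrimmed output.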

@is01
Author

is01 commented Jan 28, 2025

@malton-ont,

Thanks for providing the solution. I look forward to the next release.
