Plasmid poly(A) Disagreement Between 0.8.0 and 0.9.1 #1233

VBHerrenC · 2025-01-28T16:19:42Z

Issue Report

Please describe the issue:

We ran basecalling on an SQK-RBK114-24 plasmid dataset with --estimate-poly-a and a config file. We initially ran basecalling with 0.9.0 and [email protected], and got the following results:

Although these were somewhat unexpected, they were definitely feasible and so we did not question the results. However, after receiving poly(A) data via another instrument and method, we became suspicious that these results were not accurate - nanopore and the alternate method usually agree quite closely. I re-ran the same dataset on v0.9.1 and [email protected] and got the same results. Still suspicious, I bumped us back down to Dorado 0.8.0 and [email protected] and then got this distribution of poly(A) estimations:

This distribution matches the alternate method much more closely, and its overall shape matches our historical data much better.

Steps to reproduce the issue:

Basecall the same dataset with the same model, parameters, and config file. The only difference is Dorado 0.8.0 vs 0.9.X.

Run environment:

Dorado version: 0.9.X
Dorado command:
    ~/packages/dorado-0.8.0-linux-x64/bin/dorado basecaller \
        ~/packages/dorado-0.8.0-linux-x64/models/[email protected] \
        /path/pod5 \
        --min-qscore 14 \
        --estimate-poly-a \
        --poly-a-config /path/poly_a_config.toml \
        --no-trim \
        --device 'cuda:all' --verbose > dorado_sup.bam
Operating system: WSL
Hardware (CPUs, Memory, GPUs): NVIDIA A5000
Source data type (e.g., pod5 or fast5 - please note we always recommend converting to pod5 for optimal basecalling performance): POD5
Source data location (on device or networked drive - NFS, etc.): On device
Details about data (flow cell, kit, read lengths, number of reads, total dataset size in MB/GB/TB): FLO-MIN114, SQK-RBK114-24, 21.21 k reads
Dataset to reproduce, if applicable (small subset of data to share as a pod5 to reproduce the issue):

Logs

Please provide output trace of dorado (run dorado with -v, or -vv on a small subset)

malton-ont · 2025-01-28T16:37:39Z

Hi @VBHerrenC,

PolyA estimation is under continuous review, and there were some changes between those versions. Does your polyA transcript have a non-A linker section? dorado-0.9.0 is a bit stricter about breaking at non-A sections unless the appropriate tail.tail_interrupt_length value is specified in the --poly-a-config file.

VBHerrenC · 2025-01-28T16:40:13Z

Hi @malton-ont,

There aren't any non-A linkers - tail_interrupt_length in the config file was set to 0.

Thanks,
Calleigh

malton-ont · 2025-01-28T16:45:03Z

Hi @VBHerrenC,

Are you able to share any data? I think this will be very hard to diagnose without.

One thing to note is that dorado expects the polyA section to be somewhere within the sequence - i.e. the cleave point for the plasmid can't be within the polyA sequence or flanks.

You can get some useful insights into how the region is determined by adding the -vv flag - I'd suggest only running this on a small set of reads as it generates a lot of output.

VBHerrenC · 2025-01-28T16:50:37Z

Hi @malton-ont,

Unfortunately we can't share the data, but we're happy to do some testing and report back. Since we input circular plasmid to the library prep and it cleaves randomly, I would expect the number of times it happens to cut in the poly(A) or flanks to be relatively low.

Calleigh
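As a rough back-of-the-envelope check of that expectation, here is a minimal sketch. It assumes a single cut per molecule at a uniformly random position, which is a simplification of real library prep. The plasmid and region lengths are hypothetical illustration values, not figures from this dataset.

```python
def fraction_cut_in_region(plasmid_length, region_length):
    """Fraction of reads expected to be cut inside a region of a circular plasmid.

    Assumes exactly one cut per molecule at a uniformly random position
    (a simplifying assumption, not a description of the actual chemistry).
    """
    if plasmid_length <= 0:
        raise ValueError("plasmid_length must be positive")
    if not 0 <= region_length <= plasmid_length:
        raise ValueError("region_length must be between 0 and plasmid_length")
    return region_length / plasmid_length


if __name__ == "__main__":
    # Hypothetical numbers: a 5000 bp plasmid with a 120 nt tail
    # plus two 30 bp flanks (180 bp in total).
    fraction = fraction_cut_in_region(5000, 120 + 2 * 30)
    print(f"Expected fraction of reads cut in tail or flanks: {fraction:.1%}")
```

Under these assumptions only a few percent of reads would be affected, which would not by itself explain a shift in the whole distribution.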

malton-ont · 2025-01-28T16:55:05Z

Hi @VBHerrenC,

In that case, could you gather a small (~20 reads) dataset of reads that report significantly differently between the two versions and run these with -vv? We can then attempt to parse the logs and see if there's anything obviously different. If you'd rather not share these publicly, please raise a ticket with our support team and reference this issue and my name, and they should pass it on to me.
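One possible way to pick those reads is sketched below. It assumes both runs have been dumped to SAM text (for example with samtools view) and that the poly(A) estimate is stored in the pt:i tag. The function names, the 20 nt threshold, and the file names in the usage note are hypothetical choices for illustration.

```python
import re

# Matches an optional SAM field of the form pt:i:<integer>.
PT_TAG = re.compile(r"^pt:i:(-?\d+)$")


def polya_lengths(sam_lines):
    """Map read ID -> poly(A) estimate taken from the pt:i tag.

    Header lines and reads without a pt:i tag are skipped.
    """
    lengths = {}
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Optional tags start after the 11 mandatory SAM columns.
        for field in fields[11:]:
            match = PT_TAG.match(field)
            if match:
                lengths[fields[0]] = int(match.group(1))
                break
    return lengths


def discordant_reads(old, new, min_diff=20, limit=20):
    """Return up to `limit` (read_id, old_len, new_len) tuples, largest gap first.

    Only reads with an estimate in both runs are compared.
    """
    shared = old.keys() & new.keys()
    diffs = [
        (read_id, old[read_id], new[read_id])
        for read_id in shared
        if abs(old[read_id] - new[read_id]) >= min_diff
    ]
    diffs.sort(key=lambda row: (-abs(row[1] - row[2]), row[0]))
    return diffs[:limit]
```

For example, after writing each BAM to text with samtools view, open the two files (say old.sam and new.sam), pass each file object to polya_lengths, and feed the two dictionaries to discordant_reads. The resulting read IDs can then be used to subset the POD5 data for the -vv run.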

VBHerrenC · 2025-01-28T17:25:09Z

Hi @malton-ont,

Thanks so much! Just opened a ticket and attached the requested logs. Let me know if you need anything else.

Calleigh

malton-ont added the polyA label (Issue related to polyA tail estimation) · Jan 28, 2025
