Duplex + Polish #1230

Open
strejcem opened this issue Jan 27, 2025 · 1 comment
strejcem commented Jan 27, 2025

Hi,
Using Dorado 0.9.1+c8c2c9f on a bacterial genome, I ran:
dorado polish --bacteria aligned_reads.sorted.bam flye/assembly.fasta > polished_assembly_corrected.fasta

which fails with:
[error] Caught exception: The input BAM contains more than one read group. Please specify --RG to select which read group to process.

When I instead run:
dorado polish --bacteria aligned_reads.sorted.bam flye/assembly.fasta --RG test > corrected_polished.fasta
I get:
[error] Caught exception: No @RG headers found in the input BAM, but user-specified RG was given. RG: 'test'

However, the header of the duplex BAM does contain @RG lines:
samtools view -H duplex.bam

@HD     VN:1.6  SO:unknown
@PG     ID:duplex       PN:dorado       VN:0.9.1+c8c2c9f        CL:dorado duplex sup 20250120_1604_X3_FBA42844_166c12fa/pod5_skip/      DS:gpu:NVIDIA GeForce RTX 3060
@PG     ID:samtools     PN:samtools     PP:duplex       VN:1.21 CL:samtools view -H duplex.bam
@RG     ID:3d5a29885777c9915e9d68ac567fa92062b91a43_dna_r10.4.1_e8.2_400bps_sup@[email protected]        PU:FBA42844     PM:GXB03675     DT:2025-01-20T15:08:56.149+00:00        PL:ONT  DS:[email protected][email protected] runid=3d5a29885777c9915e9d68ac567fa92062b91a43     LB:VSCHT_GGE255_1a      SM:VSCHT_GGE255_1a
@RG     ID:3d5a29885777c9915e9d68ac567fa92062b91a43_dna_r10.4.1_e8.2_400bps_sup@v5.0.0  PU:FBA42844     PM:GXB03675     DT:2025-01-20T15:08:56.149+00:00        PL:ONT  DS:[email protected] runid=3d5a29885777c9915e9d68ac567fa92062b91a43       LB:VSCHT_GGE255_1a      SM:VSCHT_GGE255_1a

How should dorado polish be used with duplex-called data?

Thanks
Michal

strejcem (Author) commented:

Right after filing the issue, I tried passing the ID field of the @RG header to --RG, which worked, at least for the simplex data.
dorado polish --bacteria aligned_reads.sorted.bam flye/assembly.fasta --RG 3d5a29885777c9915e9d68ac567fa92062b91a43_dna_r10.4.1_e8.2_400bps_sup@v5.0.0 > polished_simplex.fasta
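
To find the exact value that --RG expects, the ID fields can be pulled out of the @RG header lines instead of being copied by hand. The following is a minimal sketch, not part of Dorado or samtools: the function name list_rg_ids is hypothetical, and it assumes the header is tab-separated, as samtools view -H prints it.

```shell
# list_rg_ids (hypothetical helper): read SAM header text on stdin and
# print the ID value of every @RG line, one per output line.
list_rg_ids() {
  awk -F'\t' '
    $1 == "@RG" {
      # Scan the fields of the @RG line for the one that starts with "ID:".
      for (i = 2; i <= NF; i++) {
        if (substr($i, 1, 3) == "ID:") {
          print substr($i, 4)
        }
      }
    }
  '
}

# Typical use (requires samtools; file name taken from the commands above):
#   samtools view -H aligned_reads.sorted.bam | list_rg_ids
```

Each printed line is a candidate for --RG. Which one corresponds to the simplex reads still has to be judged from the ID itself, as in the command above.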

However, using the duplex read group with --bacteria throws:
[error] Caught exception: There are no bacterial models compatible with basecaller model: '[email protected][email protected]'.
and without --bacteria:

[error] Selected model doesn't exist: [email protected][email protected]_polish_rl
[error] Could not download model: [email protected][email protected]_polish_rl

Also, polishing the simplex reads from a duplex run reports:
[info] Input data does not contain move tables.
and dorado duplex does not support --emit-moves.

BTW, the documentation says that we can check for the mv tag using samtools. However, in my case, with samtools 1.21,
samtools view --keep-tag "mv" -c aligned_reads.sorted.bam
and
samtools view -c aligned_reads.sorted.bam
output the same value. But
samtools view -d "mv" -c aligned_reads.sorted.bam
outputs 0
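
The difference follows from what the two options do: --keep-tag only controls which auxiliary tags are retained on each output record and does not filter reads, so the -c count is unchanged, whereas -d keeps only the records that carry the given tag. As a cross-check that does not depend on either option, the records with an mv tag can be counted directly from the SAM text. This is a minimal sketch; the function name count_mv_records is hypothetical.

```shell
# count_mv_records (hypothetical helper): read SAM records on stdin and
# print how many of them carry an mv (move table) tag.
count_mv_records() {
  awk -F'\t' '
    /^@/ { next }                    # skip header lines if any are present
    {
      # Optional tags start at column 12 of a SAM record.
      for (i = 12; i <= NF; i++) {
        if (substr($i, 1, 3) == "mv:") { n++; break }
      }
    }
    END { print n + 0 }              # print 0 when no record matched
  '
}

# Typical use (requires samtools; file name taken from the commands above):
#   samtools view aligned_reads.sorted.bam | count_mv_records
```

On the same file, this count should agree with samtools view -d "mv" -c, so a result of 0 indicates that the reads contain no move tables.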

The documentation could be updated to reflect this.

So this is actually solved, but perhaps the documentation and error messages could be improved?

Is duplex > correct > assembly > polish --RG [simplex ID] the correct way to do it?
