PCR primers and barcodes not apparent in large portion of reads #1229

smith12380 · 2025-01-27T15:55:23Z

Issue Report

Please describe the issue:

I've got transcriptomic cDNA that was run through a V14 kit on a PromethION, and I multiplexed the samples using barcoded poly-dT RT primers. The cDNAs also have a sequence appended to the 5' end that's complementary to a constant region on the barcode primers, and I've amplified the cDNA with primers against that sequence. However, even with trimming turned off or restricted to adapters only, about a third of my reads have no discernible version of that sequence, and only 25-50% of reads have a reasonable barcode sequence (depending on gating parameters).

Is there any chance that Dorado is cutting off some of the sequence that might contain my primers and barcodes, even with trimming off? Or does anyone have ideas why I'd be getting good reads that usually will align to the reference genome, yet don't have the primers? I checked whether I'm getting crazy read splitting, and that doesn't seem to be the case.

Thanks for any help!

Steps to reproduce the issue:

I'm just doing a basic bidirectional alignment of the primer sequence (GTACTCTGCGTTGATACCACTGCTT) against my reads, and at least a third seem to be missing the primer on one side or the other when I set a maximum edit distance of 6.
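For anyone wanting to reproduce this kind of check, here is a minimal sketch of a bidirectional primer search with an edit-distance cutoff. It is not the exact script used for the numbers above: the function names and the 150-base end `window` are assumptions made for illustration, and only the primer sequence and the cutoff of 6 come from this report. It uses a plain semi-global (Sellers-style) dynamic program, so it needs no external libraries but will be slow on a full PromethION run.

```python
PRIMER = "GTACTCTGCGTTGATACCACTGCTT"  # primer sequence from this report


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]


def best_edit_distance(pattern, text):
    """Smallest edit distance between `pattern` and any substring of `text`.

    Semi-global alignment: the pattern must be matched in full, but it may
    start and end anywhere in the text at no cost.
    """
    prev = [0] * (len(text) + 1)  # row 0: free start at every text position
    for i, p in enumerate(pattern, start=1):
        curr = [i] + [0] * len(text)  # column 0: i pattern bases unmatched
        for j, t in enumerate(text, start=1):
            curr[j] = min(
                prev[j - 1] + (p != t),  # match or substitution
                prev[j] + 1,             # pattern base missing from the text
                curr[j - 1] + 1,         # extra base in the text
            )
        prev = curr
    return min(prev)  # free end: best score at any text position


def primer_ends(read, primer=PRIMER, max_dist=6, window=150):
    """Report whether the primer is found near the start and near the end.

    Both orientations are tried at both ends because the strand of a cDNA
    read is not known in advance. `window` is a hypothetical choice for how
    far into each end to look.
    """
    rc = reverse_complement(primer)
    head = read[:window].upper()
    tail = read[-window:].upper()

    def hit(segment):
        return min(best_edit_distance(primer, segment),
                   best_edit_distance(rc, segment)) <= max_dist

    return hit(head), hit(tail)
```

Calling `primer_ends(seq)` on each read sequence and tallying the `(start, end)` pairs gives the fraction of reads missing the primer on one or both sides. For large datasets a dedicated aligner library would be a better fit than this pure-Python loop.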

Run environment:

Dorado version: 0.5.1, 0.6.0, 0.7.2, 0.8.1, and 0.9.1
Dorado command: on 0.9.1: dorado-0.9.1-linux-x64/bin/dorado basecaller sup --emit-sam --no-trim -v PAQ63465_pass_836a84c0_0fbe1fe9_0.pod5 > test.sam
Operating system: Linux
Hardware (CPUs, Memory, GPUs): V100, A100, and H100 GPUs
Source data type: POD5
Source data location (on device or networked drive - NFS, etc.): HPC
Details about data (flow cell, kit, read lengths, number of reads, total dataset size in MB/GB/TB): FLO-PRO114M, customized version of SQK-LSK114
Dataset to reproduce, if applicable (small subset of data to share as a pod5 to reproduce the issue): pod5s don't seem to be permitted here, and it keeps failing to upload any packaged version of the file. I'll find a way to put a sample file here asap.

malton-ont · 2025-02-06T09:21:37Z

@smith12380,

As you're running with --no-trim, dorado won't remove anything more than a default 10 samples. If no primers are showing up in your reads, I suspect this is a prep issue.

smith12380 · 2025-02-06T16:57:21Z

@malton-ont

Thanks for the insight! I take that to mean that dorado ignores the first 10 signal samples of each signal trace?
I figured that this might end up being a prep problem, though I've never encountered this sort of (assumed) degradation before on older short-read tech. I'll try to handle the next samples more carefully after the PCR amplification; maybe they're getting sheared a bit.
