PCR primers and barcodes not apparent in a large portion of reads #1229

Open
smith12380 opened this issue Jan 27, 2025 · 2 comments

smith12380 commented Jan 27, 2025

Issue Report

Please describe the issue:

I have transcriptomic cDNA that was run through a v14 kit on a PromethION, and I multiplexed the samples using barcoded poly-dT RT primers. The cDNAs also have a sequence appended to the 5' end that's complementary to a constant region on the barcode primers, and I've amplified the cDNA with primers against that sequence. However, even with trimming turned off or limited to adapters, about a third of my reads have no discernible version of that sequence, and only 25-50% of reads have a reasonable barcode sequence (depending on gating parameters).
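Since the barcode rate swings between 25% and 50% depending on gating, it may help to make the gate explicit. Below is a minimal sketch of one common approach: assign each read's observed barcode to the nearest known barcode by edit distance, and accept the call only if it is close enough and clearly better than the runner-up. The barcode sequences and the `max_dist` / `min_margin` knobs are hypothetical illustrations, not the reporter's actual barcodes or settings.

```python
from typing import Dict, Optional


def levenshtein(a: str, b: str) -> int:
    """Edit distance (substitutions, insertions, deletions) via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # match or substitute
        prev = curr
    return prev[-1]


def assign_barcode(observed: str, barcodes: Dict[str, str],
                   max_dist: int = 2, min_margin: int = 1) -> Optional[str]:
    """Return the best barcode name, or None if the read fails the gate.

    The gate has two (hypothetical) knobs:
      max_dist   - best match must be within this many edits
      min_margin - runner-up must be at least this many edits worse
    """
    scored = sorted((levenshtein(observed, seq), name)
                    for name, seq in barcodes.items())
    best_dist, best_name = scored[0]
    runner_up = scored[1][0] if len(scored) > 1 else float("inf")
    if best_dist <= max_dist and runner_up - best_dist >= min_margin:
        return best_name
    return None


# Illustrative barcodes only; not the reporter's actual sequences.
BARCODES = {"BC_A": "ACGTACGT", "BC_B": "TGCATGCA", "BC_C": "GGTTCCAA"}

print(assign_barcode("ACGTACGA", BARCODES))              # -> BC_A (one edit away)
print(assign_barcode("ACGTACGA", BARCODES, max_dist=0))  # -> None (stricter gate)
```

Tightening `max_dist` or raising `min_margin` trades assigned reads for confidence, which is one way the "reasonable barcode" fraction can move so much with gating alone.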

Is there any chance that Dorado is cutting off some of the sequence that might contain my primers and barcodes, even with trimming off? Or does anyone have ideas why I'd be getting good reads that usually align to the reference genome, yet don't have the primers? I checked whether I'm getting excessive read splitting, and that doesn't seem to be the case.

Thanks for any help!

Steps to reproduce the issue:

I'm just doing a basic bidirectional alignment of the primer sequence (GTACTCTGCGTTGATACCACTGCTT) against my reads, and at least a third seem to lack the primer on one end or the other when I set a maximum edit distance of 6 bp.
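For anyone wanting to reproduce this kind of check, here is a minimal sketch of a bidirectional primer search: a semi-global ("fitting") alignment that finds the smallest edit distance between the primer and any substring of the read, run once for the primer and once for its reverse complement. The function names are hypothetical, reads are assumed to be uppercase ACGT, and this is not necessarily the exact method used in the report.

```python
from typing import Tuple

PRIMER = "GTACTCTGCGTTGATACCACTGCTT"  # primer from the issue
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def best_fit_distance(pattern: str, text: str) -> int:
    """Smallest edit distance between `pattern` and any substring of `text`.

    Leading and trailing read bases are free; the whole primer must align
    (Sellers-style DP, one column per read base).
    """
    m = len(pattern)
    prev = list(range(m + 1))  # column for the empty read prefix
    best = prev[m]
    for ch in text:
        curr = [0]  # an alignment may start at any read position for free
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            curr.append(min(prev[i - 1] + cost,  # match / substitution
                            prev[i] + 1,         # extra base in the read
                            curr[i - 1] + 1))    # primer base missing from the read
        best = min(best, curr[m])
        prev = curr
    return best


def primer_hits(read: str, primer: str = PRIMER,
                max_edits: int = 6) -> Tuple[bool, bool]:
    """(forward primer found, reverse complement found) within max_edits."""
    fwd = best_fit_distance(primer, read) <= max_edits
    rev = best_fit_distance(reverse_complement(primer), read) <= max_edits
    return fwd, rev


# Synthetic read carrying the primer at one end and its reverse complement at the other.
read = "TTTT" + PRIMER + "G" * 40 + reverse_complement(PRIMER) + "AAAA"
print(primer_hits(read))  # -> (True, True)
```

Pure-Python DP costs O(primer length × read length) per read, so this suits spot checks; a dedicated edit-distance library would be faster on a whole run. Note also that 6 edits on a 25-nt primer is a fairly permissive threshold.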

Run environment:

  • Dorado version: 0.5.1, 0.6.0, 0.7.2, 0.8.1, and 0.9.1
  • Dorado command: on 0.9.1: dorado-0.9.1-linux-x64/bin/dorado basecaller sup --emit-sam --no-trim -v PAQ63465_pass_836a84c0_0fbe1fe9_0.pod5 > test.sam
  • Operating system: linux
  • Hardware (CPUs, Memory, GPUs): V100, A100, and H100 GPUs
  • Source data type: POD5
  • Source data location (on device or networked drive - NFS, etc.): HPC
  • Details about data (flow cell, kit, read lengths, number of reads, total dataset size in MB/GB/TB): FLO-PRO114M, customized version of SQK-LSK114
  • Dataset to reproduce, if applicable (small subset of data to share as a pod5 to reproduce the issue): POD5 files don't seem to be permitted as attachments here, and uploading any packaged version of the file keeps failing. I'll find a way to share a sample file here ASAP.
@malton-ont
Collaborator

@smith12380,

As you're running with --no-trim, dorado won't remove anything beyond the default 10 samples. If there are no primers showing in your reads, I suspect this is a prep issue.
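To put the 10-sample figure in perspective, here is a back-of-envelope conversion from signal samples to bases. The sample rate and translocation speed are assumptions (roughly typical for R10.4.1 / Kit 14 DNA runs); check your own run, since the sample rate is recorded in the POD5 run information. Under these assumptions, 10 samples span less than one base, far too little to hide a 25-nt primer.

```python
# Assumed values: verify against your own run before relying on them.
SAMPLE_RATE_HZ = 5000    # assumed signal sampling rate
BASES_PER_SECOND = 400   # assumed DNA translocation speed


def samples_to_bases(n_samples: int,
                     sample_rate_hz: float = SAMPLE_RATE_HZ,
                     bases_per_second: float = BASES_PER_SECOND) -> float:
    """Approximate number of bases spanned by n_samples of raw signal."""
    samples_per_base = sample_rate_hz / bases_per_second
    return n_samples / samples_per_base


print(samples_to_bases(10))                       # about 0.8 bases at 5 kHz
print(samples_to_bases(10, sample_rate_hz=4000))  # about 1 base at 4 kHz
```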

@smith12380
Author

@malton-ont

Thanks for the insight! I take that to mean that dorado ignores the first 10 signal samples per signal trace?
I figured this might end up being a prep problem, though I've never encountered this sort of (assumed) degradation before with older short-read tech. I'll try to handle the next samples more carefully after the PCR amplification; maybe they're getting sheared a bit.
