discrepancy in number of reads in bam files #1242

Open
baibhav-bioinfo opened this issue Feb 4, 2025 · 1 comment

Comments


baibhav-bioinfo commented Feb 4, 2025

Hello,

I used dorado to basecall pod5 files to BAM with the m6A_DRACH modified-base model, for 3 replicates of each of 3 conditions.
The following command was used for all replicates:
dorado basecaller [email protected] r1/pod5/ --modified-bases-models [email protected]_m6A_DRACH@v1/ --device cuda:all > r1_.bam

However, when I convert the BAM files to FASTQ, r2 and r3 contain roughly 1.5 to 3.5 times as many reads as r1, and those reads are much shorter. See the table below.
Sample | #reads | length
Condition1_r1 | 10,025,268 | 1,135
Condition1_r2 | 33,550,922 | 180
Condition1_r3 | 35,376,551 | 272.1
Condition2_r1 | 12,537,085 | 1,107
Condition2_r2 | 25,225,958 | 395.5
Condition2_r3 | 37,304,553 | 263.5
Condition3_r1 | 12,715,347 | 1,065
Condition3_r2 | 19,535,787 | 458.4
Condition3_r3 | 22,782,953 | 357.9

There are more pod5 files for r2 and r3 than for r1.
Could that explain the difference in the number of basecalled reads?

If not, what else could be the reason?
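
To put the table above in perspective, the read counts can be compared against r1 within each condition, together with a rough estimate of total bases. The following is only a sketch: it assumes the "length" column is a mean read length in bases, which the post does not state, and the names `TABLE` and `compare_to_r1` are made up for illustration.

```python
# Read counts and lengths copied from the table in the post.
# Assumption: "length" is a mean read length in bases (the post does not say).
TABLE = {
    "Condition1": {"r1": (10_025_268, 1135.0), "r2": (33_550_922, 180.0), "r3": (35_376_551, 272.1)},
    "Condition2": {"r1": (12_537_085, 1107.0), "r2": (25_225_958, 395.5), "r3": (37_304_553, 263.5)},
    "Condition3": {"r1": (12_715_347, 1065.0), "r2": (19_535_787, 458.4), "r3": (22_782_953, 357.9)},
}


def compare_to_r1(table):
    """Return {(condition, replicate): (read_ratio, base_ratio)} relative to r1."""
    out = {}
    for cond, reps in table.items():
        r1_reads, r1_len = reps["r1"]
        for rep, (reads, length) in reps.items():
            read_ratio = reads / r1_reads
            # Approximate total bases as reads * mean length.
            base_ratio = (reads * length) / (r1_reads * r1_len)
            out[(cond, rep)] = (read_ratio, base_ratio)
    return out


if __name__ == "__main__":
    for (cond, rep), (read_ratio, base_ratio) in compare_to_r1(TABLE).items():
        print(f"{cond}_{rep}: reads x{read_ratio:.2f}, bases x{base_ratio:.2f}")
```

Looking at the read ratio and the base ratio side by side shows whether a replicate has more sequence overall or mainly more, shorter records.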

Collaborator

HalfPhoton commented Feb 5, 2025

Hi @baibhav-bioinfo,
This is likely due to read splitting. Please check out the FAQ entry "Why do I have more records than reads?".

It looks like your 2nd and 3rd replicates contain shorter reads, or something has changed with your sample, as there are many more short reads.

Best regards,
Rich
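
The read-splitting explanation above can be checked directly in the BAM output. A minimal sketch follows, assuming that dorado marks split records with a `pi:Z:` tag holding the parent read id; please verify that tag name against the dorado FAQ and SAM documentation before relying on it. The function name `count_split_records` and the sample records are hypothetical, and the input is plain SAM text, for example the output of `samtools view`.

```python
def count_split_records(sam_lines, parent_tag="pi"):
    """Count records, split records, and distinct originating reads.

    Assumption: split records carry a `pi:Z:<parent read id>` tag.
    """
    prefix = parent_tag + ":Z:"
    records = 0
    split_records = 0
    origins = set()
    for line in sam_lines:
        # Skip blank lines and SAM header lines.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        records += 1
        parent = None
        # Optional tags start after the 11 mandatory SAM columns.
        for tag in fields[11:]:
            if tag.startswith(prefix):
                parent = tag[len(prefix):]
                break
        if parent is not None:
            split_records += 1
            origins.add(parent)
        else:
            # An unsplit record counts as its own originating read.
            origins.add(fields[0])
    return records, split_records, len(origins)


# Synthetic records for illustration only (not taken from the issue).
SAMPLE = [
    "child-1\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\t!!!!\tpi:Z:parent-1",
    "child-2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\t!!!!\tpi:Z:parent-1",
    "read-2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\t!!!!\tqs:f:12.0",
]

if __name__ == "__main__":
    records, split_records, origins = count_split_records(SAMPLE)
    print(f"records={records} split={split_records} original_reads={origins}")
```

To run it on real data, pass `sys.stdin` instead of `SAMPLE` and pipe the `samtools view` output in. If the record count is much higher than the number of originating reads, splitting accounts for the extra records. If the two numbers are close, the replicate itself contains more, shorter reads.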
