Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

{readname} did not have a primary R1 record. (GroupReadsByUmi) #995

Open
xiechangxiao opened this issue Jun 14, 2024 · 4 comments
Open

{readname} did not have a primary R1 record. (GroupReadsByUmi) #995

xiechangxiao opened this issue Jun 14, 2024 · 4 comments

Comments

@xiechangxiao
Copy link

xiechangxiao commented Jun 14, 2024

when I use GroupReadsByUmi group reads, this error occurs.@nh13
cmd:
java -jar fgbio-2.2.1.jar --tmp-dir=/tmp GroupReadsByUmi --input=test.bam --output=test.umi.group.BAM --strategy=paired --min-map-q=20 --edits=1 --raw-tag=RX --family-size-histogram=test.family.size.histogram

error message:
Exception in thread "main" java.lang.IllegalStateException: {readname} did not have a primary R1 record

@nh13
Copy link
Member

nh13 commented Jun 14, 2024

Can you tell us how you created the test bam, and the provide a test case (BAM) to debug? Did you sort the reads prior?

@xiechangxiao
Copy link
Author

First, mapping with BWA:
bwa mem -M -K 10000000 -R "@RG\tID:1\tPL:Illumina\tLB:hg19\tSM:tumor" hg19.fasta Tumor_1.clean.fq.gz Tumor_2.clean.fq.gz |samtools view -b -o Tumor.raw.bam - && samtools sort Tumor.raw.bam -o Tumor.sort.bam -O bam
Then, extract MQ and RX:
samtools view -h Tumor.sort.bam | mawk '/^@/ {print;next} {N=split($1,n,":");gsub("_", "-", n[N]);print $0 "\tMQ:i:" $5 "\tRX:Z:" n[N]}' |samtools view -b - >test.bam
Finally, use GroupReadsByUmi and get errors:
Exception in thread "main" java.lang.IllegalStateException: E250006338L1C038R0044055987:GTGAG_GCACG did not have a primary R1 record.
But, the flag value of this paired read is 113 and 177.@nh13

@nh13
Copy link
Member

nh13 commented Jun 17, 2024

Can you confirm there is a R1 primary record in your test.bam "E250006338L1C038R0044055987:GTGAG_GCACG"? Either your RX/MQ extraction isn't working as intended, or your sort order into GroupReadsByUmi is wrong (try samtools sort --template-coordinate using the latest samtools). Having test.bam would be helpful to debug futher.

To extract the UMI bases, try FastqToBam or other tools in fgbio, which we support, and are used in our best-practices pipeline:

https://github.com/fulcrumgenomics/fgbio/blob/main/docs/best-practice-consensus-pipeline.md

@xiechangxiao
Copy link
Author

I provide a simple bam file for you debug.@nh13
test.zip
If you remove "E250006338L1C038R0044055987:GTGAG_GCACG" from this bam file, GroupReadsByUmi will work properly.
By the way, I've followed your best practices, but it's time consuming and space consuming.
Personally, I think there could be easier way to build a consensus pipeline.
For example:

  1. shift UMI to readname with fastp
  2. mapping with BWA
  3. GroupReadsByUmi and CallMolecularConsensusReads

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants