Error running bayesembler 1.2.0 #14

Open
voshalla opened this issue Oct 5, 2015 · 4 comments

Comments

@voshalla

voshalla commented Oct 5, 2015

When trying to run bayesembler 1.2.0 on an alignment generated by tophat2, I get the following error:

bayesembler: /seqdata/krogh/jola/projects/transcriptome_assembly/code/release/bayesembler_1_2_0/src/assembler.cpp:186: void Assembler::markDuplicates(BamTools::BamAlignment&, Assembler::FirstReads_, Assembler::ReadPairs_): Assertion `cur_pos_first_reads_it->second.insert(pair<ReadId, BamTools::BamAlignment*>(ri, new BamTools::BamAlignment(current_alignment))).second' failed.

I assumed it was an issue with the order of the alignments in the bam file, but it still happens after re-sorting the bam file with samtools, regardless of the samtools version. I was able to run bayesembler on other datasets using the same version of tophat2 without issue, so it doesn't seem to be a problem with the installs.

@lassemaretty
Contributor

Hi,

Thank you for posting. Would it be possible for you to make the data available to us?

Best,

Lasse

@voshalla
Author

voshalla commented Oct 6, 2015

The smallest bam file causing the error can be downloaded from here:

https://unl.box.com/s/kym2l74fnfd66vt0dskts84awvw5onyh

It was generated by aligning the following reads to the TAIR transcriptome for Arabidopsis:

https://unl.box.com/s/0emodcukni923a49mit1eu23s1hzlvsp

@lassemaretty
Contributor

Thanks! I'll look into it and get back to you!

@voshalla
Author

I found the solution for this error. Because the reads we're using are simulated expression data, each read's name is the sequence coordinates it contains. When two reads are generated from the same coordinates, the read names are not unique. Changing the read names so that they are always unique resolved this error. However, I'm now getting the following error later in the assembly:

[23/10/2015 11:13:06] Removing duplicate reads
[23/10/2015 11:14:06] Removed duplicates from 3606471 mapped read pairs
[23/10/2015 11:14:06] Wrote 2978238 read pairs used for splice-graph construction

[23/10/2015 11:14:06] Spawning graph construction thread
[23/10/2015 11:14:06] Generating splice-graphs from stringtie-q20.gtfaccepted_hits_nd_unstranded.bam using cem
[23/10/2015 11:15:17] Parsed 7942 graph(s) from cem instance file

[23/10/2015 11:15:17] Parsed 7942 splice graph(s) from cem instance file and collapsed them to 6050 assembly graph(s) (1736 graph(s) excluded due to inference issues resulting from unstranded data).
[23/10/2015 11:15:17] 2877984 unique, non-redundant read pairs being used for quantification
[23/10/2015 11:15:17] 2.87798e+06 read pairs being used for FPKM normalisation

[23/10/2015 11:15:17] Sorting splice-graphs by read count
[23/10/2015 11:15:17] Finished sorting splice-graphs by read count

[23/10/2015 11:15:17] Spawning 15 thread(s) for fetching alignments and 1 i/o thread
[23/10/2015 11:16:42] Estimating fragment length distribution from 703 transcripts longer than 2500 nucleotides
[23/10/2015 11:16:42] Estimated fragment length "median"=302 and "median absolute deviation"=0 using 543862 observations
[23/10/2015 11:16:42] Using Gaussian fragment length distribution with parameters: Mean=302 and SD=0

[23/10/2015 11:16:42] Starting Bayesembler on 725 multi-path graph(s) and 5325 single-path graph(s)
[23/10/2015 11:16:42] Spawning 15 Bayesembler thread(s) and 2 i/o threads

bayesembler: /seqdata/krogh/jola/projects/transcriptome_assembly/code/release/bayesembler_1_2_0/src/alignmentParser.cpp:445: CollapsedMap AlignmentParser::calculateFragTranProbabilities(std::vector&, std::vector<FragmentAlignment*>, SequencingModel, bool, std::stringstream&, std::tr1::unordered_mapstd::basic_string<char, int>*): Assertion `probability_matrix.block(0, i, row_idx, 1).sum() < double_underflow' failed.
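
For anyone hitting the first assertion (the one in markDuplicates), the read-name fix described above can be sketched roughly as follows. This is only an illustration, not the exact script used here: the function name `rename_fastq_lines`, the `_<n>` suffix scheme, and the example read names are all hypothetical. It assumes plain 4-line FASTQ records (no wrapped sequences) and that both mate files list the reads in the same order under identical names, so running it on each file separately gives both mates of a pair the same new name.

```python
from collections import Counter
from typing import Iterable, Iterator


def rename_fastq_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTQ lines with an occurrence counter appended to every read name.

    Assumption: each record is exactly 4 lines (header, sequence, '+', quality).
    The header is found by position, not by a leading '@', because quality
    strings may also start with '@'.
    """
    seen = Counter()  # how many times each original name has appeared so far
    for i, line in enumerate(lines):
        if i % 4 == 0:
            name = line.rstrip("\n")[1:]  # drop the leading '@'
            seen[name] += 1
            # Hypothetical suffix scheme: <original name>_<occurrence number>
            yield f"@{name}_{seen[name]}\n"
        else:
            yield line


if __name__ == "__main__":
    # Two simulated reads drawn from the same coordinates share one name.
    example = [
        "@chr1:100-400\n", "ACGT\n", "+\n", "IIII\n",
        "@chr1:100-400\n", "TTGA\n", "+\n", "IIII\n",
    ]
    for out_line in rename_fastq_lines(example):
        print(out_line, end="")
```

Because the suffix is always appended, even to names that occur only once, a renamed read cannot collide with another renamed read. The reads would then need to be realigned with tophat2 so that the bam file carries the new names.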
