Issue Report

Running the latest version of dorado (0.8.2) with the [email protected] model and 5mC_5hmC modified bases, I have found an issue with the MM:Z tag for reads in which no methylated cytosines are called. Here is an example of one such read from my dataset:
The MM:Z tag reads `MM:Z:C+h.,;C+m.,;`. Based on section 1.7 of the samtools documentation (link), I think this tag should be `MM:Z:C+h.;C+m.;` when no methylated cytosines are called. In other words, there should be no commas in the tag, because a comma in MM must always be followed by a skip count. The current tag, with its dangling commas, breaks downstream bioinformatics tools that try to extract methylation information from the reads. Using a sed command to convert the `MM:Z:C+h.,;C+m.,;` tags to `MM:Z:C+h.;C+m.;` fixed the downstream tools.
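The same workaround can be scripted when sed is not convenient. The sketch below is a minimal Python illustration, assuming the MM value grammar given as a regular expression in the SAMtags specification (section 1.7) and SAM text input (for example from `samtools view -h`). The helper names `is_valid_mm`, `normalize_mm` and `fix_sam_line` are hypothetical and not part of dorado or samtools. It checks an MM:Z value against that grammar and rewrites entries whose skip list is only a dangling comma.

```python
import re

# Assumed grammar for the MM:Z value (hypothetical transcription of the SAMtags regex):
#   ([ACGTUN][-+]([a-z]+|[0-9]+)[.?]?(,[0-9]+)*;)*
# The skip list "(,[0-9]+)*" may be empty, but every comma needs digits after it,
# so "C+h.,;" is rejected while "C+h.;" is accepted.
MM_VALUE_RE = re.compile(r"(?:[ACGTUN][-+](?:[a-z]+|[0-9]+)[.?]?(?:,[0-9]+)*;)*")

# An entry whose skip list is just a dangling comma, e.g. "C+m.,;".
EMPTY_SKIPS_RE = re.compile(r"([ACGTUN][-+](?:[a-z]+|[0-9]+)[.?]?),;")


def is_valid_mm(value: str) -> bool:
    """Hypothetical helper: True if an MM value (without 'MM:Z:') fits the grammar."""
    return MM_VALUE_RE.fullmatch(value) is not None


def normalize_mm(value: str) -> str:
    """Hypothetical helper: turn "C+h.,;" into "C+h.;", leaving real skip lists alone."""
    return EMPTY_SKIPS_RE.sub(r"\1;", value)


def fix_sam_line(line: str) -> str:
    """Hypothetical helper: normalise the MM:Z field of one SAM text line."""
    record = line.rstrip("\n")
    if record.startswith("@"):
        return record  # header lines carry no MM tag
    fields = record.split("\t")
    for i in range(11, len(fields)):  # optional tags start after the 11 mandatory columns
        if fields[i].startswith("MM:Z:"):
            fields[i] = "MM:Z:" + normalize_mm(fields[i][len("MM:Z:"):])
    return "\t".join(fields)


if __name__ == "__main__":
    raw = "C+h.,;C+m.,;"
    print(is_valid_mm(raw))    # False
    fixed = normalize_mm(raw)
    print(fixed)               # C+h.;C+m.;
    print(is_valid_mm(fixed))  # True
```

Only the MM tag is rewritten; the ML tag can be left as-is, since an entry with no skip counts contributes no ML values.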
Steps to reproduce the issue:
I suspect we encountered this issue partly because this sequencing run had problems and produced many very short, low-quality reads, like the one above. As a result, many reads had few or no cytosines, or had no confident modified-base calls. These were the commands used for basecalling/demultiplexing. Neither command produced any error messages, and both ran to completion.
Run environment:

Source data type (e.g., pod5 or fast5 - please note we always recommend converting to pod5 for optimal basecalling performance): pod5
Source data location (on device or networked drive - NFS, etc.): On device
Details about data (flow cell, kit, read lengths, number of reads, total dataset size in MB/GB/TB): PromethION P2 Solo, SQK-NBD114-24. Most reads are ~2000-3000 bases long, but the dataset also contains some very short reads; let me know if further details are needed. Total dataset size ~0.5 TB.