Represent U in SAM rather than T for reads from RNA #801

Open
@Psy-Fer

Before samtools/htslib#1854, samtools converted U to N when reading a record.

Now it converts U to T.

However, I think it would be "better" if we could preserve U in SAM, even across round trips such as SAM->BAM->CRAM->SAM.

There is a problem, however: BAM encodes each base in 4 bits, and all 16 codes are already taken (=ACMGRSVTWYHKDBN), so there is no spare code for U and T has to stand for both T and U.
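To make the constraint concrete, here is a minimal Python sketch of BAM-style 4-bit packing. The alphabet string is the one listed in the SAM/BAM specification; the function names `encode_seq` and `decode_seq` are made up for this illustration and are not htslib API. Mapping U to T on input is an assumption that mirrors the behaviour described above.

```python
# BAM's 4-bit base alphabet as listed in the SAM/BAM specification.
# All 16 codes (0..15) are in use, and none of them is U.
BAM_BASES = "=ACMGRSVTWYHKDBN"


def encode_seq(seq):
    """Pack a sequence into BAM-style nibbles, two bases per byte.

    Illustrative helper, not htslib code. U is mapped to T here
    (assumption: mirrors the post-#1854 behaviour described above);
    any other unrecognised character falls back to N (code 15).
    """
    codes = []
    for base in seq.upper():
        if base == "U":
            base = "T"  # the only place U can go without a spare code
        idx = BAM_BASES.find(base)
        codes.append(idx if idx >= 0 else 15)
    if len(codes) % 2:
        codes.append(0)  # pad the final low nibble
    return bytes((codes[i] << 4) | codes[i + 1] for i in range(0, len(codes), 2))


def decode_seq(data, length):
    """Unpack `length` bases from BAM-style nibbles."""
    out = []
    for i in range(length):
        byte = data[i // 2]
        code = byte >> 4 if i % 2 == 0 else byte & 0xF
        out.append(BAM_BASES[code])
    return "".join(out)


packed = encode_seq("ACGU")
roundtrip = decode_seq(packed, 4)  # the U does not survive the round trip
```

Once the sequence is packed, nothing in the nibbles distinguishes an original U from an original T, which is why the information has to be carried somewhere else in the record.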

A solution raised by @jmarshall would be to allocate a FLAG bit indicating that an alignment record is RNA. A T coming from BAM would then be written as U when the record is viewed as SAM.
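A rough sketch of how that could look on the SAM-writing side, in Python. The bit value `0x1000` and the names `HYPOTHETICAL_RNA_FLAG` and `seq_for_sam` are assumptions for illustration only; no such FLAG bit is allocated in the specification today.

```python
# Hypothetical FLAG bit marking a record as RNA. 0x1000 is picked only
# for illustration; it is NOT an allocated bit in the SAM specification.
HYPOTHETICAL_RNA_FLAG = 0x1000


def seq_for_sam(decoded_seq, flag):
    """Return the SEQ text to print in SAM for a record decoded from BAM.

    If the (hypothetical) RNA bit is set, every stored T is shown as U;
    otherwise the sequence is printed unchanged.
    """
    if flag & HYPOTHETICAL_RNA_FLAG:
        return decoded_seq.replace("T", "U")
    return decoded_seq


rna_view = seq_for_sam("ACGT", 0x10 | HYPOTHETICAL_RNA_FLAG)
dna_view = seq_for_sam("ACGT", 0x10)
```

The stored bases stay as T in BAM, so only the code that prints SAM text needs to know about the bit.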

Most existing tools would keep working, since the stored base is still T, while the format would be ready for RNA sequencing methods that need to report the base actually being measured.

Another solution (though more ad hoc and less "good") would be to add yet another SAM tag denoting that the read is from RNA. This saves a FLAG bit but adds complexity.

Cheers,
James
