Skip to content

Document the ambiguity of 1-base long seq with QUAL * #724

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jul 15, 2025

Conversation

jkbonfield
Copy link
Contributor

This is an extreme edge case likely to never occur, but nevertheless tool implementors still need to know how to handle it. Given it may be QUAL 9 or it may be QUAL "unknown", we treat it as always unknown.

Fixes #715

This is an extreme edge case likely to never occur, but nevertheless
tool implementors still need to know how to handle it.  Given it *may*
be QUAL 9 or it *may* be QUAL "unknown", we treat it as always unknown.

Fixes samtools#715
@jkbonfield jkbonfield added the sam label May 2, 2023
@github-actions
Copy link

github-actions bot commented May 2, 2023

Changed PDFs as of cca15e8: SAMv1 (diff).

Copy link

Changed PDFs as of 7915c3d: SAMv1 (diff).

@jmarshall
Copy link
Member

I have finally got around to finishing up my draft of adjusting this footnote. (The diff PDF preview has gone slightly mad — the actual change is the new footnote 17 on page 9.) It's been so long that I can't remember the details, but this is a tidied-up version of the text lying around in my specs directory after discussions on this one in meetings a few months ago.

The changes are basically:

  • Avoid referring to this as ambiguous — now that the spec provides a ruling (albeit in a footnote), it's not ambiguous.
  • I think we discussed offering advice on what tools should do if they have a single-base QUAL of 9 to write out. The second sentence does so, and says do it for all of SAM/BAM/CRAM to avoid issues later when BAM/CRAM files are viewed as SAM. But I'd also be happy if you preferred to drop that and just have the first sentence for succinctness.

@jmarshall jmarshall merged commit f7809ca into samtools:master Jul 15, 2025
1 check passed
@github-project-automation github-project-automation bot moved this from Needs review to Done in GA4GH File Formats Jul 15, 2025
@jmarshall
Copy link
Member

On the merged commit, @colinhercus comments (f7809ca#commitcomment-162178132):

In this situation Novoalign has always reduced the base quality to 8. I don't plan to change it.

(The PR recommends adjusting the base quality to 10.)

I wish you'd noticed this updated PR in the month before we merged it… 😄 Or in the two years since the issue was first noted as an issue on this repository… 😄 😄

I thought of looking to see if any of the base quality binning strategies used either 8 or 10 or a nearby value, but in the end we picked 10 arbitrarily. Do you recall whether you picked 8 arbitrarily or was there some reasoning behind the choice?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Development

Successfully merging this pull request may close these issues.

sam: Quality score ambiguity when sequence is a single base
2 participants