Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SAM/VCF Strict draft #283

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

SAM/VCF Strict draft #283

wants to merge 1 commit into from

Conversation

d-cameron
Copy link
Contributor

PR for discussing proposed SAM Strict rule changes. Work in progress continuing from #281 discussion.

@d-cameron
Copy link
Contributor Author

@jkbonfield I'd rather not expand this particular PR to include changes to the SAM specs themselves. My PR does appear to be highlighting some issues with SAMv1.5 that have avoided scrutiny, but I think the best course of action is to get a SAM v1.5 strict document and validator implementation based on the v1.5 specifications before updating the specs.

@jkbonfield
Copy link
Contributor

That's fine - I can accept that as an order in which things are done. Yes I think there are problems with the spec which haven't been spotted as no one has really gone over it with such a fine tooth-comb before. Thank you for doing this.

@yfarjoun
Copy link
Contributor

yfarjoun commented Feb 8, 2018

Could you do me a favor and include the pdf in these commits? when we get to the end you can remove the pdf and compile it "correctly" for the sake of having the PDF versioned.

@yfarjoun
Copy link
Contributor

yfarjoun commented Feb 8, 2018

I was wondering if we want the error "code" to not have spaces in it. This will help when we want to run a validation tool on the commandline and specify which codes to ignore. Spaces will mean that every code needs to be surrounded by quotes. COULD_WE_USE_THIS_FORMAT_FOR_CODES? (not shouting, just suggesting!)

@jkbonfield
Copy link
Contributor

The TeX is invalid at the moment, meaning a PDF cannot currently be produced.

I'm not a TeX expert, but it looks like you need eg \vfifteen instead of \v15 commands and braces around the {\tt text} sections. There are some other minor things like \bam not being defined too. I tried correcting this and have a pdf that builds now, although it's formatted wrong and becomes pure fixed-width after a while.

@jkbonfield
Copy link
Contributor

The version in https://github.com/jkbonfield/hts-specs/tree/strict builds now, but needs a lot of formatting still. I'm not planning on doing this myself - this was just a test to see how to get it in a readable format.

Copy link
Contributor

@yfarjoun yfarjoun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work!! there seems to be a lo of work left at the bottom part of the document, but I like the rigor so far.

I put some comments in the discussion...


\section{Headers}

{paragraph}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a \begin or a \section missing here? looks like copy-paste error


{paragraph}

The first segment is is considered to be the "next" segment of the final segment in a template as per the SAM specifications.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems out of place



\section{Headers}
\samstrictrule{Undefined reserved header present}{Upper-case header record type codes are not defined in the SAM specifications must not be used.}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing "that" between codes and are (or remove are if that's the style you're going for)

\section{Headers}
\samstrictrule{Undefined reserved header present}{Upper-case header record type codes are not defined in the SAM specifications must not be used.}
\samstrictrule{Undefined header tag present}{Upper-case header tags not defined in the SAM specifications must not be used.}
\samstrictrule{Tag present as both lowercase and uppercase}{A file should not contain the same tag in both upper-case and lowercase format. See the SAM specifications header tags best practice footnote.}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tag is ambiguous, should be header tag (both in code and text)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't interpret that foot-note the same way.....I interpret it as "lower case is a free for all, for upper case tags open an issue in hts-spec"

\samstrictrule{Undefined reserved header present}{Upper-case header record type codes are not defined in the SAM specifications must not be used.}
\samstrictrule{Undefined header tag present}{Upper-case header tags not defined in the SAM specifications must not be used.}
\samstrictrule{Tag present as both lowercase and uppercase}{A file should not contain the same tag in both upper-case and lowercase format. See the SAM specifications header tags best practice footnote.}
\samstrictrule{Malformed header}{Header lines with conform to either the {\tt
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"malformed header line"
s/with/must/
either is generally used with two options...could you reword without either?

\samstrictrule{SEQ QUAL length mismatch.}{The length of a non-* QUAL must match the length of SEQ.}{\samrule}
\samstrictrule{Invalid QUAL}{The ASCII value of all QUAL bases must be at least 33.}
\samstrictrule{QUAL of secondary alignments specified.}{QUAL of secondary alignments (0x100 FLAG set) should be set to * to reduce the file size.}{\v15bestpractice}
\samstrictrule{TODO: QUAL edge case}{What should we do when a read is length 1 and the QUAL encodes to "*" ?}{\samrule}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ouch.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to say that spending any time on this edge case seems to be a waste...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I've actually hit that one before! Hah.

\samstrictrule{Unknown reserved tag}{
No record can include any reserved tags not defined in the
{\sl Sequence Alignment/Map Optional Fields Specification}.
Non-standard tags must start X, Y, Z or a lowercase letter as per the SAM specifications.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...must start with...

}
\samstrictrule{Incorrect tag type}{The \textit{type} of all \textit{standard tags} must match the type
defined in the {\sl Sequence Alignment/Map Optional Fields Specification}.}
\samstrictrule{Duplicate tag}{}{\samrule}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no text for "duplicate tag"?



\subsection{RG}
3 When a RG tag appears anywhere in the alignment section, there should be a single corresponding
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove 3 and 4 here and below

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should->must

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, need error codes

3 When a RG tag appears anywhere in the alignment section, there should be a single corresponding
@RG line with matching ID tag in the header.{\v15bestpractice}
\subsection{RG}
4 When a PG tag appears anywhere in the alignment section, there should be a single corresponding
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this duplicated?

@jmarshall
Copy link
Member

jmarshall commented Jul 14, 2020

Any discussion or review comments on SAM and VCF strictness proposals are naturally separate. Would it not be more practical to open a VCF strict proposal as a new separate PR?

@tskir
Copy link
Member

tskir commented Nov 9, 2020

I agree with a comment from @jmarshall above about splitting this into two. Since in its current form it concerns both VCF and SAM, I've added the appropriate labels for the time being

@d-cameron
Copy link
Contributor Author

Happy to progress these specs if someone has funding/budget/time to actually write a validator

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Stalled
Development

Successfully merging this pull request may close these issues.

5 participants