ValidateSamFile wrong NM tag computation #1963

Open
@albertocasagrande

Bug Report

Affected tool(s)

Picard ValidateSamFile

Affected version(s)

  • Latest public release version [3.1.1]
  • Latest development/master branch (not tested)

Description

Running Picard ValidateSamFile with a reference (-R) sometimes reports ERROR::INVALID_TAG_NM, even though each read's alignment to the reference agrees with its CIGAR string and its NM tag.

Steps to reproduce

  1. Download and decompress the reference sequence
curl https://ftp.ensembl.org/pub/grch37/release-111/fasta/homo_sapiens/dna/Homo_sapiens.GRCh37.dna.chromosome.22.fa.gz -o chr22.fasta.gz
gunzip chr22.fasta.gz
  2. Create the file test.sam containing the following lines:
@HD	VN:1.6	SO:unknown
@SQ	SN:22	LN:51304566	AN:chromosome22,chr22,chromosome_22,chr_22
@RG	ID:S_1_1	SM:S_1_1	PL:ILLUMINA
r00000000001	16	22	42063771	33	55M2D95M	*	0	0	CTTCTAGTGTTCCTCGCTACCCTGCAATTTTAGCATGACCATTTATTTATTTATGTTTGTTTGTTTATTTATTTATTTATGACTACAAAGATCCAGAAGACAAACATGACCATTTCTTTCTTTTTTTTTTTTTTTCTGAGATGGAGTCTT	!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!	RG:Z:S_1_1	NM:i:2
r00000000002	16	22	50364657	33	121M29X	*	0	0	GTGACAGGGGAGGAGTCTGGAGCTGAGAGGCGAACGGAGAGCACAGTGGAGCACACGGGCCCTGCCCACCCGCCTGTCCTGTCCAAGGATGCTGGGGCCCCGACCAGCCGGTCACAGGCGCNNNNNNNNNNNNNNNNNNNNNNNNNNNNN	!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!	RG:Z:S_1_1	NM:i:29
  3. Validate the SAM file with Picard:
java -jar picard.jar ValidateSamFile -I test.sam -R chr22.fasta
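To check independently which reference bases each record covers, a minimal Python sketch may help. It is not part of Picard; read_fasta, reference_span and reference_window are hypothetical helper names, and the sketch assumes chr22.fasta holds a single sequence and that POS is 1-based, as in the SAM specification.

```python
import re

# CIGAR operations as defined in the SAM specification.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def read_fasta(path):
    """Return the sequence of the first record in a FASTA file (case preserved)."""
    chunks = []
    seen_header = False
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if seen_header:
                    break  # assumption: only the first record matters here
                seen_header = True
                continue
            chunks.append(line)
    return "".join(chunks)


def reference_span(cigar):
    """Number of reference bases consumed by a CIGAR string (M, D, N, =, X)."""
    return sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in "MDN=X")


def reference_window(ref, pos, cigar):
    """Reference bases covered by a record whose 1-based POS is `pos`."""
    start = pos - 1
    return ref[start:start + reference_span(cigar)]


# Example (loads all of chromosome 22 into memory):
# ref = read_fasta("chr22.fasta")
# print(reference_window(ref, 42063771, "55M2D95M"))
# print(reference_window(ref, 50364657, "121M29X"))
```

For the two records above, reference_span returns 152 and 150 reference bases respectively.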

Expected behavior

Validation should succeed. For the first read, the alignment and its CIGAR (55M2D95M) correspond, so NM:i:2 accounts for the two deleted bases. The second read matches the reference perfectly, except that its final 29 bases are all 'N'; according to the NM definition in the "Sequence Alignment/Map Optional Fields Specification" (bottom of p. 3), these 'N's count as mismatches, so NM:i:29 is correct.
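The NM definition this report relies on can be sketched as follows. This is a minimal Python illustration written from our reading of the SAM optional-fields specification, not htsjdk's or Picard's code. spec_nm is a hypothetical name, and the function assumes ref_window holds the reference bases starting at the record's POS. Under that reading, NM counts mismatches plus inserted and deleted bases, and only A, C, G and T (case-insensitive) can match; any other base, including 'N', is a mismatch.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
ACGT = frozenset("ACGT")


def spec_nm(read_seq, ref_window, cigar):
    """Sketch of NM per the SAM optional-fields spec (our reading).

    ref_window must start at the record's POS. Clipped bases and
    reference skips (N) are not counted.
    """
    nm = 0
    r = 0  # offset into read_seq
    g = 0  # offset into ref_window
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op in "M=X":
            read_part = read_seq[r:r + n].upper()
            ref_part = ref_window[g:g + n].upper()
            for a, b in zip(read_part, ref_part):
                # Only identical A/C/G/T bases match; N vs N is a mismatch.
                if a != b or a not in ACGT:
                    nm += 1
            r += n
            g += n
        elif op == "I":
            nm += n  # inserted bases count toward NM
            r += n
        elif op == "D":
            nm += n  # deleted bases count toward NM
            g += n
        elif op == "N":
            g += n  # skipped reference region: not counted
        elif op == "S":
            r += n  # soft-clipped bases: not counted
        # H and P consume neither the read nor the reference.
    return nm
```

Under this reading, the 29 trailing 'N' bases of r00000000002 contribute 29 to NM, matching the value stored in the file, whereas ValidateSamFile computed 0.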

Actual behavior

Picard produces the following INVALID_TAG_NM errors:

ERROR::INVALID_TAG_NM:Record 1, Read name r00000000001, NM tag (nucleotide differences) in file [2] does not match reality [3]
ERROR::INVALID_TAG_NM:Record 2, Read name r00000000002, NM tag (nucleotide differences) in file [29] does not match reality [0]
