Bug Report
Affected tool(s)
picard ValidateSamFile
Affected version(s)
- Latest public release version [3.1.1]
- Latest development/master branch (not tested)
Description
Calling picard ValidateSamFile with a reference sometimes produces an INVALID_TAG_NM error even though the NM tag is consistent with the read's alignment to the reference and its CIGAR string.
Steps to reproduce
- Download and decompress the reference sequence
curl https://ftp.ensembl.org/pub/grch37/release-111/fasta/homo_sapiens/dna/Homo_sapiens.GRCh37.dna.chromosome.22.fa.gz -o chr22.fasta.gz
gunzip chr22.fasta.gz
- Create the file test.sam containing the following lines (fields must be tab-separated)
@HD VN:1.6 SO:unknown
@SQ SN:22 LN:51304566 AN:chromosome22,chr22,chromosome_22,chr_22
@RG ID:S_1_1 SM:S_1_1 PL:ILLUMINA
r00000000001 16 22 42063771 33 55M2D95M * 0 0 CTTCTAGTGTTCCTCGCTACCCTGCAATTTTAGCATGACCATTTATTTATTTATGTTTGTTTGTTTATTTATTTATTTATGACTACAAAGATCCAGAAGACAAACATGACCATTTCTTTCTTTTTTTTTTTTTTTCTGAGATGGAGTCTT !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! RG:Z:S_1_1 NM:i:2
r00000000002 16 22 50364657 33 121M29X * 0 0 GTGACAGGGGAGGAGTCTGGAGCTGAGAGGCGAACGGAGAGCACAGTGGAGCACACGGGCCCTGCCCACCCGCCTGTCCTGTCCAAGGATGCTGGGGCCCCGACCAGCCGGTCACAGGCGCNNNNNNNNNNNNNNNNNNNNNNNNNNNNN !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! RG:Z:S_1_1 NM:i:29
- Validate the sam file with Picard
java -jar picard.jar ValidateSamFile -I test.sam -R chr22.fasta
Expected behavior
The validation should succeed. For the first read, the alignment is consistent with its CIGAR string, so the NM value of 2 in the file should be accepted. The second read matches the reference exactly except for its final 29 'N's, which should be counted as mismatches according to the NM definition in the "Sequence Alignment/Map Optional Fields Specification" (see the bottom of p. 3), giving NM = 29.
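To make the expectation concrete, here is a minimal sketch of the NM computation as I read the definition cited above: mismatches plus inserted and deleted bases, where only A, C, G and T (case-insensitive) can count as matches. This is not Picard's or htsjdk's implementation. The function name `compute_nm` and the toy sequences are made up for illustration and are not taken from chr22.

```python
import re

# One CIGAR element: a length followed by an operation code.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
ACGT = set("ACGT")


def compute_nm(read_seq, cigar, ref_seq, ref_start=0):
    """Count differences between a read and the reference (hypothetical helper).

    ref_start is the 0-based offset in ref_seq where the alignment begins.
    """
    nm = 0
    ref_pos = ref_start
    read_pos = 0
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op in "M=X":
            # Aligned columns: compare base by base; anything that is not
            # A/C/G/T on either side is counted as a mismatch.
            for i in range(n):
                a = read_seq[read_pos + i].upper()
                b = ref_seq[ref_pos + i].upper()
                if a not in ACGT or b not in ACGT or a != b:
                    nm += 1
            read_pos += n
            ref_pos += n
        elif op == "I":
            nm += n          # inserted bases count as differences
            read_pos += n
        elif op == "D":
            nm += n          # deleted bases count as differences
            ref_pos += n
        elif op == "N":
            ref_pos += n     # skipped reference region, not a difference
        elif op == "S":
            read_pos += n    # soft clip consumes read bases only
        # H and P consume neither read nor reference
    return nm


# Toy analogue of the first read: a 2-base deletion, all other bases match.
print(compute_nm("ACGT", "2M2D2M", "ACTTGT"))   # 2
# Toy analogue of the second read: matching bases followed by trailing Ns.
print(compute_nm("ACGTNNN", "4M3X", "ACGTACG"))  # 3
```

Under this reading, trailing 'N's in the read add one difference each, which is why I expect NM = 29 for the second record.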
Actual behavior
Picard produces the following INVALID_TAG_NM errors:
ERROR::INVALID_TAG_NM:Record 1, Read name r00000000001, NM tag (nucleotide differences) in file [2] does not match reality [3]
ERROR::INVALID_TAG_NM:Record 2, Read name r00000000002, NM tag (nucleotide differences) in file [29] does not match reality [0]