analyze_snp
analyze_snp can be used to count Single Nucleotide Polymorphisms (SNPs), i.e. mismatches, insertions and deletions. This is done by mapping a stack of query sequences to one or more subject sequences using e.g. bwa_seq and subsequently analyzing the alignment descriptors in the result.
For more about alignment descriptors see read_kiss or
http://code.google.com/p/biopieces/wiki/KissFormat
The records output from analyze_snp look like this:
SRC: C
SNP_COUNT: 7
REC_TYPE: SNP
S_ID: contig00115
DST: T
TYPE: MISMATCH
S_POS: 1368
---
- S_ID indicates the contig where the SNP was found.
- S_POS is the position in the above sequence harboring the SNP.
- SRC is the nucleotide (or - for insertions) located at the above position.
- DST is the changed nucleotide (or - for deletions).
- TYPE indicates INSERTION, DELETION or MISMATCH.
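To make the record layout above concrete, here is a minimal Python sketch that reads such a record stream into dictionaries. It is not part of Biopieces; the function name `parse_records` is hypothetical, and the sketch assumes only what the example shows: one `KEY: value` pair per line and a line of three dashes closing each record.

```python
def parse_records(text):
    """Split a record stream (KEY: value lines, '---' terminator) into dicts."""
    records = []
    current = {}
    for line in text.splitlines():
        line = line.strip()
        if line == "---":
            # Three dashes close the current record.
            if current:
                records.append(current)
            current = {}
        elif line:
            # Split at the first colon only, so values may contain colons.
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    if current:
        # Tolerate a final record that lacks the closing '---'.
        records.append(current)
    return records


example = """SRC: C
SNP_COUNT: 7
REC_TYPE: SNP
S_ID: contig00115
DST: T
TYPE: MISMATCH
S_POS: 1368
---
"""

records = parse_records(example)
snp = records[0]
print(snp["S_ID"], snp["S_POS"], snp["SRC"] + ">" + snp["DST"], snp["TYPE"])
# prints: contig00115 1368 C>T MISMATCH
```

All values are kept as strings here; convert S_POS and SNP_COUNT with `int()` if you need to sort or filter on them.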
... | analyze_snp [options]
[-? | --help] # Print full usage description.
[-m <uint> | --min=<uint>] # Minimum SNP count to report - Default=1
[-I <file!> | --stream_in=<file!>] # Read input from stream file - Default=STDIN
[-O <file> | --stream_out=<file>] # Write output to file - Default=STDOUT
[-v | --verbose] # Verbose output.
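The effect of the --min option can be pictured with a small Python sketch. This is an illustration of the counting idea only, not the actual analyze_snp implementation: the function `tally_snps` and the input tuples `(S_ID, S_POS, SRC, DST)` are hypothetical, and it assumes that each difference seen in a mapped read has already been decoded from the alignment descriptors.

```python
from collections import Counter


def tally_snps(observations, min_count=1):
    """Count identical (S_ID, S_POS, SRC, DST) observations and keep
    those seen at least min_count times (mirrors the idea behind --min)."""
    counts = Counter(observations)
    return [
        {"S_ID": s_id, "S_POS": s_pos, "SRC": src, "DST": dst, "SNP_COUNT": n}
        for (s_id, s_pos, src, dst), n in sorted(counts.items())
        if n >= min_count
    ]


# Hypothetical input: seven reads support C>T at position 1368,
# while only two reads support A>G at position 20.
observations = [("contig00115", 1368, "C", "T")] * 7
observations += [("contig00115", 20, "A", "G")] * 2

for record in tally_snps(observations, min_count=5):
    print(record["S_ID"], record["S_POS"], record["SRC"], record["DST"], record["SNP_COUNT"])
# prints: contig00115 1368 C T 7
```

With the default `min_count=1` both positions would be reported; raising the threshold drops the change supported by only two reads.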
First we create a BWA index of a reference genome like this:
read_fasta -i genome.fna | create_bwa_index -d ~my_dir/ -i my_index -x
Next we map a stack of reads:
read_solexa -i reads.fq | bwa_seq -i ~my_dir/my_index | write_kiss -xo out.kiss
Finally, we can analyze the result for SNPs:
read_kiss -i out.kiss | analyze_snp -m 5 | grab -p SNP | write_tab -k S_ID,S_POS,DST,SRC,SNP_COUNT,TYPE -xco snp.tab
The resulting table would look similar to this:
#S_ID        S_POS  DST  SRC  SNP_COUNT  TYPE
contig00001  88     A    T    17         MISMATCH
contig00001  90     G    T    16         MISMATCH
contig00001  140    A    -    25         INSERTION
contig00001  140    A    T    30         MISMATCH
contig00001  5100   T    G    10         MISMATCH
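A table like this is easy to post-process outside Biopieces. The Python sketch below loads it and summarizes the changes per TYPE. It assumes the file is tab-separated with a header line starting with `#`, as the example suggests; the function name `read_snp_table` is hypothetical, and the sample data is simply the table above.

```python
import csv
import io
from collections import Counter


def read_snp_table(handle):
    """Read a tab-separated SNP table whose header line starts with '#'."""
    reader = csv.reader(handle, delimiter="\t")
    header = [name.lstrip("#") for name in next(reader)]
    rows = []
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        row = dict(zip(header, fields))
        row["S_POS"] = int(row["S_POS"])
        row["SNP_COUNT"] = int(row["SNP_COUNT"])
        rows.append(row)
    return rows


# The example table from above, rebuilt with explicit tabs.
sample = "\n".join("\t".join(fields) for fields in [
    ["#S_ID", "S_POS", "DST", "SRC", "SNP_COUNT", "TYPE"],
    ["contig00001", "88", "A", "T", "17", "MISMATCH"],
    ["contig00001", "90", "G", "T", "16", "MISMATCH"],
    ["contig00001", "140", "A", "-", "25", "INSERTION"],
    ["contig00001", "140", "A", "T", "30", "MISMATCH"],
    ["contig00001", "5100", "T", "G", "10", "MISMATCH"],
])

rows = read_snp_table(io.StringIO(sample))

# Number of reported changes per TYPE.
type_counts = Counter(row["TYPE"] for row in rows)

# Positions that carry more than one kind of change.
per_position = Counter((row["S_ID"], row["S_POS"]) for row in rows)
ambiguous = sorted(pos for pos, n in per_position.items() if n > 1)

print(dict(type_counts))
print(ambiguous)
```

To read the real file instead of the inline sample, pass an open file handle, e.g. `read_snp_table(open("snp.tab", newline=""))`.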
Martin Asser Hansen - Copyright (C) - All rights reserved.
November 2009
GNU General Public License version 2
http://www.gnu.org/copyleft/gpl.html
analyze_snp is part of the Biopieces framework.