Martin Asser Hansen edited this page Oct 2, 2015 · 6 revisions

Biopiece: analyze_snp

Description

analyze_snp can be used to count Single Nucleotide Polymorphisms (SNPs), i.e. mismatches, insertions and deletions. This is done by mapping a stack of query sequences to one or more subject sequences using e.g. bwa_seq and subsequently analyzing the alignment descriptors in the result.

For more about alignment descriptors see read_kiss or

http://code.google.com/p/biopieces/wiki/KissFormat

The records output from analyze_snp look like this:

SRC: C
SNP_COUNT: 7
REC_TYPE: SNP
S_ID: contig00115
DST: T
TYPE: MISMATCH
S_POS: 1368
---
  • S_ID indicates the contig where the SNP was found.
  • S_POS is the position in the above sequence harboring the SNP.
  • SRC is the nucleotide (or - for insertions) located at the above position.
  • DST is the changed nucleotide (or - for deletions).
  • TYPE indicates INSERTION, DELETION or MISMATCH.
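
The record layout described above is simple enough to process with standard tools. The following is a minimal sketch, assuming records arrive as "KEY: value" lines terminated by a "---" line exactly as in the example; records_to_tab is a hypothetical helper name and is not part of Biopieces.

```shell
# Hypothetical helper (not part of Biopieces): read "KEY: value" records
# terminated by "---" on stdin and print one tab-separated row per SNP record.
records_to_tab() {
    awk '
        /^---$/ {
            # End of record: emit it if it is a SNP record, then reset.
            if (rec["REC_TYPE"] == "SNP")
                printf "%s\t%s\t%s\t%s\t%s\t%s\n",
                    rec["S_ID"], rec["S_POS"], rec["SRC"],
                    rec["DST"], rec["TYPE"], rec["SNP_COUNT"]
            split("", rec)   # portable way to clear an awk array
            next
        }
        {
            # Split each line at the first ": " into key and value.
            sep = index($0, ": ")
            if (sep > 0)
                rec[substr($0, 1, sep - 1)] = substr($0, sep + 2)
        }
    '
}

# Feed the example record from this page through the helper.
printf '%s\n' 'SRC: C' 'SNP_COUNT: 7' 'REC_TYPE: SNP' 'S_ID: contig00115' \
    'DST: T' 'TYPE: MISMATCH' 'S_POS: 1368' '---' | records_to_tab
# prints: contig00115, 1368, C, T, MISMATCH, 7 (tab-separated on one line)
```

For everyday use write_tab does this job within the Biopieces pipeline; the sketch only illustrates how the fields relate to each other.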

Usage

... | analyze_snp [options]

Options

[-?         | --help]               #  Print full usage description.
[-m <uint>  | --min=<uint>]         #  Minimum SNP count to report  -  Default=1
[-I <file!> | --stream_in=<file!>]  #  Read input from stream file  -  Default=STDIN
[-O <file>  | --stream_out=<file>]  #  Write output to file         -  Default=STDOUT
[-v         | --verbose]            #  Verbose output.
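
To illustrate what a minimum SNP count means in practice, the sketch below applies the same kind of threshold to an already produced record stream. It is an emulation under stated assumptions, not how analyze_snp is implemented: filter_min_count and sample_records are hypothetical names, and the second sample record is invented purely for illustration.

```shell
# Hypothetical helper: pass through only records whose SNP_COUNT is at
# least the threshold given as $1. Records without SNP_COUNT count as 0.
filter_min_count() {
    awk -v min="$1" '
        /^---$/ {
            # End of record: print the buffered lines if the count is high enough.
            if (count + 0 >= min + 0) printf "%s---\n", buf
            buf = ""; count = 0
            next
        }
        {
            buf = buf $0 "\n"
            if ($1 == "SNP_COUNT:") count = $2
        }
    '
}

# Illustrative input only: the second record is made up for this example.
sample_records() {
    printf '%s\n' \
        'S_ID: contig00115' 'S_POS: 1368' 'SNP_COUNT: 7' '---' \
        'S_ID: contig00115' 'S_POS: 2001' 'SNP_COUNT: 3' '---'
}

# With a threshold of 5 only the record at S_POS 1368 is kept.
sample_records | filter_min_count 5
```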

Examples

First we create a BWA index of a reference genome like this:

read_fasta -i genome.fna | create_bwa_index -d ~my_dir/ -i my_index -x

Next we map a stack of reads:

read_solexa -i reads.fq | bwa_seq -i ~my_dir/my_index | write_kiss -xo out.kiss

Finally, we can analyze the result for SNPs:

read_kiss -i out.kiss | analyze_snp -m 5 | write_tab -m 5 | grab -p SNP | write_tab -k S_ID,S_POS,DST,SRC,SNP_COUNT,TYPE -xco snp.tab

The resulting table would look similar to this:

#S_ID        S_POS  DST  SRC  SNP_COUNT  TYPE
contig00001  88     A    T    17         MISMATCH
contig00001  90     G    T    16         MISMATCH
contig00001  140    A    -    25         INSERTION
contig00001  140    A    T    30         MISMATCH
contig00001  5100   T    G    10         MISMATCH
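
A table like this can be summarized further with standard shell tools. The following is a minimal sketch, assuming a tab-separated file with a "#" header line and TYPE in the sixth column as in the example; example_table and count_by_type are hypothetical helper names.

```shell
# Recreate the example table from this page as tab-separated text.
example_table() {
    printf '%s\t%s\t%s\t%s\t%s\t%s\n' \
        '#S_ID' S_POS DST SRC SNP_COUNT TYPE \
        contig00001 88 A T 17 MISMATCH \
        contig00001 90 G T 16 MISMATCH \
        contig00001 140 A - 25 INSERTION \
        contig00001 140 A T 30 MISMATCH \
        contig00001 5100 T G 10 MISMATCH
}

# Hypothetical helper: count rows per TYPE (column 6), skipping the header.
count_by_type() {
    awk -F '\t' '
        !/^#/ { n[$6]++ }
        END   { for (t in n) printf "%s\t%d\n", t, n[t] }
    ' | sort
}

example_table | count_by_type
# prints:
# INSERTION	1
# MISMATCH	4
```

On a real result the helper would be fed from the file instead, e.g. count_by_type < snp.tab.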

See also

read_kiss

write_kiss

create_bwa_index

bwa_seq

write_tab

grab

Author

Martin Asser Hansen - Copyright (C) - All rights reserved.

[email protected]

November 2009

License

GNU General Public License version 2

http://www.gnu.org/copyleft/gpl.html

Help

analyze_snp is part of the Biopieces framework.

http://www.biopieces.org
