oligo_freq
Martin Asser Hansen edited this page Oct 2, 2015
Use oligo_freq to determine the frequencies of subsequences (oligos) in each sequence, or across all sequences in the stream. This is useful if you want to determine, for example, the di-nucleotide frequency or a codon usage frequency table.
... | oligo_freq [options]
[-? | --help] # Print full usage description.
[-w <uint> | --word_size=<uint>] # Size of oligos - Default=7
[-a | --all] # Accumulate oligos for all sequences in stream.
[-I <file!> | --stream_in=<file!>] # Read input from stream file - Default=STDIN
[-O <file> | --stream_out=<file>] # Write output to stream file - Default=STDOUT
[-v | --verbose] # Verbose output.
Consider the following FASTA entries in the file test.fna:
>test1
AAATG
>test2
TGAAA
Read the sequences with read_fasta and pipe them to oligo_freq, using the -w switch to choose a word size of 3:
read_fasta -i test.fna | oligo_freq -w 3
OLIGO: AAA
COUNT: 1
FREQ: 0.3333
---
OLIGO: AAT
COUNT: 1
FREQ: 0.3333
---
OLIGO: ATG
COUNT: 1
FREQ: 0.3333
---
SEQ: AAATG
SEQ_NAME: test1
SEQ_LEN: 5
---
OLIGO: AAA
COUNT: 1
FREQ: 0.3333
---
OLIGO: GAA
COUNT: 1
FREQ: 0.3333
---
OLIGO: TGA
COUNT: 1
FREQ: 0.3333
---
SEQ: TGAAA
SEQ_NAME: test2
SEQ_LEN: 5
---
The result is the frequency of each oligo found within each sequence.
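The per-sequence numbers above can be reproduced with a short Python sketch. It assumes that oligos are counted with an overlapping sliding window and that FREQ is the count divided by the number of windows in that sequence, which is consistent with the output shown. The function name `oligo_freq` and the rounding to four decimals are illustrative choices, not the Biopieces implementation.

```python
from collections import Counter

def oligo_freq(seq, word_size=7):
    """Count overlapping oligos of length word_size in a single sequence.

    Returns a list of (oligo, count, freq) tuples sorted by oligo.
    Hypothetical helper for illustration only.
    """
    # Slide a window of word_size over the sequence, one base at a time.
    counts = Counter(seq[i:i + word_size] for i in range(len(seq) - word_size + 1))
    total = sum(counts.values())
    # A sequence shorter than word_size yields no windows and an empty result.
    return [(oligo, n, round(n / total, 4)) for oligo, n in sorted(counts.items())]

for name, seq in [("test1", "AAATG"), ("test2", "TGAAA")]:
    print(name, oligo_freq(seq, word_size=3))
```

A sequence of length 5 with a word size of 3 gives 5 - 3 + 1 = 3 windows, so each distinct oligo here has a frequency of 1/3.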
To get a total frequency instead, use the -a switch:
read_fasta -i test.fna | oligo_freq -w 3 -a
SEQ: AAATG
SEQ_NAME: test1
SEQ_LEN: 5
---
SEQ: TGAAA
SEQ_NAME: test2
SEQ_LEN: 5
---
OLIGO: AAA
COUNT: 2
FREQ: 0.3333
---
OLIGO: AAT
COUNT: 1
FREQ: 0.1667
---
OLIGO: ATG
COUNT: 1
FREQ: 0.1667
---
OLIGO: GAA
COUNT: 1
FREQ: 0.1667
---
OLIGO: TGA
COUNT: 1
FREQ: 0.1667
---
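As a rough illustration of the accumulated mode, the following Python sketch pools the oligo counts from every sequence and divides by the total number of windows. It assumes that windows never span two sequences and that FREQ is the pooled count divided by the pooled total, which matches the figures shown; `oligo_freq_all` is a hypothetical name, not part of Biopieces.

```python
from collections import Counter

def oligo_freq_all(seqs, word_size=7):
    """Accumulate overlapping oligo counts over all sequences.

    Returns a list of (oligo, count, freq) tuples sorted by oligo.
    Hypothetical helper for illustration only.
    """
    counts = Counter()
    for seq in seqs:
        # Each sequence is scanned separately, so no window crosses a boundary.
        counts.update(seq[i:i + word_size] for i in range(len(seq) - word_size + 1))
    total = sum(counts.values())
    return [(oligo, n, round(n / total, 4)) for oligo, n in sorted(counts.items())]

for row in oligo_freq_all(["AAATG", "TGAAA"], word_size=3):
    print(*row, sep="\t")
```

With the two test sequences there are six windows in total, so AAA (seen twice) gets 2/6 and every other oligo gets 1/6.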
Or, to get a nice table, first use grab to select the OLIGO records and then write_tab to print them:
read_fasta -i test.fna | oligo_freq -w 3 -a | grab -p OLIGO -K | write_tab -cx
#OLIGO COUNT FREQ
AAA 2 0.3333
AAT 1 0.1667
ATG 1 0.1667
GAA 1 0.1667
TGA 1 0.1667
Martin Asser Hansen - Copyright (C) - All rights reserved.
August 2007
GNU General Public License version 2
http://www.gnu.org/copyleft/gpl.html
oligo_freq is part of the Biopieces framework.