analyze_tags

Biopiece: analyze_tags

Description

analyze_tags creates a sequence tag length and clone count distribution. The distribution consists of three columns or record keys:

TAG_LEN
TAG_COUNT
TAG_CLONES

The TAG_LEN is either the SEQ_LEN or the BED_LEN, depending on the record type. The TAG_COUNT is the number of tags with a given tag length. The TAG_CLONES is the sum of the clone counts of all tags with a given TAG_LEN. The clone count of each tag is the last number following an underscore (_) in the SEQ_NAME or Q_ID. For example, GPL4738_GSM154618_4_8 has a clone count of 8.
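The aggregation described above can be sketched in Python as follows. This is a minimal illustration, not the Biopieces implementation. It assumes records are plain dicts that use either the SEQ_LEN/SEQ_NAME keys or the BED_LEN/Q_ID keys. The helper names `clone_count` and `tag_distribution` are hypothetical.

```python
from collections import defaultdict


def clone_count(name):
    """Return the clone count: the integer after the last '_' in the tag name."""
    return int(name.rsplit("_", 1)[1])


def tag_distribution(records):
    """Return sorted (TAG_LEN, TAG_COUNT, TAG_CLONES) rows for a list of records.

    Assumption: each record is a dict with either SEQ_LEN/SEQ_NAME
    (sequence records) or BED_LEN/Q_ID (BED records).
    """
    counts = defaultdict(int)  # TAG_LEN -> number of tags with that length
    clones = defaultdict(int)  # TAG_LEN -> summed clone counts for that length
    for rec in records:
        if "SEQ_LEN" in rec:
            length, name = int(rec["SEQ_LEN"]), rec["SEQ_NAME"]
        else:
            length, name = int(rec["BED_LEN"]), rec["Q_ID"]
        counts[length] += 1
        clones[length] += clone_count(name)
    return [(length, counts[length], clones[length]) for length in sorted(counts)]


records = [
    {"SEQ_NAME": "GPL4738_GSM154618_4_8", "SEQ_LEN": 23},
    {"SEQ_NAME": "GPL4738_GSM154618_2_2", "SEQ_LEN": 23},
    {"SEQ_NAME": "GPL4738_GSM154618_6_3", "SEQ_LEN": 22},
]
print(tag_distribution(records))  # [(22, 1, 3), (23, 2, 10)]
```

Keying on length with two counters keeps the sketch single-pass. Rows are sorted by TAG_LEN to match the order in the examples below.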

Usage

... | analyze_tags [options]

Options

[-?         | --help]               #  Print full usage description.
[-I <file!> | --stream_in=<file!>]  #  Read input from stream file  -  Default=STDIN
[-O <file>  | --stream_out=<file>]  #  Write output to file         -  Default=STDOUT
[-v         | --verbose]            #  Verbose output.

Examples

Consider the following FASTA entries in the file test.fna:

>GPL4738_GSM154618_1_1
TGCTTGGACTACATATGGTTGAGGGTTGTA
>GPL4738_GSM154618_2_2
TAATACTGTCAGGTAAAGATGTC
>GPL4738_GSM154618_3_1
TGCTTGGACTACATATGGTTGAGGG
>GPL4738_GSM154618_4_8
TGAGTATTACATCAGGTACTGGT
>GPL4738_GSM154618_5_4
CTGCTTGGACTACATATGGTTGAGGGTTGTA
>GPL4738_GSM154618_6_3
CTAAGGAAATAGTAGCCGTGAT
>GPL4738_GSM154618_7_3
TATCACAGCCATTTTGACGAGTT
>GPL4738_GSM154618_8_2
TACGCAGAGGCCTAAGTAAATAGTC
>GPL4738_GSM154618_9_2
TCACTGGGCTTTGTTTATCTCA
>GPL4738_GSM154618_10_2
TATCACAGCCAGCTTTGATGAGCT

To read the sequences use read_fasta and write the output with write_tab:

read_fasta -i test.fna | analyze_tags | write_tab -cxk TAG_LEN,TAG_COUNT,TAG_CLONES

#TAG_LEN        TAG_COUNT       TAG_CLONES
22      2       5
23      3       13
24      1       2
25      2       3
30      1       1
31      1       4
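To check the FASTA example by hand, the distribution for the entries in test.fna can be computed with a short Python sketch. This is not how analyze_tags is implemented. It assumes TAG_LEN is simply the sequence length, and the helpers `read_fasta_entries` and `fasta_tag_distribution` are hypothetical.

```python
from collections import defaultdict


def read_fasta_entries(lines):
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, seq = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(seq)
            name, seq = line[1:], []
        else:
            seq.append(line)  # sequences may span several lines
    if name is not None:
        yield name, "".join(seq)


def fasta_tag_distribution(lines):
    """Return sorted (TAG_LEN, TAG_COUNT, TAG_CLONES) rows, TAG_LEN = sequence length."""
    counts, clones = defaultdict(int), defaultdict(int)
    for name, seq in read_fasta_entries(lines):
        counts[len(seq)] += 1
        clones[len(seq)] += int(name.rsplit("_", 1)[1])  # clone count after the last '_'
    return [(n, counts[n], clones[n]) for n in sorted(counts)]


# Usage, assuming test.fna from the example is in the working directory:
# with open("test.fna") as fh:
#     for row in fasta_tag_distribution(fh):
#         print(*row, sep="\t")
```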

Or consider the following BED entries in the file test.bed:

chr2L   20309439        20309467        GPL6817_GSM286603_15_1  33      +
chr2L   354181  354209  GPL6817_GSM286603_15_1  33      +
chr2L   12940128        12940156        GPL6817_GSM286603_15_1  33      +
chr2L   10162601        10162629        GPL6817_GSM286603_15_1  33      +
chr2L   19737747        19737771        GPL6817_GSM286603_16_1  14      +
chr2L   6563165 6563188 GPL6817_GSM286603_17_1  1       +
chr2L   22259021        22259046        GPL6817_GSM286603_18_6  14      +
chr2L   8601299 8601326 GPL6817_GSM286603_19_2  145     +
chr2L   8594716 8594743 GPL6817_GSM286603_19_2  145     +
chr2L   16160570        16160597        GPL6817_GSM286603_19_2  145     +

To read the BED entries, use read_bed and write the output with write_tab:

read_bed -i test.bed | analyze_tags | write_tab -cxk TAG_LEN,TAG_COUNT,TAG_CLONES

#TAG_LEN        TAG_COUNT       TAG_CLONES
23      1       1
24      1       1
25      1       6
27      3       6
28      4       4
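The BED output above is consistent with TAG_LEN being chromEnd minus chromStart, as in 20309467 - 20309439 = 28 for the first entry. The Python sketch below assumes that rule and reads the tag name from the fourth column. `bed_tag_distribution` is a hypothetical helper, not part of Biopieces.

```python
from collections import defaultdict


def bed_tag_distribution(lines):
    """Return sorted (TAG_LEN, TAG_COUNT, TAG_CLONES) rows for BED lines.

    Assumption: TAG_LEN = chromEnd - chromStart (columns 3 and 2), the BED
    convention for 0-based, half-open intervals; column 4 holds the tag name.
    """
    counts, clones = defaultdict(int), defaultdict(int)
    for line in lines:
        fields = line.split()  # BED columns are whitespace separated
        if len(fields) < 4:
            continue  # skip blank or incomplete lines
        length = int(fields[2]) - int(fields[1])
        counts[length] += 1
        clones[length] += int(fields[3].rsplit("_", 1)[1])  # clone count after the last '_'
    return [(n, counts[n], clones[n]) for n in sorted(counts)]


# Usage, assuming test.bed from the example is in the working directory:
# with open("test.bed") as fh:
#     for row in bed_tag_distribution(fh):
#         print(*row, sep="\t")
```

Applied to the ten entries in test.bed, this sketch gives the same five rows as the write_tab output above.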

Author

[email protected]

August 2007

License

GNU General Public License version 2

http://www.gnu.org/copyleft/gpl.html

Help

analyze_tags is part of the Biopieces framework.

http://www.biopieces.org
