Martin Asser Hansen edited this page Oct 2, 2015 · 5 revisions

Biopiece: analyze_tags

Description

analyze_tags creates a sequence tag length and clone count distribution. The distribution consists of three columns or record keys:

  • TAG_LEN
  • TAG_COUNT
  • TAG_CLONES

TAG_LEN is either SEQ_LEN or BED_LEN, depending on the record type. TAG_COUNT is the number of tags with a given tag length. TAG_CLONES is the sum of the clone counts of all tags with a given TAG_LEN. The clone count of a tag is the last number following a _ in its SEQ_NAME or Q_ID, e.g. GPL4738_GSM154618_4_8 has a clone count of 8.
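The aggregation described above can be sketched in Python. This is only an illustration of the rule, not the Biopieces implementation: the function names clone_count and tag_distribution are hypothetical, and sorting the rows by TAG_LEN is an assumption based on the example output on this page.

```python
from collections import defaultdict


def clone_count(name):
    """Return the integer after the last underscore in a tag name."""
    return int(name.rsplit("_", 1)[1])


def tag_distribution(records):
    """Aggregate (tag_length, tag_name) pairs into rows of
    (TAG_LEN, TAG_COUNT, TAG_CLONES), sorted by TAG_LEN (assumed order)."""
    counts = defaultdict(int)   # TAG_LEN -> number of tags
    clones = defaultdict(int)   # TAG_LEN -> summed clone counts
    for length, name in records:
        counts[length] += 1
        clones[length] += clone_count(name)
    return [(length, counts[length], clones[length]) for length in sorted(counts)]


# Three tags with their sequence lengths and names.
records = [
    (23, "GPL4738_GSM154618_4_8"),
    (23, "GPL4738_GSM154618_7_3"),
    (22, "GPL4738_GSM154618_6_3"),
]

for row in tag_distribution(records):
    print(*row, sep="\t")
```

A name without an underscore or without a trailing integer raises an exception in this sketch; how analyze_tags itself treats such names is not stated here.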

Usage

... | analyze_tags [options]

Options

[-?         | --help]               #  Print full usage description.
[-I <file!> | --stream_in=<file!>]  #  Read input from stream file  -  Default=STDIN
[-O <file>  | --stream_out=<file>]  #  Write output to file         -  Default=STDOUT
[-v         | --verbose]            #  Verbose output.

Examples

Consider the following FASTA entries in the file test.fna:

>GPL4738_GSM154618_1_1
TGCTTGGACTACATATGGTTGAGGGTTGTA
>GPL4738_GSM154618_2_2
TAATACTGTCAGGTAAAGATGTC
>GPL4738_GSM154618_3_1
TGCTTGGACTACATATGGTTGAGGG
>GPL4738_GSM154618_4_8
TGAGTATTACATCAGGTACTGGT
>GPL4738_GSM154618_5_4
CTGCTTGGACTACATATGGTTGAGGGTTGTA
>GPL4738_GSM154618_6_3
CTAAGGAAATAGTAGCCGTGAT
>GPL4738_GSM154618_7_3
TATCACAGCCATTTTGACGAGTT
>GPL4738_GSM154618_8_2
TACGCAGAGGCCTAAGTAAATAGTC
>GPL4738_GSM154618_9_2
TCACTGGGCTTTGTTTATCTCA
>GPL4738_GSM154618_10_2
TATCACAGCCAGCTTTGATGAGCT

To read the sequences use read_fasta and write the output with write_tab:

read_fasta -i test.fna | analyze_tags | write_tab -cxk TAG_LEN,TAG_COUNT,TAG_CLONES

#TAG_LEN        TAG_COUNT       TAG_CLONES
22      2       5
23      3       13
24      1       2
25      2       3
30      1       1
31      1       4

Or consider the following BED entries in the file test.bed:

chr2L   20309439        20309467        GPL6817_GSM286603_15_1  33      +
chr2L   354181  354209  GPL6817_GSM286603_15_1  33      +
chr2L   12940128        12940156        GPL6817_GSM286603_15_1  33      +
chr2L   10162601        10162629        GPL6817_GSM286603_15_1  33      +
chr2L   19737747        19737771        GPL6817_GSM286603_16_1  14      +
chr2L   6563165 6563188 GPL6817_GSM286603_17_1  1       +
chr2L   22259021        22259046        GPL6817_GSM286603_18_6  14      +
chr2L   8601299 8601326 GPL6817_GSM286603_19_2  145     +
chr2L   8594716 8594743 GPL6817_GSM286603_19_2  145     +
chr2L   16160570        16160597        GPL6817_GSM286603_19_2  145     +

To read the BED entries use read_bed and write the output with write_tab:

read_bed -i test.bed | analyze_tags | write_tab -cxk TAG_LEN,TAG_COUNT,TAG_CLONES

#TAG_LEN        TAG_COUNT       TAG_CLONES
23      1       1
24      1       1
25      1       6
27      3       6
28      4       4
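As a cross-check, the table above can be reproduced by hand with a short Python sketch. It assumes that BED_LEN equals the end coordinate minus the start coordinate, which is what the table implies for these entries; the function name bed_tag_distribution is hypothetical and the code does not call Biopieces.

```python
from collections import Counter

# The BED entries from test.bed, whitespace-separated.
BED_TEXT = """\
chr2L 20309439 20309467 GPL6817_GSM286603_15_1 33 +
chr2L 354181 354209 GPL6817_GSM286603_15_1 33 +
chr2L 12940128 12940156 GPL6817_GSM286603_15_1 33 +
chr2L 10162601 10162629 GPL6817_GSM286603_15_1 33 +
chr2L 19737747 19737771 GPL6817_GSM286603_16_1 14 +
chr2L 6563165 6563188 GPL6817_GSM286603_17_1 1 +
chr2L 22259021 22259046 GPL6817_GSM286603_18_6 14 +
chr2L 8601299 8601326 GPL6817_GSM286603_19_2 145 +
chr2L 8594716 8594743 GPL6817_GSM286603_19_2 145 +
chr2L 16160570 16160597 GPL6817_GSM286603_19_2 145 +
"""


def bed_tag_distribution(text):
    """Return (TAG_LEN, TAG_COUNT, TAG_CLONES) rows for BED text."""
    tag_count = Counter()
    tag_clones = Counter()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or incomplete lines
        start, end, name = int(fields[1]), int(fields[2]), fields[3]
        length = end - start  # assumed definition of BED_LEN
        tag_count[length] += 1
        # Clone count: the number after the last underscore in the name.
        tag_clones[length] += int(name.rsplit("_", 1)[1])
    return [(n, tag_count[n], tag_clones[n]) for n in sorted(tag_count)]


rows = bed_tag_distribution(BED_TEXT)
print("#TAG_LEN\tTAG_COUNT\tTAG_CLONES")
for row in rows:
    print(*row, sep="\t")
```

Each of the four 28-base entries carries a clone count of 1, giving TAG_CLONES 4, while the three 27-base entries carry 2 each, giving 6.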

See also

read_fasta

read_bed

write_tab

Author

Martin Asser Hansen - Copyright (C) - All rights reserved.

August 2007

License

GNU General Public License version 2

http://www.gnu.org/copyleft/gpl.html

Help

analyze_tags is part of the Biopieces framework.

http://www.biopieces.org
