analyze_tags
analyze_tags creates a distribution of sequence tag lengths and clone counts. The distribution consists of three columns (record keys):
- TAG_LEN
- TAG_COUNT
- TAG_CLONES
TAG_LEN is either the SEQ_LEN or the BED_LEN, depending on the record type.
TAG_COUNT is the number of tags with a given tag length. TAG_CLONES is the sum of the clone counts of all tags with a given TAG_LEN.
The clone count of each tag is the last number following a _ in the SEQ_NAME or Q_ID, e.g. GPL4738_GSM154618_4_8 has a clone count of 8.
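The naming rule above can be illustrated with a minimal Python sketch. This is not the Biopieces implementation; the function name clone_count is hypothetical, and raising an error for names without a trailing number is an assumption, since the text does not say how analyze_tags treats such names.

```python
def clone_count(name: str) -> int:
    """Return the number after the last underscore in a tag name."""
    # rpartition splits on the LAST underscore: (head, "_", tail).
    _, sep, tail = name.rpartition("_")
    if not sep or not tail.isdigit():
        # Assumption: names without a trailing number are rejected.
        raise ValueError(f"no trailing clone count in {name!r}")
    return int(tail)


print(clone_count("GPL4738_GSM154618_4_8"))  # prints 8
```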
... | analyze_tags [options]
[-? | --help] # Print full usage description.
[-I <file!> | --stream_in=<file!>] # Read input from stream file - Default=STDIN
[-O <file> | --stream_out=<file>] # Write output to file - Default=STDOUT
[-v | --verbose] # Verbose output.
Consider the following FASTA entries in the file test.fna:
>GPL4738_GSM154618_1_1
TGCTTGGACTACATATGGTTGAGGGTTGTA
>GPL4738_GSM154618_2_2
TAATACTGTCAGGTAAAGATGTC
>GPL4738_GSM154618_3_1
TGCTTGGACTACATATGGTTGAGGG
>GPL4738_GSM154618_4_8
TGAGTATTACATCAGGTACTGGT
>GPL4738_GSM154618_5_4
CTGCTTGGACTACATATGGTTGAGGGTTGTA
>GPL4738_GSM154618_6_3
CTAAGGAAATAGTAGCCGTGAT
>GPL4738_GSM154618_7_3
TATCACAGCCATTTTGACGAGTT
>GPL4738_GSM154618_8_2
TACGCAGAGGCCTAAGTAAATAGTC
>GPL4738_GSM154618_9_2
TCACTGGGCTTTGTTTATCTCA
>GPL4738_GSM154618_10_2
TATCACAGCCAGCTTTGATGAGCT
To read the sequences use read_fasta and write the output with write_tab:
read_fasta -i test.fna | analyze_tags | write_tab -cxk TAG_LEN,TAG_COUNT,TAG_CLONES
#TAG_LEN TAG_COUNT TAG_CLONES
22 2 66
23 3 328
24 1 23
25 2 220
30 1 1250
31 1 41
Or consider the following BED entries in the file test.bed:
chr2L 20309439 20309467 GPL6817_GSM286603_15_1 33 +
chr2L 354181 354209 GPL6817_GSM286603_15_1 33 +
chr2L 12940128 12940156 GPL6817_GSM286603_15_1 33 +
chr2L 10162601 10162629 GPL6817_GSM286603_15_1 33 +
chr2L 19737747 19737771 GPL6817_GSM286603_16_1 14 +
chr2L 6563165 6563188 GPL6817_GSM286603_17_1 1 +
chr2L 22259021 22259046 GPL6817_GSM286603_18_6 14 +
chr2L 8601299 8601326 GPL6817_GSM286603_19_2 145 +
chr2L 8594716 8594743 GPL6817_GSM286603_19_2 145 +
chr2L 16160570 16160597 GPL6817_GSM286603_19_2 145 +
To read the BED entries use read_bed and write the output with write_tab:
read_bed -i test.bed | analyze_tags | write_tab -cxk TAG_LEN,TAG_COUNT,TAG_CLONES
#TAG_LEN TAG_COUNT TAG_CLONES
23 1 1
24 1 1
25 1 6
27 3 6
28 4 4
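As a cross-check of the table above, the same distribution can be recomputed outside Biopieces with a short Python sketch. It is an illustration only, not the tool's code: the helper names bed_records and tag_distribution are hypothetical, and the tag length is assumed to be the BED end minus the BED start, which agrees with the TAG_LEN values shown.

```python
from collections import defaultdict

BED = """\
chr2L 20309439 20309467 GPL6817_GSM286603_15_1 33 +
chr2L 354181 354209 GPL6817_GSM286603_15_1 33 +
chr2L 12940128 12940156 GPL6817_GSM286603_15_1 33 +
chr2L 10162601 10162629 GPL6817_GSM286603_15_1 33 +
chr2L 19737747 19737771 GPL6817_GSM286603_16_1 14 +
chr2L 6563165 6563188 GPL6817_GSM286603_17_1 1 +
chr2L 22259021 22259046 GPL6817_GSM286603_18_6 14 +
chr2L 8601299 8601326 GPL6817_GSM286603_19_2 145 +
chr2L 8594716 8594743 GPL6817_GSM286603_19_2 145 +
chr2L 16160570 16160597 GPL6817_GSM286603_19_2 145 +
"""


def bed_records(text):
    """Yield (tag_len, name) for each whitespace-separated BED line."""
    for line in text.splitlines():
        _chrom, beg, end, name, *_rest = line.split()
        # Assumption: tag length is end minus start.
        yield int(end) - int(beg), name


def tag_distribution(records):
    """Return sorted (TAG_LEN, TAG_COUNT, TAG_CLONES) rows."""
    counts = defaultdict(int)
    clones = defaultdict(int)
    for tag_len, name in records:
        counts[tag_len] += 1
        # The clone count is the number after the last underscore.
        clones[tag_len] += int(name.rsplit("_", 1)[1])
    return [(n, counts[n], clones[n]) for n in sorted(counts)]


print("#TAG_LEN", "TAG_COUNT", "TAG_CLONES", sep="\t")
for row in tag_distribution(bed_records(BED)):
    print(*row, sep="\t")
```

For the ten entries of test.bed, the sketch yields the rows (23, 1, 1), (24, 1, 1), (25, 1, 6), (27, 3, 6) and (28, 4, 4), the same as the write_tab output.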
Martin Asser Hansen - Copyright (C) - All rights reserved.
August 2007
GNU General Public License version 2
http://www.gnu.org/copyleft/gpl.html
analyze_tags is part of the Biopieces framework.