Martin Asser Hansen edited this page Oct 2, 2015 · 5 revisions

Biopiece: count_vals

Description

Given a comma-separated list of keys, count_vals counts, for each key, the number of records that share an identical value. Because the counting relies on one hash per key, count_vals can easily exhaust memory on large inputs. This is countered by caching the counts to disk every 5 million records; however, the disk caching may be slow.
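The trade-off between memory use and disk caching can be illustrated with a minimal Python sketch. This is not the Biopieces implementation. The function name count_with_spill, the flush_every parameter, and the use of SQLite as the on-disk store are all assumptions made for illustration. The sketch keeps a small in-memory hash and merges it into the store after a fixed number of records.

```python
import sqlite3
from collections import Counter


def count_with_spill(values, flush_every=5_000_000, db_path=":memory:"):
    """Count identical values, merging partial counts into SQLite periodically.

    Hypothetical helper: pass a file path as db_path to keep the counts on
    disk; the default ":memory:" keeps the example self-contained.
    """
    db = sqlite3.connect(db_path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS counts (val TEXT PRIMARY KEY, n INTEGER NOT NULL)"
    )
    pending = Counter()  # in-memory hash, bounded by flush_every entries

    def flush():
        # Make sure every pending value has a row, then add the partial counts.
        db.executemany(
            "INSERT OR IGNORE INTO counts (val, n) VALUES (?, 0)",
            [(val,) for val in pending],
        )
        db.executemany(
            "UPDATE counts SET n = n + ? WHERE val = ?",
            [(n, val) for val, n in pending.items()],
        )
        db.commit()
        pending.clear()

    for seen, value in enumerate(values, start=1):
        pending[value] += 1
        if seen % flush_every == 0:
            flush()
    flush()  # merge whatever is left after the last full batch

    result = dict(db.execute("SELECT val, n FROM counts"))
    db.close()
    return result
```

With a small flush_every the store is written often, which bounds memory at the cost of speed. This is the same trade-off described above.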

Usage

... | count_vals [options]

Options

[-?          | --help]               #  Print full usage description.
[-k <string> | --keys=<string>]      #  Comma-separated list of keys.
[-I <file!>  | --stream_in=<file!>]  #  Read input from stream file  -  Default=STDIN
[-O <file>   | --stream_out=<file>]  #  Write output to stream file  -  Default=STDOUT
[-v          | --verbose]            #  Verbose output.

Examples

Consider the following two column table in the file test.tab:

Human   H1
Human   H2
Human   H3
Dog     D1
Dog     D2
Mouse   M1

To count the values in both columns, read the table with read_tab and pipe the records to count_vals:

read_tab -i test.tab | count_vals -k V0,V1

V0: Human
V1_COUNT: 1
V1: H1
V0_COUNT: 3
---
V0: Human
V1_COUNT: 1
V1: H2
V0_COUNT: 3
---
V0: Human
V1_COUNT: 1
V1: H3
V0_COUNT: 3
---
V0: Dog
V1_COUNT: 1
V1: D1
V0_COUNT: 2
---
V0: Dog
V1_COUNT: 1
V1: D2
V0_COUNT: 2
---
V0: Mouse
V1_COUNT: 1
V1: M1
V0_COUNT: 1
---

For each of the specified keys (V0 and V1), a new key with the suffix _COUNT is added, whose value is the global count of that key's value across all records. The result is easier to read after piping through write_tab:

read_tab -i test.tab | count_vals -k V0,V1 | write_tab -xck V0,V0_COUNT,V1,V1_COUNT

#V0     V0_COUNT    V1      V1_COUNT
Human   3           H1      1
Human   3           H2      1
Human   3           H3      1
Dog     2           D1      1
Dog     2           D2      1
Mouse   1           M1      1
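
The counting shown above can be reproduced with a short Python sketch. It illustrates the logic only and is not the Biopieces code. The function name count_vals_sketch and the dictionary representation of records are assumptions, as is the choice to leave records that lack a key unannotated for that key. The sketch makes one pass to build one hash per key and a second pass to add the KEY_COUNT values.

```python
from collections import Counter


def count_vals_sketch(records, keys):
    """Return copies of the records with a <key>_COUNT entry for each key."""
    records = list(records)
    # First pass: one hash (Counter) per key, counting identical values.
    counts = {
        key: Counter(rec[key] for rec in records if key in rec) for key in keys
    }
    # Second pass: annotate each record with the global count of its value.
    annotated = []
    for rec in records:
        out = dict(rec)
        for key in keys:
            if key in rec:  # assumption: skip keys the record does not have
                out[f"{key}_COUNT"] = counts[key][rec[key]]
        annotated.append(out)
    return annotated


rows = [
    ("Human", "H1"), ("Human", "H2"), ("Human", "H3"),
    ("Dog", "D1"), ("Dog", "D2"), ("Mouse", "M1"),
]
records = [{"V0": species, "V1": name} for species, name in rows]

for rec in count_vals_sketch(records, ["V0", "V1"]):
    print(rec["V0"], rec["V0_COUNT"], rec["V1"], rec["V1_COUNT"], sep="\t")
```

Because every record must carry the final global count, all records have to be seen before any can be emitted, which is why the sketch holds the whole input in a list.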

See also

read_tab

write_tab

uniq_vals

grab

Author

Martin Asser Hansen - Copyright (C) - All rights reserved.


August 2007

License

GNU General Public License version 2

http://www.gnu.org/copyleft/gpl.html

Help

count_vals is part of the Biopieces framework.

http://www.biopieces.org
