Martin Asser Hansen edited this page Oct 2, 2015 · 5 revisions

Biopiece: add_ident

Description

For records that lack an identifier, or whose identifier is not unique, add_ident adds a new identifier or replaces an existing one.

Usage

... | add_ident [options]

Options

[-?           | --help]               #  Print full usage description.
[-k <string>  | --key=<string>]       #  Identifier key                -  Default=ID
[-p <string>  | --prefix=<string>]    #  Identifier prefix             -  Default=ID
[-o <uint>    | --offset=<uint>]      #  Identifier offset             -  Default=0
[-I <file!>   | --stream_in=<file!>]  #  Read input from stream file   -  Default=STDIN
[-O <file>    | --stream_out=<file>]  #  Write output to stream file   -  Default=STDOUT
[-v           | --verbose]            #  Verbose output.

Examples

Consider the following table:

Organism   Sequence    Count
Human      ATACGTCAG   23524
Dog        AGCATGAC    2442
Mouse      GACTG       234
Cat        AAATGCA     2342

We use read_tab to get the Sequence column, and then add_ident to generate a unique identifier for each record:

read_tab -i test.tab -s 1 -c 1 -k SEQ | add_ident

ID: ID00000000
SEQ: ATACGTCAG
---
ID: ID00000001
SEQ: AGCATGAC
---
ID: ID00000002
SEQ: GACTG
---
ID: ID00000003
SEQ: AAATGCA
---

However, if you want to output the sequences with write_fasta, the default key ID can be replaced with the SEQ_NAME key that write_fasta requires, using the -k switch:

read_tab -i test.tab -s 1 -c 1 -k SEQ | add_ident -k SEQ_NAME

SEQ: ATACGTCAG
SEQ_NAME: ID00000000
---
SEQ: AGCATGAC
SEQ_NAME: ID00000001
---
SEQ: GACTG
SEQ_NAME: ID00000002
---
SEQ: AAATGCA
SEQ_NAME: ID00000003
---
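
To illustrate why the SEQ_NAME key matters, here is a minimal shell sketch that turns records like the ones above into FASTA entries. The function records_to_fasta is a hypothetical stand-in written for this example; it is not part of Biopieces, and the real write_fasta may behave differently. It assumes the record layout shown above: one "KEY: value" pair per line and a "---" line closing each record.

```shell
#!/bin/sh
# Hypothetical stand-in for illustration only; not the real write_fasta.
# Reads "KEY: value" records terminated by "---" and prints a FASTA entry
# for every record that carries both SEQ_NAME and SEQ.
records_to_fasta() {
    awk -F': ' '
        $1 == "SEQ_NAME" { name = $2 }
        $1 == "SEQ"      { seq = $2 }
        $0 == "---" {
            # This sketch skips records that lack a name or a sequence.
            if (name != "" && seq != "") printf(">%s\n%s\n", name, seq)
            name = ""; seq = ""
        }
    '
}

# Two records in the layout shown above.
printf 'SEQ: ATACGTCAG\nSEQ_NAME: ID00000000\n---\nSEQ: AGCATGAC\nSEQ_NAME: ID00000001\n---\n' |
    records_to_fasta
```

In this sketch a record that still uses the default ID key produces no output, which is why the key is renamed with -k before the sequences are written.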

If you want to change the format of the identifier, the prefix can be changed with the -p switch:

read_tab -i test.tab -s 1 -c 1 -k SEQ | add_ident -k SEQ_NAME -p ID_

SEQ: ATACGTCAG
SEQ_NAME: ID_00000000
---
SEQ: AGCATGAC
SEQ_NAME: ID_00000001
---
SEQ: GACTG
SEQ_NAME: ID_00000002
---
SEQ: AAATGCA
SEQ_NAME: ID_00000003
---

Finally, if you also want to change the offset of the identifier from the default 0, use the -o switch:

read_tab -i test.tab -s 1 -c 1 -k SEQ | add_ident -k SEQ_NAME -p ID_ -o 5

SEQ: ATACGTCAG
SEQ_NAME: ID_00000005
---
SEQ: AGCATGAC
SEQ_NAME: ID_00000006
---
SEQ: GACTG
SEQ_NAME: ID_00000007
---
SEQ: AAATGCA
SEQ_NAME: ID_00000008
---
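
The identifiers in the outputs above appear to be built from the prefix followed by a counter that starts at the offset and is zero-padded to 8 digits. The following shell sketch reproduces that pattern. The padding width is inferred from the example output rather than from the add_ident source, and make_ident and number_records are hypothetical helper names, not Biopieces commands.

```shell
#!/bin/sh
# Hypothetical helper, not part of Biopieces: build one identifier from a
# prefix and a number, zero-padded to 8 digits as in the outputs above.
make_ident() {
    prefix=$1
    number=$2
    printf '%s%08d\n' "$prefix" "$number"
}

# Hypothetical helper: print identifiers for record_count records,
# counting up from offset. The defaults mirror the documented ones
# (prefix ID, offset 0).
number_records() {
    record_count=$1
    prefix=${2-ID}
    offset=${3-0}
    i=0
    while [ "$i" -lt "$record_count" ]; do
        make_ident "$prefix" $((offset + i))
        i=$((i + 1))
    done
}

# Four records with prefix ID_ and offset 5, as in the last example.
number_records 4 ID_ 5
```

Calling number_records 4 with no further arguments falls back to the defaults and yields ID00000000 through ID00000003, matching the first example.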

See also

read_tab

write_fasta

Author

Martin Asser Hansen - Copyright (C) - All rights reserved.

August 2007

License

GNU General Public License version 2

http://www.gnu.org/copyleft/gpl.html

Help

add_ident is part of the Biopieces framework.

http://www.biopieces.org
