Martin Asser Hansen edited this page Oct 2, 2015 · 6 revisions

Biopiece: split_pair_seq

Description

split_pair_seq splits paired sequences in the stream that were previously merged with merge_pair_seq. Sequence names must be in either Illumina 1.3/1.5 format, with a trailing /1 or /2, or Illumina 1.8 format, containing 1: or 2:. Each merged sequence is output as two records: the name of the first is marked with 1 and the name of the second with 2.
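The naming rule can be illustrated with a small Python sketch. This is not the Biopieces implementation: the function name pair_names and both regular expressions are assumptions made for illustration, and the actual tool may match names differently.

```python
import re

# Hypothetical patterns, not taken from the Biopieces source.
OLD_STYLE = re.compile(r"^(.*)/[12]$")        # Illumina 1.3/1.5: trailing /1 or /2
NEW_STYLE = re.compile(r"^(\S+) [12]:(.*)$")  # Illumina 1.8: " 1:" or " 2:" after the read ID


def pair_names(seq_name):
    """Return the names for the first and second record of a split pair."""
    m = OLD_STYLE.match(seq_name)
    if m:
        base = m.group(1)
        return base + "/1", base + "/2"

    m = NEW_STYLE.match(seq_name)
    if m:
        read_id, rest = m.group(1), m.group(2)
        return f"{read_id} 1:{rest}", f"{read_id} 2:{rest}"

    # Names in neither format are rejected in this sketch.
    raise ValueError(f"Unrecognized sequence name format: {seq_name!r}")
```

Under these assumptions, a name such as read_42/1 (a made-up example) would yield read_42/1 and read_42/2, while an Illumina 1.8 name keeps its read ID and only the 1: or 2: marker changes.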

Usage

... | split_pair_seq [options]

Options

[-?          | --help]               #  Print full usage description.
[-I <file!>  | --stream_in=<file!>]  #  Read input from stream file  -  Default=STDIN
[-O <file>   | --stream_out=<file>]  #  Write output to stream file  -  Default=STDOUT
[-v          | --verbose]            #  Verbose output.

Examples

Consider the following Biopiece records created with merge_pair_seq:

SEQ_NAME: M01168:16:000000000-A1R9L:1:1101:14862:1868 1:N:0:14
SEQ: TGGGGAATATTGGACAATGGCCTGTTTGCTACCCACGCTT
SEQ_LEN: 40
SCORES: <??????BDDDDDDDDGGGG?????BB<-<BDDDDDFEEF
SEQ_LEN_LEFT: 20
SEQ_LEN_RIGHT: 20
---
SEQ_NAME: M01168:16:000000000-A1R9L:1:1101:13906:2139 1:N:0:14
SEQ: TAGGGAATCTTGCACAATGGACTCTTCGCTACCCATGCTT
SEQ_LEN: 40
SCORES: <???9?BBBDBDDBDDFFFF,5<??BB?DDABDBDDFFFF
SEQ_LEN_LEFT: 20
SEQ_LEN_RIGHT: 20
---
SEQ_NAME: M01168:16:000000000-A1R9L:1:1101:14865:2158 1:N:0:14
SEQ: TAGGGAATCTTGCACAATGGCCTCTTCGCTACCCATGCTT
SEQ_LEN: 40
SCORES: ?????BBBBBDDBDDBFFFF??,<??B?BB?BBBBBFF?F
SEQ_LEN_LEFT: 20
SEQ_LEN_RIGHT: 20
---

These can be split using split_pair_seq:

... | merge_pair_seq | split_pair_seq

SEQ_NAME: M01168:16:000000000-A1R9L:1:1101:14862:1868 1:N:0:14
SEQ: TGGGGAATATTGGACAATGG
SEQ_LEN: 20
SCORES: <??????BDDDDDDDDGGGG
---
SEQ_NAME: M01168:16:000000000-A1R9L:1:1101:14862:1868 2:N:0:14
SEQ: CCTGTTTGCTACCCACGCTT
SEQ_LEN: 20
SCORES: ?????BB<-<BDDDDDFEEF
---
SEQ_NAME: M01168:16:000000000-A1R9L:1:1101:13906:2139 1:N:0:14
SEQ: TAGGGAATCTTGCACAATGG
SEQ_LEN: 20
SCORES: <???9?BBBDBDDBDDFFFF
---
SEQ_NAME: M01168:16:000000000-A1R9L:1:1101:13906:2139 2:N:0:14
SEQ: ACTCTTCGCTACCCATGCTT
SEQ_LEN: 20
SCORES: ,5<??BB?DDABDBDDFFFF
---
SEQ_NAME: M01168:16:000000000-A1R9L:1:1101:14865:2158 1:N:0:14
SEQ: TAGGGAATCTTGCACAATGG
SEQ_LEN: 20
SCORES: ?????BBBBBDDBDDBFFFF
---
SEQ_NAME: M01168:16:000000000-A1R9L:1:1101:14865:2158 2:N:0:14
SEQ: CCTCTTCGCTACCCATGCTT
SEQ_LEN: 20
SCORES: ??,<??B?BB?BBBBBFF?F
---
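The transformation shown in this example can be sketched in Python. This is a minimal illustration inferred from the records above, not the actual split_pair_seq code: the function name split_record is hypothetical, records are assumed to be plain dictionaries, and only Illumina 1.8 style names are handled.

```python
import re


def split_record(record):
    """Split one merged record into two, using SEQ_LEN_LEFT and SEQ_LEN_RIGHT."""
    left = int(record["SEQ_LEN_LEFT"])
    right = int(record["SEQ_LEN_RIGHT"])
    seq = record["SEQ"]
    scores = record["SCORES"]

    # Sanity check (an assumption of this sketch): the two parts must cover the sequence.
    if left + right != len(seq) or len(scores) != len(seq):
        raise ValueError("SEQ_LEN_LEFT and SEQ_LEN_RIGHT do not match SEQ/SCORES length")

    # Illumina 1.8 names only: rewrite the first " 1:" or " 2:" marker.
    name = record["SEQ_NAME"]
    name1 = re.sub(r" [12]:", " 1:", name, count=1)
    name2 = re.sub(r" [12]:", " 2:", name, count=1)

    first = {"SEQ_NAME": name1, "SEQ": seq[:left],
             "SEQ_LEN": left, "SCORES": scores[:left]}
    second = {"SEQ_NAME": name2, "SEQ": seq[left:],
              "SEQ_LEN": right, "SCORES": scores[left:]}
    return first, second
```

Applied to the first merged record above, this sketch returns two 20-base records whose SEQ and SCORES values match the first two output records, with the SEQ_LEN_LEFT and SEQ_LEN_RIGHT keys dropped.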

See also

read_fastq

merge_pair_seq

join_seq

Author

Martin Asser Hansen - Copyright (C) - All rights reserved.

Vera Carvalho - Copyright (C) - All rights reserved.

[email protected]

March 2013

License

GNU General Public License version 2

http://www.gnu.org/copyleft/gpl.html

Help

split_pair_seq is part of the Biopieces framework.

http://www.biopieces.org
