split_pair_seq
Martin Asser Hansen edited this page Oct 2, 2015
split_pair_seq splits sequence records that were joined with merge_pair_seq back into two records. Sequence names must be in either Illumina1.3/1.5 format, with a trailing /1 or /2, or Illumina1.8 format, containing 1: or 2:. A sequence split into two will be output as two records, where the first is named with 1 and the second with 2.
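The naming rule can be illustrated with a minimal Python sketch. It is not the Biopieces implementation: the function name pair_names and the exact regular expressions are assumptions made for illustration, based only on the two name formats described here.

```python
import re

def pair_names(name):
    """Return the (first, second) record names for one SEQ_NAME.

    Hypothetical helper, not part of Biopieces.
    """
    # Illumina1.3/1.5 style: the name ends with /1 or /2.
    m = re.search(r"/[12]$", name)
    if m:
        stem = name[:m.start()]
        return stem + "/1", stem + "/2"

    # Illumina1.8 style: a space followed by 1: or 2: (assumed pattern).
    m = re.search(r" [12]:", name)
    if m:
        head, tail = name[:m.start()], name[m.end():]
        return f"{head} 1:{tail}", f"{head} 2:{tail}"

    raise ValueError(f"unrecognized sequence name format: {name!r}")
```

For a name such as M01168:16:000000000-A1R9L:1:1101:14862:1868 1:N:0:14, the sketch yields one name containing 1:N:0:14 and one containing 2:N:0:14.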
... | split_pair_seq [options]
[-? | --help] # Print full usage description.
[-I <file!> | --stream_in=<file!>] # Read input from stream file - Default=STDIN
[-O <file> | --stream_out=<file>] # Write output to stream file - Default=STDOUT
[-v | --verbose] # Verbose output.
Consider the following Biopiece records created with merge_pair_seq:
SEQ_NAME: M01168:16:000000000-A1R9L:1:1101:14862:1868 1:N:0:14
SEQ: TGGGGAATATTGGACAATGGCCTGTTTGCTACCCACGCTT
SEQ_LEN: 40
SCORES: <??????BDDDDDDDDGGGG?????BB<-<BDDDDDFEEF
SEQ_LEN_LEFT: 20
SEQ_LEN_RIGHT: 20
---
SEQ_NAME: M01168:16:000000000-A1R9L:1:1101:13906:2139 1:N:0:14
SEQ: TAGGGAATCTTGCACAATGGACTCTTCGCTACCCATGCTT
SEQ_LEN: 40
SCORES: <???9?BBBDBDDBDDFFFF,5<??BB?DDABDBDDFFFF
SEQ_LEN_LEFT: 20
SEQ_LEN_RIGHT: 20
---
SEQ_NAME: M01168:16:000000000-A1R9L:1:1101:14865:2158 1:N:0:14
SEQ: TAGGGAATCTTGCACAATGGCCTCTTCGCTACCCATGCTT
SEQ_LEN: 40
SCORES: ?????BBBBBDDBDDBFFFF??,<??B?BB?BBBBBFF?F
SEQ_LEN_LEFT: 20
SEQ_LEN_RIGHT: 20
---
These can be split using split_pair_seq:
... | merge_pair_seq | split_pair_seq
SEQ_NAME: M01168:16:000000000-A1R9L:1:1101:14862:1868 1:N:0:14
SEQ: TGGGGAATATTGGACAATGG
SEQ_LEN: 20
SCORES: <??????BDDDDDDDDGGGG
---
SEQ_NAME: M01168:16:000000000-A1R9L:1:1101:14862:1868 2:N:0:14
SEQ: CCTGTTTGCTACCCACGCTT
SEQ_LEN: 20
SCORES: ?????BB<-<BDDDDDFEEF
---
SEQ_NAME: M01168:16:000000000-A1R9L:1:1101:13906:2139 1:N:0:14
SEQ: TAGGGAATCTTGCACAATGG
SEQ_LEN: 20
SCORES: <???9?BBBDBDDBDDFFFF
---
SEQ_NAME: M01168:16:000000000-A1R9L:1:1101:13906:2139 2:N:0:14
SEQ: ACTCTTCGCTACCCATGCTT
SEQ_LEN: 20
SCORES: ,5<??BB?DDABDBDDFFFF
---
SEQ_NAME: M01168:16:000000000-A1R9L:1:1101:14865:2158 1:N:0:14
SEQ: TAGGGAATCTTGCACAATGG
SEQ_LEN: 20
SCORES: ?????BBBBBDDBDDBFFFF
---
SEQ_NAME: M01168:16:000000000-A1R9L:1:1101:14865:2158 2:N:0:14
SEQ: CCTCTTCGCTACCCATGCTT
SEQ_LEN: 20
SCORES: ??,<??B?BB?BBBBBFF?F
---
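The split shown in this example can be approximated in Python as follows. This is only a sketch under stated assumptions: a record is modeled as a plain dict of strings, the function name split_record is hypothetical, and the cut point is assumed to come from SEQ_LEN_LEFT and SEQ_LEN_RIGHT, as the example records suggest. The real tool may behave differently in edge cases.

```python
import re

def split_record(rec):
    """Split one merged record (a dict of strings) into two records.

    Hypothetical sketch, not the Biopieces implementation.
    """
    left = int(rec["SEQ_LEN_LEFT"])
    right = int(rec["SEQ_LEN_RIGHT"])
    if left + right != len(rec["SEQ"]):
        raise ValueError("SEQ_LEN_LEFT + SEQ_LEN_RIGHT must equal len(SEQ)")

    def rename(name, n):
        # Illumina1.3/1.5 names end in /1 or /2.
        if re.search(r"/[12]$", name):
            return name[:-1] + str(n)
        # Illumina1.8 names contain " 1:" or " 2:" (assumed pattern).
        # Names matching neither pattern are returned unchanged.
        return re.sub(r" [12]:", f" {n}:", name, count=1)

    parts = []
    bounds = [(0, left), (left, left + right)]
    for n, (start, stop) in enumerate(bounds, start=1):
        part = {
            "SEQ_NAME": rename(rec["SEQ_NAME"], n),
            "SEQ": rec["SEQ"][start:stop],
            "SEQ_LEN": str(stop - start),
        }
        # Quality scores are sliced at the same positions as the sequence.
        if "SCORES" in rec:
            part["SCORES"] = rec["SCORES"][start:stop]
        parts.append(part)
    return parts
```

Applied to the first merged record of the example, the sketch returns two 20-base records whose names differ only in the 1: and 2: marker.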
Martin Asser Hansen - Copyright (C) - All rights reserved. Vera Carvalho - Copyright (C) - All rights reserved.
March 2013
GNU General Public License version 2
http://www.gnu.org/copyleft/gpl.html
split_pair_seq is part of the Biopieces framework.