Martin Asser Hansen edited this page Oct 2, 2015 · 6 revisions

Biopiece: merge_pair_seq

Description

merge_pair_seq merges paired sequences in the stream, provided that the pairs are interleaved, i.e. each read is immediately followed by its mate. Sequence names must be in either Illumina 1.3/1.5 format, ending in /1 or /2, or Illumina 1.8 format, containing 1: or 2:. The names of the two reads in a pair must match accordingly for the sequences to be merged.

An example record:

SEQ_LEN_RIGHT: 15
SEQ_LEN_LEFT: 15
SCORES: <???9?BBBDBDDBDDFFFFFFHHHIFHFH
SEQ: TAGGGAATCTTGCACAATGGAGGAAACTCT
SEQ_LEN: 30
SEQ_NAME: M01168:16:000000000-A1R9L:1:1101:13906:2139 1:N:0:14
---
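
The naming rule above can be illustrated with a small Python sketch. This is not the Biopieces implementation: the regular expressions and the helper names split_name and is_pair are assumptions made for illustration, and the real tool may be stricter or more lenient about what counts as a match.

```python
import re

# Assumed patterns for the two naming schemes; hypothetical, not taken
# from the merge_pair_seq source code.
ILLUMINA_13 = re.compile(r"^(.+)/([12])$")        # e.g. "read7/1"
ILLUMINA_18 = re.compile(r"^(\S+)\s+([12]):")     # e.g. "M01168:...:2139 1:N:0:14"


def split_name(seq_name):
    """Return (base, mate) with mate 1 or 2, or None if neither format matches."""
    for pattern in (ILLUMINA_18, ILLUMINA_13):
        m = pattern.match(seq_name)
        if m:
            return m.group(1), int(m.group(2))
    return None


def is_pair(name1, name2):
    """True if name1 is read 1 and name2 is read 2 of the same base name."""
    a, b = split_name(name1), split_name(name2)
    if a is None or b is None:
        return False
    return a[0] == b[0] and a[1] == 1 and b[1] == 2
```

Under these assumptions, is_pair("read7/1", "read7/2") is true, while two names with different base parts, or with the read numbers in the wrong order, are rejected.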

Usage

... | merge_pair_seq [options]

Options

[-?          | --help]               #  Print full usage description.
[-I <file!>  | --stream_in=<file!>]  #  Read input from stream file  -  Default=STDIN
[-O <file>   | --stream_out=<file>]  #  Write output to stream file  -  Default=STDOUT
[-v          | --verbose]            #  Verbose output.

Examples

Consider the following FASTQ entries in the file test.fq:

@M01168:16:000000000-A1R9L:1:1101:14862:1868 1:N:0:14
TGGGGAATATTGGACAATGG
+
<??????BDDDDDDDDGGGG
@M01168:16:000000000-A1R9L:1:1101:14862:1868 2:N:0:14
CCTGTTTGCTACCCACGCTT
+
?????BB<-<BDDDDDFEEF
@M01168:16:000000000-A1R9L:1:1101:13906:2139 1:N:0:14
TAGGGAATCTTGCACAATGG
+
<???9?BBBDBDDBDDFFFF
@M01168:16:000000000-A1R9L:1:1101:13906:2139 2:N:0:14
ACTCTTCGCTACCCATGCTT
+
,5<??BB?DDABDBDDFFFF
@M01168:16:000000000-A1R9L:1:1101:14865:2158 1:N:0:14
TAGGGAATCTTGCACAATGG
+
?????BBBBBDDBDDBFFFF
@M01168:16:000000000-A1R9L:1:1101:14865:2158 2:N:0:14
CCTCTTCGCTACCCATGCTT
+
??,<??B?BB?BBBBBFF?F

To merge these interleaved paired-end sequences, use merge_pair_seq:

read_fastq -e base_33 -i test.fq | merge_pair_seq

SEQ_NAME: M01168:16:000000000-A1R9L:1:1101:14862:1868 1:N:0:14
SEQ: TGGGGAATATTGGACAATGGCCTGTTTGCTACCCACGCTT
SEQ_LEN: 40
SCORES: <??????BDDDDDDDDGGGG?????BB<-<BDDDDDFEEF
SEQ_LEN_LEFT: 20
SEQ_LEN_RIGHT: 20
---
SEQ_NAME: M01168:16:000000000-A1R9L:1:1101:13906:2139 1:N:0:14
SEQ: TAGGGAATCTTGCACAATGGACTCTTCGCTACCCATGCTT
SEQ_LEN: 40
SCORES: <???9?BBBDBDDBDDFFFF,5<??BB?DDABDBDDFFFF
SEQ_LEN_LEFT: 20
SEQ_LEN_RIGHT: 20
---
SEQ_NAME: M01168:16:000000000-A1R9L:1:1101:14865:2158 1:N:0:14
SEQ: TAGGGAATCTTGCACAATGGCCTCTTCGCTACCCATGCTT
SEQ_LEN: 40
SCORES: ?????BBBBBDDBDDBFFFF??,<??B?BB?BBBBBFF?F
SEQ_LEN_LEFT: 20
SEQ_LEN_RIGHT: 20
---
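
The output above is consistent with a plain concatenation of the two reads and their scores, with the name taken from the first read. A minimal Python sketch of that behaviour follows; the function merge_pair and the dict-based record layout are hypothetical and only mirror the keys shown in the output, not the actual Biopieces code.

```python
def merge_pair(rec1, rec2):
    """Concatenate two mate records (dicts with SEQ_NAME, SEQ and SCORES)."""
    return {
        "SEQ_NAME": rec1["SEQ_NAME"],              # name of the first read is kept
        "SEQ": rec1["SEQ"] + rec2["SEQ"],
        "SEQ_LEN": len(rec1["SEQ"]) + len(rec2["SEQ"]),
        "SCORES": rec1["SCORES"] + rec2["SCORES"],
        "SEQ_LEN_LEFT": len(rec1["SEQ"]),
        "SEQ_LEN_RIGHT": len(rec2["SEQ"]),
    }


# First pair from test.fq
left = {
    "SEQ_NAME": "M01168:16:000000000-A1R9L:1:1101:14862:1868 1:N:0:14",
    "SEQ": "TGGGGAATATTGGACAATGG",
    "SCORES": "<??????BDDDDDDDDGGGG",
}
right = {
    "SEQ_NAME": "M01168:16:000000000-A1R9L:1:1101:14862:1868 2:N:0:14",
    "SEQ": "CCTGTTTGCTACCCACGCTT",
    "SCORES": "?????BB<-<BDDDDDFEEF",
}

merged = merge_pair(left, right)
for key, value in merged.items():
    print(f"{key}: {value}")
print("---")
```

SEQ_LEN_LEFT and SEQ_LEN_RIGHT record where the first read ends, so the merged sequence can be split again later.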

See also

read_fastq

join_seq

Author

Martin Asser Hansen - Copyright (C) - All rights reserved.

Vera Carvalho - Copyright (C) - All rights reserved.


March 2013

License

GNU General Public License version 2

http://www.gnu.org/copyleft/gpl.html

Help

merge_pair_seq is part of the Biopieces framework.

http://www.biopieces.org
