merge_pair_seq
Martin Asser Hansen edited this page Oct 2, 2015
merge_pair_seq merges paired sequences in the stream if these are interleaved. Sequence names must be in either Illumina 1.3/1.5 format, with a trailing /1 or /2, or Illumina 1.8 format, containing 1: or 2:. The names of the two sequences in a pair must match accordingly for the sequences to be merged.
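The naming rule above can be illustrated with a small Python sketch. This is not the Biopieces implementation: the helper names `split_pair_name` and `is_pair` are hypothetical, and the sketch assumes that for Illumina 1.8 names only the part before the space has to be identical between the two mates.

```python
import re

# Hypothetical helpers for illustration only; not part of Biopieces.
# Illumina 1.3/1.5 style: shared name followed by /1 or /2.
OLD_STYLE = re.compile(r"^(.+)/([12])$")
# Illumina 1.8 style: shared name, a space, then 1: or 2: and further fields.
NEW_STYLE = re.compile(r"^(\S+) ([12]):")


def split_pair_name(name):
    """Return (shared_name, mate_number) or raise ValueError."""
    for pattern in (OLD_STYLE, NEW_STYLE):
        match = pattern.match(name)
        if match:
            return match.group(1), int(match.group(2))
    raise ValueError(f"unrecognized paired sequence name: {name!r}")


def is_pair(name1, name2):
    """True if name1 is mate 1 and name2 is mate 2 of the same pair."""
    shared1, mate1 = split_pair_name(name1)
    shared2, mate2 = split_pair_name(name2)
    # Assumption: mates must appear in the order 1, 2 and share the same name.
    return shared1 == shared2 and (mate1, mate2) == (1, 2)
```

Under these assumptions, `is_pair("read7/1", "read7/2")` is true, while two names with different cluster coordinates do not form a pair.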
An example record:
SEQ_LEN_RIGHT: 15
SEQ_LEN_LEFT: 15
SCORES: <???9?BBBDBDDBDDFFFFFFHHHIFHFH
SEQ: TAGGGAATCTTGCACAATGGAGGAAACTCT
SEQ_LEN: 30
SEQ_NAME: M01168:16:000000000-A1R9L:1:1101:13906:2139 1:N:0:14
---
... | merge_pair_seq [options]
[-? | --help] # Print full usage description.
[-I <file!> | --stream_in=<file!>] # Read input from stream file - Default=STDIN
[-O <file> | --stream_out=<file>] # Write output to stream file - Default=STDOUT
[-v | --verbose] # Verbose output.
Consider the following FASTQ entries in the file test.fq:
@M01168:16:000000000-A1R9L:1:1101:14862:1868 1:N:0:14
TGGGGAATATTGGACAATGG
+
<??????BDDDDDDDDGGGG
@M01168:16:000000000-A1R9L:1:1101:14862:1868 2:N:0:14
CCTGTTTGCTACCCACGCTT
+
?????BB<-<BDDDDDFEEF
@M01168:16:000000000-A1R9L:1:1101:13906:2139 1:N:0:14
TAGGGAATCTTGCACAATGG
+
<???9?BBBDBDDBDDFFFF
@M01168:16:000000000-A1R9L:1:1101:13906:2139 2:N:0:14
ACTCTTCGCTACCCATGCTT
+
,5<??BB?DDABDBDDFFFF
@M01168:16:000000000-A1R9L:1:1101:14865:2158 1:N:0:14
TAGGGAATCTTGCACAATGG
+
?????BBBBBDDBDDBFFFF
@M01168:16:000000000-A1R9L:1:1101:14865:2158 2:N:0:14
CCTCTTCGCTACCCATGCTT
+
??,<??B?BB?BBBBBFF?F
To merge these interleaved paired-end sequences, use merge_pair_seq:
read_fastq -e base_33 -i test.fq | merge_pair_seq
SEQ_NAME: M01168:16:000000000-A1R9L:1:1101:14862:1868 1:N:0:14
SEQ: TGGGGAATATTGGACAATGGCCTGTTTGCTACCCACGCTT
SEQ_LEN: 40
SCORES: <??????BDDDDDDDDGGGG?????BB<-<BDDDDDFEEF
SEQ_LEN_LEFT: 20
SEQ_LEN_RIGHT: 20
---
SEQ_NAME: M01168:16:000000000-A1R9L:1:1101:13906:2139 1:N:0:14
SEQ: TAGGGAATCTTGCACAATGGACTCTTCGCTACCCATGCTT
SEQ_LEN: 40
SCORES: <???9?BBBDBDDBDDFFFF,5<??BB?DDABDBDDFFFF
SEQ_LEN_LEFT: 20
SEQ_LEN_RIGHT: 20
---
SEQ_NAME: M01168:16:000000000-A1R9L:1:1101:14865:2158 1:N:0:14
SEQ: TAGGGAATCTTGCACAATGGCCTCTTCGCTACCCATGCTT
SEQ_LEN: 40
SCORES: ?????BBBBBDDBDDBFFFF??,<??B?BB?BBBBBFF?F
SEQ_LEN_LEFT: 20
SEQ_LEN_RIGHT: 20
---
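Judging from the output above, the merged record keeps the name of the first sequence, concatenates the two sequences and their score strings as they are, and records the length of each half. The following Python sketch reproduces that result for a single pair. The function `merge_pair` and the use of plain dictionaries as records are assumptions made for illustration, not the Biopieces code.

```python
def merge_pair(left, right):
    """Merge two records (dicts with SEQ_NAME, SEQ and SCORES keys).

    Hypothetical helper: mirrors the fields seen in the example output.
    """
    seq = left["SEQ"] + right["SEQ"]
    return {
        "SEQ_NAME": left["SEQ_NAME"],                # name of the first mate is kept
        "SEQ": seq,                                  # plain concatenation
        "SEQ_LEN": len(seq),
        "SCORES": left["SCORES"] + right["SCORES"],  # scores concatenated likewise
        "SEQ_LEN_LEFT": len(left["SEQ"]),
        "SEQ_LEN_RIGHT": len(right["SEQ"]),
    }


# First pair from test.fq
left = {
    "SEQ_NAME": "M01168:16:000000000-A1R9L:1:1101:14862:1868 1:N:0:14",
    "SEQ": "TGGGGAATATTGGACAATGG",
    "SCORES": "<??????BDDDDDDDDGGGG",
}
right = {
    "SEQ_NAME": "M01168:16:000000000-A1R9L:1:1101:14862:1868 2:N:0:14",
    "SEQ": "CCTGTTTGCTACCCACGCTT",
    "SCORES": "?????BB<-<BDDDDDFEEF",
}

merged = merge_pair(left, right)
for key, value in merged.items():
    print(f"{key}: {value}")
print("---")
```

The SEQ_LEN_LEFT and SEQ_LEN_RIGHT values make it possible to split a merged record back into its two halves later on.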
Martin Asser Hansen - Copyright (C) - All rights reserved.
Vera Carvalho - Copyright (C) - All rights reserved.
March 2013
GNU General Public License version 2
http://www.gnu.org/copyleft/gpl.html
merge_pair_seq is part of the Biopieces framework.