Martin Asser Hansen edited this page Oct 1, 2015 · 6 revisions

#summary Order records with paired-end sequence data.

Biopiece: order_pairs

Description

[order_pairs] orders records with paired-end sequence data where the sequence names follow either the Illumina 1.5 scheme, where names end in /1 or /2, or the Illumina 1.8 scheme, where names contain a space followed by 1 or 2 and then a colon (:). The records are output in interleaved order, which is required by paired-end-aware assembly programs. [order_pairs] uses a hashing scheme for this and does not sort by sequence name.
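The pairing idea described above can be sketched in Python. This is an illustration only, not the Biopieces implementation: the function names `split_name` and `order_pairs`, the two regular expressions, the choice to emit orphans at the end of the stream, and the errors raised for unrecognized or duplicate names are all assumptions made for the sketch. The ORDER values mirror those shown in the sample output on this page.

```python
import re
from typing import Dict, Iterable, Iterator, Optional, Tuple

# Illumina 1.8: name contains a space, then 1 or 2, then a colon.
NEW_STYLE = re.compile(r"^(\S+) ([12]):")
# Illumina 1.5: name ends in /1 or /2.
OLD_STYLE = re.compile(r"^(.+)/([12])$")


def split_name(seq_name: str) -> Optional[Tuple[str, int]]:
    """Return (pair key, mate number), or None if neither scheme matches."""
    for pattern in (NEW_STYLE, OLD_STYLE):
        match = pattern.match(seq_name)
        if match:
            return match.group(1), int(match.group(2))
    return None


def order_pairs(records: Iterable[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Yield records interleaved (mate 1, then mate 2), tagged with ORDER."""
    # Hash of pair key -> (mate number, record) still waiting for its mate.
    pending: Dict[str, Tuple[int, Dict[str, str]]] = {}
    for record in records:
        parsed = split_name(record["SEQ_NAME"])
        if parsed is None:
            # Assumption: unrecognized names are treated as an error here.
            raise ValueError(f"unrecognized name: {record['SEQ_NAME']}")
        key, mate = parsed
        held = pending.pop(key, None)
        if held is None:
            pending[key] = (mate, record)
            continue
        held_mate, held_record = held
        if held_mate == mate:
            # Assumption: each mate of a pair occurs at most once.
            raise ValueError(f"duplicate mate {mate} for {key}")
        first, second = (held_record, record) if held_mate == 1 else (record, held_record)
        yield dict(first, ORDER="paired")
        yield dict(second, ORDER="paired")
    # Whatever is left never found a mate.
    for mate, record in pending.values():
        yield dict(record, ORDER=f"orphan {mate}")
```

Because mates are matched through a dictionary lookup, the input does not have to be sorted; the cost is that every unmatched record is held in memory until its mate arrives or the input ends.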

Using [order_pairs] is important after filtering steps where one record of a pair may have been discarded. For each record, the value of the ORDER key indicates whether the record is paired or an orphan, and you can use [grab] to filter the records accordingly.

SEQ_NAME: HWI-ST575:107:C0HE6ACXX:5:1101:1832:2218 1:N:0:TAGCTG
SEQ: GCTTTGACATAGTCGCTCCAGAATTGCCAGCTAGGGTTAGCTTGGCAACTGCAGCGACGTAATGTGCTGTGGCAGATCAATTTATCTGTTTTGAATCA
SEQ_LEN: 98
SCORES: ^P^PJ\Y`eea`e[daYdecggadgdXJIYVbdc`efg_cdedI^aXIO^abeb\eL_daQU^_V]``]UGTZ\^BBBBBBBBBBBBBBBBBBBBBBB
ORDER: paired
---
SEQ_NAME: HWI-ST575:107:C0HE6ACXX:5:1101:1832:2218 2:N:0:TAGCTG
SEQ: GGTTATCGATCTGGAAAAAGCAACTAAACCTAAAGCTAAACCACGTAGCGCCGGGTAAATGATTCAAAACAGATAAATTGATCTGCCACAGCACATTA
SEQ_LEN: 98
SCORES: ^VYPJQ`c^JJ[b[efg^dHJ`aa`adXd_ZXXbIIIY[af_H^aWHWPZ[`gggFFZ^bd_Z]Zb_]ba\^ZGY_`TZ``cc[bbR]]^aaXQ[bbb
ORDER: paired
---
SCORES: ffffcfffffded^eddddddbdcdeedcefecfefdffecabccBB`b`
SEQ: CCNAGGAGGAGNCAATAAGAGACCATTCGTATATGATCTCTCAGGAGAGC
SEQ_LEN: 50
SEQ_NAME: ILLUMINA-52179E_0004:2:1:1044:7943#TTAGGC/1
ORDER: orphan 1
---
SCORES: BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
SEQ: NNNNNNNNGGNNCNANNANNNNGTNNNTNGNANNNNCNNANTTGNNNNNN
SEQ_LEN: 50
SEQ_NAME: ILLUMINA-52179E_0004:2:1:1041:14486#TTAGGC/2
ORDER: orphan 2
---
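To see how many records ended up paired or orphaned, the output above can be tallied by its ORDER value. The following Python sketch assumes the text layout shown in the sample records ("KEY: value" lines, with "---" closing each record); `parse_records` and `count_order` are hypothetical helper names, not part of Biopieces.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator


def parse_records(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Parse 'KEY: value' lines into dicts; a '---' line ends a record."""
    record: Dict[str, str] = {}
    for line in lines:
        line = line.rstrip("\n")
        if line == "---":
            if record:
                yield record
            record = {}
        elif ": " in line:
            # Split on the first ': ' only, so values may contain colons.
            key, value = line.split(": ", 1)
            record[key] = value
    if record:
        # Tolerate a final record that lacks the closing '---'.
        yield record


def count_order(records: Iterable[Dict[str, str]]) -> Counter:
    """Count records per ORDER value; records without ORDER count as 'missing'."""
    return Counter(record.get("ORDER", "missing") for record in records)


if __name__ == "__main__":
    import sys

    # Reads records in the layout above from standard input.
    for order, count in sorted(count_order(parse_records(sys.stdin)).items()):
        print(f"{order}\t{count}")
```

An odd number of "paired" records, or any "missing" count, would suggest that the stream was not produced by [order_pairs] or was modified afterwards.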

Usage

... | order_pairs [options]

Options

[-?         | --help]               #  Print full usage description.
[-I <file!> | --stream_in=<file!>]  #  Read input from stream file   -  Default=STDIN
[-O <file>  | --stream_out=<file>]  #  Write output to stream file   -  Default=STDOUT
[-v         | --verbose]            #  Verbose output.

Examples

If you have two paired-end sequence files that use the Illumina 1.5 or 1.8 naming scheme, you can order them with [order_pairs] like this:

read_fastq -i test1.fq,test2.fq | order_pairs | write_fastq -o combi.fq -x

If you filter your sequences and discard a member of a pair, you can run the data through [order_pairs] and then use [grab] to keep only the paired records:

read_fastq -i combi.fq |            # Read in Illumina data
trim_seq |                          # Trim ends according to quality scores
grab -e "SEQ_LEN>30" |              # Keep entries with sequences longer than 30
order_pairs |                       # Make sure the pairs are in order
grab -p 'pair' -k ORDER |           # Grab paired records
write_fastq -o combi_trimmed.fq -x  # Write to new file
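As a quick sanity check before handing a file such as combi_trimmed.fq to an assembler, one can verify that it really is interleaved. The Python sketch below is not part of Biopieces; it assumes unwrapped FASTQ with exactly four lines per record, and `split_name`, `fastq_names` and `is_interleaved` are hypothetical helper names.

```python
import re
from typing import Iterable, List, Optional, Tuple


def split_name(name: str) -> Optional[Tuple[str, int]]:
    """Return (pair key, mate number) for Illumina 1.8 or 1.5 style names."""
    match = re.match(r"^(\S+) ([12]):", name) or re.match(r"^(.+)/([12])$", name)
    return (match.group(1), int(match.group(2))) if match else None


def fastq_names(lines: Iterable[str]) -> List[str]:
    """Collect header names, assuming exactly four lines per FASTQ record."""
    # Every fourth line is a header; drop its leading '@'.
    return [line.rstrip("\n")[1:] for i, line in enumerate(lines) if i % 4 == 0]


def is_interleaved(names: List[str]) -> bool:
    """True if names alternate mate 1, mate 2 and each couple shares a key."""
    if len(names) % 2:
        return False
    for first, second in zip(names[0::2], names[1::2]):
        a, b = split_name(first), split_name(second)
        if a is None or b is None:
            return False
        if a[0] != b[0] or (a[1], b[1]) != (1, 2):
            return False
    return True


if __name__ == "__main__":
    import sys

    # Usage (file name taken from the example above):
    #   python check_interleaved.py combi_trimmed.fq
    with open(sys.argv[1]) as handle:
        print("interleaved" if is_interleaved(fastq_names(handle)) else "NOT interleaved")
```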

See also

[read_fastq]

[write_fastq]

[trim_seq]

[grab]

[assemble_seq_idba]

[assemble_seq_velvet]

Author

Martin Asser Hansen - Copyright (C) - All rights reserved.

[email protected]

May 2011

License

GNU General Public License version 2

http://www.gnu.org/copyleft/gpl.html

Help

[order_pairs] is part of the Biopieces framework.

http://www.biopieces.org
