assemble_pairs
assemble_pairs assembles overlapping paired-end sequences into single sequences that are output to the stream; the original sequences are not output. Assembly works by progressively considering all overlaps, starting from the maximum considered overlap set with the -p switch (default is the length of the shortest sequence) and working down to the minimum required overlap set with the -o switch (default 1). For each overlap a percentage of mismatches can be allowed using the -m switch (default 5%).
Mismatches in the overlapping region are resolved so that the residue with the highest quality score is used in the assembled sequence. The quality scores are averaged in the overlapping region. The sequence of the overlapping region is output in upper case and the remainder in lower case.
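The overlap search and mismatch resolution described above can be sketched in Python. This is a minimal illustration, not the Biopieces implementation. It assumes the second read has already been oriented onto the same strand as the first (this page does not say how orientation is handled), that the mismatch limit is applied as a percentage of the overlap length, that quality ties go to the first read, and that averaged scores are rounded down. The function name, its parameters, and the SEQ and SCORES keys are hypothetical; only OVERLAP_LEN and HAMMING_DIST come from this page.

```python
def assemble_pair(seq1, qual1, seq2, qual2,
                  mismatches_pct=5, overlap_min=1, overlap_max=None):
    """Merge two reads where the end of seq1 overlaps the start of seq2.

    qual1 and qual2 are lists of integer quality scores, one per residue.
    Returns a dict describing the merged read, or None if no overlap fits.
    """
    shortest = min(len(seq1), len(seq2))
    if overlap_max is None or overlap_max > shortest:
        overlap_max = shortest
    overlap_min = max(overlap_min, 1)

    # Try the longest overlap first and shrink until one is accepted.
    for overlap_len in range(overlap_max, overlap_min - 1, -1):
        start = len(seq1) - overlap_len
        tail = seq1[start:].upper()
        head = seq2[:overlap_len].upper()
        hamming = sum(a != b for a, b in zip(tail, head))

        # Assumption: the limit is a percentage of the overlap length.
        if hamming * 100 > mismatches_pct * overlap_len:
            continue

        merged_seq = []
        merged_qual = []
        for i in range(overlap_len):
            q1, q2 = qual1[start + i], qual2[i]
            # Keep the residue with the higher quality score.
            merged_seq.append(tail[i] if q1 >= q2 else head[i])
            # Average the two quality scores (rounded down).
            merged_qual.append((q1 + q2) // 2)

        return {
            # Overlap in upper case, the rest in lower case.
            "SEQ": seq1[:start].lower() + "".join(merged_seq)
                   + seq2[overlap_len:].lower(),
            "SCORES": qual1[:start] + merged_qual + qual2[overlap_len:],
            "OVERLAP_LEN": overlap_len,
            "HAMMING_DIST": hamming,
        }

    return None
```

Because the search runs from the longest overlap downwards, the first overlap that satisfies the mismatch limit is the one reported, so lowering -p can change which overlap is found.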
Paired sequences must follow either the Illumina 1.5 scheme, where names end in /1 or /2, or the Illumina 1.8 scheme, where names contain a space followed by 1 or 2 and then a : character. Furthermore, sequences must be in interleaved order in the stream - use read_fastq for that.
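The two naming schemes and the interleaving requirement can be checked with a short Python sketch. It is an illustration based only on the description above, not code from Biopieces; the function names and the regular expressions are assumptions.

```python
import re

# Illumina 1.5 style: the name ends in /1 or /2.
OLD_SCHEME = re.compile(r"^(.*)/([12])$")
# Illumina 1.8 style: a space, then 1 or 2, then a colon.
NEW_SCHEME = re.compile(r"^(.*?) ([12]):")


def split_pair_name(name):
    """Return (base_name, member) with member 1 or 2, or None if unpaired."""
    for pattern in (OLD_SCHEME, NEW_SCHEME):
        match = pattern.match(name)
        if match:
            return match.group(1), int(match.group(2))
    return None


def is_interleaved(names):
    """Check that names come as consecutive (member 1, member 2) pairs."""
    if len(names) % 2:
        return False
    for first, second in zip(names[0::2], names[1::2]):
        a = split_pair_name(first)
        b = split_pair_name(second)
        if a is None or b is None:
            return False
        # Both reads must share a base name and appear in order 1, 2.
        if a[0] != b[0] or (a[1], b[1]) != (1, 2):
            return False
    return True
```

A check like this can be run over the sequence names before assembly to catch files that were concatenated rather than interleaved.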
The following additional keys are added to records with merged sequences:
- OVERLAP_LEN - the length of the located overlap.
- HAMMING_DIST - the number of mismatches in the assembly.
... | assemble_pairs [options]
[-? | --help] # Print full usage description.
[-m <uint> | --mismatches=<uint>] # Allowed overlap mismatches in percent - Default=5
[-o <uint> | --overlap_min=<uint>] # Minimum overlap required - Default=1
[-p <uint> | --overlap_max=<uint>] # Maximum overlap considered - Default=(length of shortest sequence)
[-I <file!> | --stream_in=<file!>] # Read input from stream file - Default=STDIN
[-O <file> | --stream_out=<file>] # Write output to stream file - Default=STDOUT
[-v | --verbose] # Verbose output.
If you have two paired-end sequence files whose read names follow the Illumina 1.5 or 1.8 naming scheme, you can assemble them using assemble_pairs like this:
read_fastq -i in1.fq -j in2.fq | assemble_pairs | write_fastq -o out.fq -x
Martin Asser Hansen - Copyright (C) - All rights reserved.
March 2013
GNU General Public License version 2
http://www.gnu.org/copyleft/gpl.html
assemble_pairs is part of the Biopieces framework.