Martin Asser Hansen edited this page Oct 2, 2015 · 6 revisions

Biopiece: align_seq

Description

align_seq creates an alignment of all sequences in the stream.

align_seq currently uses Muscle as its alignment engine, so Muscle must be installed for align_seq to work.

For more about Muscle:

http://www.drive5.com/muscle/
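Since align_seq depends on Muscle, it can help to confirm that Muscle is reachable before building a pipeline. The following is a minimal sketch, assuming the executable is named muscle and is looked up on PATH; have_tool is a hypothetical helper and not part of Biopieces.

```shell
# Hypothetical helper (not part of Biopieces): succeed if a program is on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Assumption: the Muscle executable is installed under the name "muscle".
if have_tool muscle; then
    echo "muscle found: $(command -v muscle)"
else
    echo "muscle not found on PATH; align_seq needs it" >&2
fi
```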

Usage

... | align_seq [options]

Options

[-?         | --help]               #  Print full usage description.
[-I <file!> | --stream_in=<file!>]  #  Read input from stream file -  Default=STDIN
[-O <file>  | --stream_out=<file>]  #  Write output to stream file -  Default=STDOUT
[-v         | --verbose]            #  Verbose output.

Examples

Consider the following file test.fna containing these FASTA entries:

>test1
CTAGCTTCGACT
>test2
GAATCGACT
>test3
ACGAAACTAGCATC
>test4
AGCATCGACT
>test5
TAACAGGCACT

To align these sequences, read the file with read_fasta and pipe the stream to align_seq:

read_fasta -i test.fna | align_seq

SEQ: ---TAACAGGCACT
SEQ_LEN: 14
SEQ_NAME: test5
---
SEQ: -----GAATCGACT
SEQ_LEN: 14
SEQ_NAME: test2
---
SEQ: --CTAGCTTCGACT
SEQ_LEN: 14
SEQ_NAME: test1
---
SEQ: ACGAAACTAGCATC
SEQ_LEN: 14
SEQ_NAME: test3
---
SEQ: ----AGCATCGACT
SEQ_LEN: 14
SEQ_NAME: test4
---

The resulting alignment can then be written in FASTA format using write_fasta:

read_fasta -i test.fna | align_seq | write_fasta -x

>test5
---TAACAGGCACT
>test2
-----GAATCGACT
>test1
--CTAGCTTCGACT
>test3
ACGAAACTAGCATC
>test4
----AGCATCGACT
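In an alignment every row has the same length once gaps are counted, as the SEQ_LEN of 14 in each record above shows. A quick sanity check on the FASTA output can be sketched as below, assuming each sequence occupies a single line; aligned_ok is a hypothetical helper and not a Biopiece.

```shell
# Hypothetical helper: succeed if all sequence lines in a FASTA stream
# have the same length. Assumes one line per sequence (no wrapping).
aligned_ok() {
    awk '
        /^>/ { next }                    # skip header lines
        len == 0 { len = length($0) }    # remember the first row length
        length($0) != len { bad = 1 }    # flag any row that differs
        END { exit (bad ? 1 : 0) }
    '
}

# Usage with the alignment shown above.
aligned_ok <<'EOF' && echo "all rows have equal length"
>test5
---TAACAGGCACT
>test2
-----GAATCGACT
>test1
--CTAGCTTCGACT
>test3
ACGAAACTAGCATC
>test4
----AGCATCGACT
EOF
```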

Or you can write the alignment in pretty text format using write_align:

read_fasta -i test.fna | align_seq | write_align -x

                          .    
test5            ---TAACAGGCACT
test2            -----GAATCGACT
test1            --CTAGCTTCGACT
test3            ACGAAACTAGCATC
test4            ----AGCATCGACT
Consensus: 50%   ----A-C----ACT

If there are only two aligned sequences in the stream, write_align will output a pairwise alignment in pretty text:

read_fasta -i test.fna -n 2 | align_seq | write_align -x

                     .  
test1       CTAGCTTCGACT
               |  ||||||
test2       ---GAATCGACT
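The middle line of the pairwise output marks columns where both sequences carry the same residue. The following awk sketch reproduces that line for the two sequences above, assuming a column counts as a match only when the characters are identical and not gaps. It is an illustration and not the write_align source; match_line is a hypothetical helper.

```shell
# Hypothetical helper: print "|" for each column where two aligned
# sequences have the same non-gap character, and a space otherwise.
match_line() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        n = (length(a) < length(b)) ? length(a) : length(b)
        line = ""
        for (i = 1; i <= n; i++) {
            x = substr(a, i, 1)
            y = substr(b, i, 1)
            line = line ((x == y && x != "-") ? "|" : " ")
        }
        print line
    }'
}

# Usage with the aligned test1 and test2 sequences shown above.
match_line CTAGCTTCGACT ---GAATCGACT
```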

See also

read_fasta

write_fasta

write_align

invert_align

tile_seq

Author

Martin Asser Hansen - Copyright (C) - All rights reserved.

August 2007

License

GNU General Public License version 2

http://www.gnu.org/copyleft/gpl.html

Help

align_seq is part of the Biopieces framework.

http://www.biopieces.org
