Welcome to AMOS Assembler. A Modular Open Source Assembler.
website: <http://amos.sourceforge.net>
contact: <[email protected]>
AMOS is open source software. Please see the `COPYING' file for
details. For documentation, please refer to the Documentation
section. For building and installation instructions please see the
Installation section.
Brief Summary
=============
The AMOS consortium is committed to the development of open-source
whole genome assembly software. The project acronym (AMOS) represents
our primary goal -- to produce A Modular, Open-Source whole genome
assembler. Open-source so that everyone is welcome to contribute and
help build outstanding assembly tools, and modular in nature so that
new contributions can be easily inserted into an existing assembly
pipeline. This modular design will foster the development of new
assembly algorithms and allow the AMOS project to continually grow and
improve in hopes of eventually becoming a widely accepted and deployed
assembly infrastructure. In this sense, AMOS is both a design
philosophy and a software system.
Because of its modular nature, AMOS cannot be described in one
paragraph: it is a composite of many different systems. See the
Pipeline section for quick descriptions of each pipeline, or the
Documentation section for where to find comprehensive documentation.
Installation
============
Please follow the instructions in the `INSTALL' file for building
and installation. The `INSTALL' file is only a generic installation
document, so any AMOS specific installation notes are listed below.
Once `make install' is issued, the installed scripts and Perl
modules must not be moved. Thus, decide on the installation
directory beforehand and set the $PREFIX variable for `configure'
accordingly. AMOS makes use of a few custom installation directories
not mentioned in `INSTALL'; they are listed below. $datadir, $libdir and
$includedir are set by `configure'; see `configure --help' for more
information.
- Documentation will be installed in $datadir/doc/amos-[version]
- AMOS libraries and Perl modules will be installed in $libdir/AMOS
- TIGR libraries and Perl modules will be installed in $libdir/TIGR
- AMOS headers will be installed in $includedir/AMOS
- TIGR headers will be installed in $includedir/TIGR
In addition, certain parts of the AMOS package require the X
windowing system along with the Qt libraries. Parameters for these
packages may be modified with the following options:
  --x-includes=DIR        X include files are in DIR
  --x-libraries=DIR       X library files are in DIR
  --with-x                use the X Window System
  --with-Boost-dir=DIR    Directory in which to find the ./boost folder for the
                          Boost toolkit
  --with-qmake-qt4=DIR    Path to qmake using Qt version 4.x
Please see the `INSTALL' file for more information and for Cygwin and
OS X specific instructions.
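As a rough illustration of the installation directories listed earlier
in this section, the following Python sketch shows where each component
would land for a given prefix. It assumes the stock autoconf defaults
($datadir = $prefix/share, $libdir = $prefix/lib, $includedir =
$prefix/include), which `configure' options can override. The function
name, the example prefix and the version string are placeholders and
are not part of AMOS.

```python
import posixpath


def amos_install_dirs(prefix, version):
    """Map each AMOS component to its install directory.

    Assumes the default autoconf layout; real installs may differ if
    --datadir, --libdir or --includedir were passed to configure.
    """
    datadir = posixpath.join(prefix, "share")       # assumed default $datadir
    libdir = posixpath.join(prefix, "lib")          # assumed default $libdir
    includedir = posixpath.join(prefix, "include")  # assumed default $includedir
    return {
        "documentation": posixpath.join(datadir, "doc", f"amos-{version}"),
        "AMOS libraries and Perl modules": posixpath.join(libdir, "AMOS"),
        "TIGR libraries and Perl modules": posixpath.join(libdir, "TIGR"),
        "AMOS headers": posixpath.join(includedir, "AMOS"),
        "TIGR headers": posixpath.join(includedir, "TIGR"),
    }


if __name__ == "__main__":
    # "/opt/amos" and "X.Y.Z" are hypothetical values for illustration.
    for component, path in amos_install_dirs("/opt/amos", "X.Y.Z").items():
        print(f"{component}: {path}")
```

Because installed scripts and Perl modules must not be moved afterwards,
previewing the layout like this before running `make install' can help
settle on a prefix.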
Dependencies
============
The AMOS package makes use of Python and Perl (Practical Extraction
and Report Language). Python and Perl are available on most systems
and the latest versions can be downloaded free of charge. AMOS
requires Perl version 5.6.0 or later. If `perl' and `python' are
available from your system PATH, all is well; if not, you will need to
instruct `configure' where they are located by setting the
environment variables PERL and PYTHON to the full paths of `perl' and
`python' respectively (see the Defining Variables section in the
`INSTALL' file). Some Perl scripts in the AMOS package require
additional modules that you should install:
DBI (http://search.cpan.org/~timb/DBI/)
Statistics::Descriptive (http://search.cpan.org/~shlomif/Statistics-Descriptive-3.0100/)
XML::Parser (http://search.cpan.org/~msergeant/XML-Parser-2.36/)
AMOScmp and minimus2 make use of the NUCmer whole genome alignment
utility, which is part of the MUMmer package. If you wish to run either
pipeline, you will need to download and install MUMmer (available for
free from <http://mummer.sourceforge.net>). If `nucmer' is available
from your system PATH, all is well; if not, you will need to instruct
`configure' where it is located by setting the environment variable
NUCMER to the full path of `nucmer' (see the Defining Variables
section in the `INSTALL' file).
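Before running `configure', it can be useful to check which of these
tools are already on the PATH. The following Python sketch does that
with the standard library only. The PERL, PYTHON and NUCMER variable
names come from the text above; the function names and the report
wording are illustrative and are not part of AMOS.

```python
import shutil

# Executable name -> environment variable that `configure' reads,
# as described in the Dependencies section.
TOOL_VARIABLES = {"perl": "PERL", "python": "PYTHON", "nucmer": "NUCMER"}


def locate_tools(tool_variables=TOOL_VARIABLES, which=shutil.which):
    """Return {variable: full path or None} for each tool looked up on PATH."""
    return {var: which(tool) for tool, var in tool_variables.items()}


def report(found):
    """Build one human-readable line per variable, sorted by name."""
    lines = []
    for var, path in sorted(found.items()):
        if path:
            lines.append(f"{var}: found at {path}")
        else:
            lines.append(
                f"{var}: not on PATH; set {var} to the full path "
                "before running configure"
            )
    return lines


if __name__ == "__main__":
    print("\n".join(report(locate_tools())))
```

This only checks that an executable exists; it does not verify the Perl
version or the presence of the Perl modules listed above.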
The validation pipeline `amosvalidate' also depends on NUCmer, but
it is not essential to the validation process. If you wish to run
amosvalidate without the "alignment breakpoints" step, comment out
steps 600-710 with '#' in the `amosvalidate' script.
The minimus2-blat pipeline relies on BLAT instead of nucmer.
The AMOS assembly viewer Hawkeye relies on X windows and the Qt4
toolkit. Note that the Qt toolkit is somewhat volatile, so it is
important to install Qt4 and not a newer or older version. If Qt is not
centrally installed on your system, you will need to do so to make
use of this graphical viewer. The configure script should be able
to identify most standard Qt installations; however, you may need
to set the `--with-qmake-qt4' configure option to the location of
your Qt package. If all else fails, you may have to build the
viewer independently. If so, make AMOS without hawkeye, and then
cd into the src/hawkeye directory, run 'qmake', and then 'make'.
Finally, if you need the toAmos_new file conversion tool, you will
need to have the Expat XML parsing library: http://expat.sourceforge.net
Documentation
=============
Basic documentation for the individual pipelines is located in the
`doc' subdirectory. Comprehensive documentation can be found on
the AMOS website <http://amos.sourceforge.net>.
Pipeline
========
Listed here are the current AMOS pipelines available. Source files
for the pipelines are in `src/Pipeline' and will be installed in
$bindir as executable scripts. Please see individual documentation for
each pipeline in the `doc' directory.
- minimus -
Minimus is an assembly pipeline designed specifically for small
data-sets, such as the set of reads covering a specific gene. Note
that the code will work for larger assemblies (we have used it to
assemble bacterial genomes); however, due to its stringency, the
resulting assembly will be highly fragmented. For large and/or
complex assemblies the execution of Minimus should be followed by
additional processing steps, such as scaffolding.
- Minimo -
Minimo uses the same assembly strategy as minimus but offers more
flexibility in the sequence input, output and processing.
- minimus2 -
minimus2 is an assembly pipeline designed for merging two sequence sets
(for example, the contigs generated by two assembly processes).
It uses a nucmer-based overlap detector instead of the hash-overlap program
used by the minimus pipeline.
- minimus2-blat -
This pipeline performs the same function as minimus2, but uses BLAT instead
of nucmer for the alignments, which provides a speedup.
- AMOScmp -
AMOScmp provides a general overlap-layout-consensus pipeline for
assembly, but with a twist. The overlap phase of the process is
replaced with an alignment to a reference, i.e. all sequencing reads
are aligned to a finished reference sequence and their alignments are
used to determine their layout position.
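The idea of deriving a layout from reference alignments can be sketched
in a few lines of Python. This is a simplified illustration and not the
AMOScmp implementation: it assumes each read has a single alignment
given as a half-open reference interval, and the Alignment type and
function name are hypothetical.

```python
from typing import List, NamedTuple


class Alignment(NamedTuple):
    read_id: str
    ref_start: int  # 0-based, inclusive
    ref_end: int    # exclusive


def layout_from_alignments(alignments: List[Alignment]) -> List[List[Alignment]]:
    """Order reads by reference position and split wherever coverage breaks."""
    layouts: List[List[Alignment]] = []
    covered_to = None  # right-most reference position covered so far
    for aln in sorted(alignments, key=lambda a: (a.ref_start, a.ref_end)):
        if covered_to is None or aln.ref_start >= covered_to:
            # No overlap with the reads placed so far: start a new layout.
            layouts.append([])
            covered_to = aln.ref_end
        else:
            covered_to = max(covered_to, aln.ref_end)
        layouts[-1].append(aln)
    return layouts


if __name__ == "__main__":
    reads = [
        Alignment("r3", 250, 300),
        Alignment("r1", 0, 100),
        Alignment("r2", 80, 180),
    ]
    for layout in layout_from_alignments(reads):
        print([aln.read_id for aln in layout])
```

Here the reference coordinates stand in for the pairwise read overlaps
that a conventional overlap phase would compute.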
- AMOScmp-shortReads -
A modified version of AMOScmp for assembling short reads.
Differences compared to AMOScmp:
* smaller nucmer alignment cluster size (20 vs 65)
* smaller make-consensus alignment wiggle value (2 vs 15)
- AMOScmp-shortReads-alignmentTrimmed -
Very similar to AMOScmp-shortReads, but it performs a reference-based
alignment trimming of the reads prior to the assembly.
Differences compared to AMOScmp-shortReads:
* aligns the reads to the reference using nucmer
* determines zero coverage regions
* extracts the read clear ranges from the alignment (delta) file
* extends the read clear ranges for reads adjacent to zero coverage regions
* updates the bank with the new clear ranges
* updates the alignment (delta) file with the new read lengths and clear ranges
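The "determines zero coverage regions" step in the list above can be
illustrated with a short interval sweep. This Python sketch is not the
pipeline's own code: it assumes alignments have already been reduced to
half-open (start, end) intervals on a single reference, and the
function name is hypothetical.

```python
def zero_coverage_regions(ref_length, alignments):
    """Return the half-open reference intervals that no alignment covers.

    ref_length: length of the reference sequence.
    alignments: iterable of (start, end) pairs, 0-based, end exclusive.
    """
    gaps = []
    covered_to = 0  # everything left of this position is accounted for
    for start, end in sorted(alignments):
        if start > covered_to:
            gaps.append((covered_to, start))  # uncovered stretch before this read
        covered_to = max(covered_to, end)
    if covered_to < ref_length:
        gaps.append((covered_to, ref_length))  # uncovered tail of the reference
    return gaps


if __name__ == "__main__":
    print(zero_coverage_regions(400, [(80, 180), (0, 100), (250, 300)]))
```

Reads whose alignments end at or begin next to one of the returned
intervals would be the candidates for clear range extension.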
- goBambus2 -
goBambus2 is a pipeline to drive the Bambus2 modules, designed to
scaffold polymorphic and metagenomic data.
- amosvalidate -
Amosvalidate is a validation pipeline for genome assemblies. This
pipeline includes a collection of methods for ascertaining the
quality of an assembly, and examines multiple measures of assembly
quality to pinpoint potential mis-assemblies. Validation techniques
include mate-pair validation, repeat analysis, coverage analysis,
identification of correlated read polymorphisms, and read alignment
breakpoint analysis. Regions of the assembly exhibiting multiple
signatures of mis-assembly are flagged as suspicious and output by
amosvalidate for further examination.
- hawkeye -
Hawkeye is a visual analytics tool for genome assembly analysis and
validation, designed to aid in identifying and correcting assembly
errors. Hawkeye blends the best practices from information and
scientific visualization to facilitate inspection of large-scale
assembly data while minimizing the time needed to detect
mis-assemblies and make accurate judgments of assembly quality.
August 2011