@@ -12,7 +12,7 @@ and extract it somewhere in your path.
 ### From source
 
 FragGeneScanRs is written in Rust, so first head over to their
-[installation instructions][Rust]. After, clone this repository or
+[installation instructions][Rust]. Afterwards, clone this repository or
 download the source code of the [latest release][release]. In this
 directory, run `cargo install --path .` to install. The installation
 progress may prompt you to add a directory to your path so you can
@@ -23,8 +23,8 @@ easily execute it.
 ## Usage
 
 You can use FragGeneScanRs with the short options of FragGeneScan but
-it also provides some additional options and long-form options. It
-reads from and writes to standard input and standard output by default,
+it also provides long-form options and some additional options. It
+reads from standard input and writes to standard output by default,
 allowing shorter calls in case you only need the predicted proteins.
 
3030``` sh
@@ -43,13 +43,14 @@ FragGeneScanRs -s seq_file_name -o output_file_name -w [0 or 1] -t train_file_na
 
 where:
 
-* `seq_file_name` is the (FASTA) sequence file name including the full
-  path
+* `seq_file_name` is the absolute path for the FASTA file containing DNA
+  sequences that need to undergo gene prediction
 
-* `output_file_name` is the base name for the 3 outputfiles, including
-  the full path. A `.out`, `.faa` and `.ffn` file will be created
-  containing the gene prediction metadata, the predicted proteins, and
-  the corresponding DNA reads.
+* `output_file_name` is the absolute path and prefix for the three
+  output files. Files with extensions `.out`, `.faa` and `.ffn` will be
+  created, respectively containing the gene prediction metadata, protein
+  translations of predicted genes, and the DNA sequences of predicted
+  genes.
 
 * `0 or 1` for short sequence reads or complete genomic sequences.
 
@@ -64,7 +65,7 @@
   - `illumina_5` for Illumina sequencing reads with about 0.5% error rate
   - `illumina_10` for Illumina sequencing reads with about 1% error rate
 
-  The corresponding file should be in the `train` directory below the
+  The corresponding file should be in the subdirectory `train` of the
   working directory. Other files can be added and selected here.
 
 * `num_threads` is the number of threads to be used. Defaults to 1.
@@ -80,15 +81,16 @@ where:
   FragGeneScanRs to only write the predicted proteins to standard output.
   The other files can still be requested with the specific options above.
 
-* Leaving out the `-s` options causes FragGeneScanRs to read the
-  sequences from standard input.
+* Leaving out the `-s` option causes FragGeneScanRs to read sequences
+  from standard input.
 
-* `-r train_file_dir` can change the directory containing the training
-  files, so you can put it anywhere on your system.
+* `-r train_file_dir` lets you explicitly specify the pathname of
+  the directory containing the training files, so you can execute the
+  command anywhere on your system.
 
-* `-u` can be used for some additional speed when using multithreading. The
-  output will no longer be in the same order as the input (as in FGS and
-  FGS+).
+* The option `-u` can be used for some additional speed and reduced
+  memory when using multithreading. The output will no longer be in the
+  same order as the input (as in FGS and FGS+).
 
 The complete list of options will be printed when running
 `FragGeneScanRs --help`.
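As a reading aid for the option descriptions in this hunk, here is a minimal sketch of an invocation that uses only the flags documented above (`-s`, `-o`, `-w`, `-t`). The input file name and output prefix are hypothetical placeholders, and the call is guarded so the script does nothing harmful when the binary or the input file is missing. It assumes the `train` directory is reachable from the working directory; otherwise `-r train_file_dir` would be needed.

```sh
#!/bin/sh
# Hypothetical paths for illustration only; replace with your own.
seq_file="$PWD/example.fasta"
out_prefix="$PWD/example_genes"

# Per the description of -o, three files share the output prefix:
# metadata (.out), protein translations (.faa), DNA sequences (.ffn).
expected_files=""
for ext in out faa ffn; do
    expected_files="$expected_files ${out_prefix}.${ext}"
done
echo "expected output files:$expected_files"

# Run the prediction only if the binary is installed and the input exists.
if command -v FragGeneScanRs >/dev/null 2>&1 && [ -f "$seq_file" ]; then
    # -w 0: short sequence reads; -t illumina_10: about 1% error rate.
    FragGeneScanRs -s "$seq_file" -o "$out_prefix" -w 0 -t illumina_10
fi
```

Both paths are built from `$PWD` because the descriptions above ask for absolute paths for `seq_file_name` and `output_file_name`.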
@@ -147,7 +149,7 @@ The commands and arguments used for this benchmarks were:
 
 By default, FragGeneScanPlus outputs only the predicted genes, not the
 metadata and DNA files. Below are measurements taken when those files
-aren't output by FragGeneScanRs either.
+aren't generated by FragGeneScanRs either.
 
 | Short reads | 1 thread | 2 threads | 4 threads | 8 threads |
 |:-----------------|----------:|----------:|----------:|----------:|
@@ -163,6 +165,15 @@ The commands used here are:
 
 ## Memory usage
 
-Above command were also used to measure memory usage.
+The figure below shows the memory footprint for multithreaded execution
+of FGS, FGS+ and FGSrs on long reads (1328 bp). Total memory footprint
+(heap, stack and memory-mapped file I/O) is measured using the Massif
+heap profiler of Valgrind with the `--pages-as-heap` option. Race
+conditions consistently halt the execution of FGS+ above 10 threads. FGS
+and FGSrs generate DNA sequences, protein translations and metadata,
+whereas FGS+ only generates protein translations because the software
+crashes when other output is generated. FGS and FGS+ report gene
+predictions out-of-order, whereas default in-order reporting was used for
+FGSrs.
 
 ![memory usage](meta/memory-usage-wrapped.png)