Commit 3057aa8

pdawyndt authored and Felix Van der Jeugt committed
clarify some sentences in README
1 parent 6db7791 commit 3057aa8

1 file changed (+30, -19 lines changed)

README.md

Lines changed: 30 additions & 19 deletions
@@ -12,7 +12,7 @@ and extract it somewhere in your path.
 ### From source
 
 FragGeneScanRs is written in Rust, so first head over to their
-[installation instructions][Rust]. After, clone this repository or
+[installation instructions][Rust]. Afterwards, clone this repository or
 download the source code of the [latest release][release]. In this
 directory, run `cargo install --path .` to install. The installation
 progress may prompt you to add a directory to your path so you can
@@ -23,8 +23,8 @@ easily execute it.
 ## Usage
 
 You can use FragGeneScanRs with the short options of FragGeneScan but
-it also provides some additional options and long-form options. It
-reads from and writes to standard input and standard output by default,
+it also provides long-form options and some additional options. It
+reads from standard input and writes to and standard output by default,
 allowing shorter calls in case you only need the predicted proteins.
 
 ```sh
@@ -43,13 +43,14 @@ FragGeneScanRs -s seq_file_name -o output_file_name -w [0 or 1] -t train_file_na
 
 where:
 
-* `seq_file_name` is the (FASTA) sequence file name including the full
-path
+* `seq_file_name` is the absolute path for the FASTA file containing DNA
+sequences that need to undergo gene prediction
 
-* `output_file_name` is the base name for the 3 outputfiles, including
-the full path. A `.out`, `.faa` and `.ffn` file will be created
-containing the gene prediction metadata, the predicted proteins, and
-the corresponding DNA reads.
+* `output_file_name` is the absolute path and prefix for the three
+output files. Files with extensions `.out`, `.faa` and `.ffn` will be
+created, respectively containing the gene prediction metadata, protein
+translations of predicted genes, and the DNA sequences of predicted
+genes.
 
 * `0 or 1` for short sequence reads or complete genomic sequences.
 
@@ -64,7 +65,7 @@ where:
 - `illumina_5` for Illumina sequencing reads with about 0.5% error rate
 - `illumina_10` for Illumina sequencing reads with about 1% error rate
 
-The corresponding file should be in the `train` directory below the
+The corresponding file should be in the subdirectory `train` of the
 working directory. Other files can be added and selected here.
 
 * `num_threads` is the number of threads to be used. Defaults to 1.
@@ -80,15 +81,16 @@ where:
 FragGeneScanRs to only write the predicted proteins to standard output.
 The other files can still be requested with the specific options above.
 
-* Leaving out the `-s` options causes FragGeneScanRs to read the
-sequences from standard input.
+* Leaving out the `-s` options causes FragGeneScanRs to read sequences
+from standard input.
 
-* `-r train_file_dir` can change the directory containing the training
-files, so you can put it anywhere on your system.
+* `-r train_file_dir` allows to explicitly specify the pathname of
+the directory containing the training files, so you can execute the
+command anywhere on your system.
 
-* `-u` can be used for some additional speed when using multithreading. The
-output will no longer be in the same order as the input (as in FGS and
-FGS+).
+* The option `-u` can be used for some additional speed and reduced
+memory when using multithreading. The output will no longer be in the
+same order as the input (as in FGS and FGS+).
 
 The complete list of options will be printed when running
 `FragGeneScanRs --help`.
@@ -147,7 +149,7 @@ The commands and arguments used for this benchmarks were:
 
 By default, FragGeneScanPlus outputs only the predicted genes, not the
 metadata and DNA files. Below are measurements taken when those files
-aren't output by FragGeneScanRs either.
+aren't generated by FragGeneScanRs either.
 
 | Short reads | 1 thread | 2 threads | 4 threads | 8 threads |
 |:-----------------|----------:|----------:|----------:|----------:|
@@ -163,6 +165,15 @@ The commands used here are:
 
 ## Memory usage
 
-Above command were also used to measure memory usage.
+The figure below shows the memory footprint for multithreaded execution
+of FGS, FGS+ and FGSrs on long reads (1328 bp). Total memory footprint
+(heap, stack and memory-mapped file I/O) is measured using the Massif
+heap profiler of Valgrind with the `--pages-as-heap` option. Race
+conditions consistently halt the execution of FGS+ above 10 threads. FGS
+and FGSrs generate DNA sequences, protein translations and metadata,
+whereas FGS+ only generates protein translations because the software
+crashes when other output is generated. FGS and FGS+ report gene
+predictions out-of-order, where default in-order reporting was used for
+FGSrs.
 
 ![memory usage](meta/memory-usage-wrapped.png)
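
The reworded `output_file_name` entry in this diff describes a prefix from which three files are derived. A minimal shell sketch of that naming rule, for illustration only: the helper name `fgs_output_files` and the example prefix are hypothetical, and the function only prints the file names the README says will be created. It does not run FragGeneScanRs.

```sh
# Hypothetical helper (not part of FragGeneScanRs): print the three file
# names the README says are created for a given output prefix.
fgs_output_files() {
    prefix=$1
    # .out = gene prediction metadata, .faa = protein translations,
    # .ffn = DNA sequences of predicted genes (per the README text).
    for ext in out faa ffn; do
        printf '%s.%s\n' "$prefix" "$ext"
    done
}

# Example prefix is an assumption; any absolute path prefix works the same way.
fgs_output_files /tmp/example/run1
```

Because the README asks for an absolute path as prefix, the derived names do not depend on the directory the command is run from.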
