clarify some sentences in README

pdawyndt · Felix Van der Jeugt · commit 3057aa8cfd84 · 2021-08-09T15:10:35.000+02:00
diff --git a/README.md b/README.md
@@ -12,7 +12,7 @@ and extract it somewhere in your path.
 ### From source
 
 FragGeneScanRs is written in Rust, so first head over to their
-[installation instructions][Rust]. After, clone this repository or
+[installation instructions][Rust]. Afterwards, clone this repository or
 download the source code of the [latest release][release]. In this
 directory, run `cargo install --path .` to install. The installation
 progress may prompt you to add a directory to your path so you can
@@ -23,8 +23,8 @@ easily execute it.
 ## Usage
 
 You can use FragGeneScanRs with the short options of FragGeneScan but
-it also provides some additional options and long-form options. It
-reads from and writes to standard input and standard output by default,
+it also provides long-form options and some additional options. It
+reads from standard input and writes to standard output by default,
 allowing shorter calls in case you only need the predicted proteins.
 
 ```sh
@@ -43,13 +43,14 @@ FragGeneScanRs -s seq_file_name -o output_file_name -w [0 or 1] -t train_file_na
 
 where:
 
-* `seq_file_name` is the (FASTA) sequence file name including the full
-  path
+* `seq_file_name` is the absolute path for the FASTA file containing DNA
+  sequences that need to undergo gene prediction
 
-* `output_file_name` is the base name for the 3 outputfiles, including
-  the full path. A `.out`, `.faa` and `.ffn` file will be created
-  containing the gene prediction metadata, the predicted proteins, and
-  the corresponding DNA reads.
+* `output_file_name` is the absolute path and prefix for the three
+  output files. Files with extensions `.out`, `.faa` and `.ffn` will be
+  created, respectively containing the gene prediction metadata, protein
+  translations of predicted genes, and the DNA sequences of predicted
+  genes.
 
 * `0 or 1` for short sequence reads or complete genomic sequences.
 
@@ -64,7 +65,7 @@ where:
   - `illumina_5` for Illumina sequencing reads with about 0.5% error rate
   - `illumina_10` for Illumina sequencing reads with about 1% error rate
 
-  The corresponding file should be in the `train` directory below the
+  The corresponding file should be in the subdirectory `train` of the
   working directory. Other files can be added and selected here.
 
 * `num_threads` is the number of threads to be used. Defaults to 1.
@@ -80,15 +81,16 @@ where:
   FragGeneScanRs to only write the predicted proteins to standard output.
   The other files can still be requested with the specific options above.
 
-* Leaving out the `-s` options causes FragGeneScanRs to read the
-  sequences from standard input.
+* Leaving out the `-s` option causes FragGeneScanRs to read sequences
+  from standard input.
 
-* `-r train_file_dir` can change the directory containing the training
-  files, so you can put it anywhere on your system.
+* `-r train_file_dir` allows you to explicitly specify the path of
+  the directory containing the training files, so you can execute the
+  command from anywhere on your system.
 
-* `-u` can be used for some additional speed when using multithreading. The
-  output will no longer be in the same order as the input (as in FGS and
-  FGS+).
+* The option `-u` can be used for some additional speed and reduced
+  memory usage when using multithreading. The output will then no longer
+  be in the same order as the input, as is also the case for FGS and FGS+.
 
 The complete list of options will be printed when running
 `FragGeneScanRs --help`.
@@ -147,7 +149,7 @@ The commands and arguments used for this benchmarks were:
 
 By default, FragGeneScanPlus outputs only the predicted genes, not the
 metadata and DNA files. Below are measurements taken when those files
-aren't output by FragGeneScanRs either.
+aren't generated by FragGeneScanRs either.
 
 | Short reads      |  1 thread | 2 threads | 4 threads | 8 threads |
 |:-----------------|----------:|----------:|----------:|----------:|
@@ -163,6 +165,15 @@ The commands used here are:
 
 ## Memory usage
 
-Above command were also used to measure memory usage.
+The figure below shows the memory footprint for multithreaded execution
+of FGS, FGS+ and FGSrs on long reads (1328 bp). Total memory footprint
+(heap, stack and memory-mapped file I/O) is measured using the Massif
+heap profiler of Valgrind with the `--pages-as-heap` option. Race
+conditions consistently halt the execution of FGS+ above 10 threads. FGS
+and FGSrs generate DNA sequences, protein translations and metadata,
+whereas FGS+ only generates protein translations because the software
+crashes when other output is generated. FGS and FGS+ report gene
+predictions out of order, whereas FGSrs used its default in-order
+reporting.
 
 ![memory usage](meta/memory-usage-wrapped.png)
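The speed and memory trade-off behind `-u` can be modeled with a short Rust sketch. This is a hypothetical illustration and not FragGeneScanRs's implementation. The names `predict`, `run_workers`, `collect_in_order` and `collect_unordered` are made up, and the channel-based worker pool is an assumption. Workers tag each result with the index of its input chunk. An in-order writer must hold results that finish early in a reorder buffer until all earlier chunks have been written. An unordered writer can emit each result as soon as it arrives.

```rust
use std::collections::BTreeMap;
use std::sync::mpsc;
use std::thread;

// Hypothetical model of in-order vs unordered output with worker threads;
// this is not taken from the FragGeneScanRs source code.

/// Placeholder for gene prediction on one chunk of reads.
fn predict(chunk: &str) -> String {
    chunk.to_uppercase()
}

/// Spawns `num_threads` workers; worker t handles chunks t, t + n, t + 2n, ...
/// Results are sent as (chunk index, result) in whatever order workers finish.
fn run_workers(chunks: &[&str], num_threads: usize) -> mpsc::Receiver<(usize, String)> {
    let (tx, rx) = mpsc::channel();
    let owned: Vec<String> = chunks.iter().map(|c| c.to_string()).collect();
    for t in 0..num_threads {
        let tx = tx.clone();
        let owned = owned.clone();
        thread::spawn(move || {
            for i in (t..owned.len()).step_by(num_threads) {
                tx.send((i, predict(&owned[i]))).unwrap();
            }
        });
    }
    // Drop the original sender so the receiver stops once all workers finish.
    drop(tx);
    rx
}

/// In-order output: results that arrive early wait in a reorder buffer until
/// every earlier chunk has been written. Also returns the peak buffer size.
fn collect_in_order(rx: mpsc::Receiver<(usize, String)>) -> (Vec<String>, usize) {
    let mut pending = BTreeMap::new();
    let mut next = 0;
    let mut out = Vec::new();
    let mut peak = 0;
    for (i, result) in rx {
        pending.insert(i, result);
        peak = peak.max(pending.len());
        // Write out the contiguous run starting at the next expected index.
        while let Some(r) = pending.remove(&next) {
            out.push(r);
            next += 1;
        }
    }
    (out, peak)
}

/// Unordered output: each result is appended as soon as it arrives,
/// so no reorder buffer is needed.
fn collect_unordered(rx: mpsc::Receiver<(usize, String)>) -> Vec<String> {
    rx.into_iter().map(|(_, result)| result).collect()
}

fn main() {
    let chunks = ["acgt", "ttga", "ccat", "gatc", "aaaa"];
    let (ordered, peak) = collect_in_order(run_workers(&chunks, 3));
    println!("in order:  {:?} (peak buffered results: {})", ordered, peak);
    let unordered = collect_unordered(run_workers(&chunks, 3));
    println!("unordered: {:?}", unordered);
}
```

In this model, the ordered collector always returns the chunks in input order. The unordered one may return them in a different order on each run. The reorder buffer, whose peak size is reported, is the extra memory that unordered output avoids.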