pocpbenchmark_manuscript contains the workflow for analyzing the data produced by our benchmark, ClavelLab/pocpbenchmark. We set out to compare protein alignment tools for improved genus delineation using the Percentage Of Conserved Proteins (POCP).
A preprint of our work is available at bioRxiv:
Robust genome-based delineation of bacterial genera. Charlie Pauvert, Thomas C.A. Hitch, Thomas Clavel. bioRxiv 2025.03.17.643616; doi: https://doi.org/10.1101/2025.03.17.643616
These analyses were conducted in R 4.3.1 and in RStudio. We recommend installing R, at the specific version, using rig, and getting RStudio from Posit. We also use renv for reproducible environments; it can be installed in R with install.packages("renv").
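As a quick sanity check before going further, the running R version can be compared with the one stated above. This is only a minimal sketch: the helper name same_r_version is hypothetical and not part of the repository.

```r
# Hypothetical helper (not part of the repository): compare the running R
# version with the version the analyses were conducted in (4.3.1).
same_r_version <- function(expected = "4.3.1", running = getRversion()) {
  running == package_version(expected)
}

# Only informs; a different version does not stop anything here.
if (!same_r_version()) {
  message(
    "Running R ", as.character(getRversion()),
    "; the analyses were conducted in R 4.3.1."
  )
}
```

A mismatch does not necessarily prevent the workflow from running, but matching the version is the safer route to the same package environment.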
- Open RStudio and create a new project via "File > New Project..."
- Select "Version Control" and then "Git"
- Type https://github.com/ClavelLab/pocpbenchmark_manuscript in Repository URL.
- Make sure the project is going to be created in the correct subdirectory on your computer, or else edit accordingly
- Click on "Create project"
If you are comfortable with the command line and git, clone the repository with either SSH or HTTPS in a suitable location.
- RStudio warns you that "One or more packages recorded in the lockfile are not installed" because a couple of R packages and dependencies are needed.
- Install the dependencies by typing renv::restore() in the Console and agree to the installation of the packages.
- Check that all dependencies are set by typing renv::status() in the Console, where you should have "No issues found".
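The two renv steps above can be sketched as a single Console snippet. It assumes the working directory is the project root, where renv keeps its lockfile (renv.lock), and that the installed renv version returns a synchronized field from renv::status(), as recent releases do.

```r
# Assumption: the working directory is the project root, where the renv
# lockfile (renv.lock) lives.
lockfile_present <- file.exists("renv.lock")

if (lockfile_present && requireNamespace("renv", quietly = TRUE)) {
  # Install the packages recorded in the lockfile (asks for confirmation
  # in an interactive session).
  renv::restore()

  # status() prints a report; recent renv versions also return, invisibly,
  # a list with a `synchronized` element (assumed here).
  state <- renv::status()
  if (!isTRUE(state$synchronized)) {
    message("Project library and lockfile differ; see the report above.")
  }
}
```

If nothing happens when running this, either renv is not installed yet or the Console is not in the project root.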
Our analysis workflow is orchestrated by targets and is composed of two subworkflows.
Note
You can skip to the next section if you want to start the workflow from already prepared files!
- Download the raw output files from the workflow using the "Download all" button: https://doi.org/10.5281/zenodo.14974869
- Uncompress the zip archive within your project
- Create a data_benchmark folder within your project.
- Move all the zip files downloaded from Zenodo (benchmark-gtdb-f__*.zip) to data_benchmark.
- Ensure the two csv files are at the root of your project.
- Run the workflow with the following command:
Sys.setenv(TAR_PROJECT = "prepare_pocpbenchmark_data")
targets::tar_make()
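The file arrangement described in the steps above (to be done before running the workflow) could be scripted instead of done by hand. The sketch below is an illustration only: the function arrange_benchmark_files is hypothetical, and it assumes the outer Zenodo archive has already been uncompressed at the project root.

```r
# Hypothetical helper (not part of the repository): create data_benchmark,
# move the per-family zip files into it, and report the csv files at the root.
arrange_benchmark_files <- function(project_dir = ".") {
  data_dir <- file.path(project_dir, "data_benchmark")
  dir.create(data_dir, showWarnings = FALSE)

  # Only files matching benchmark-gtdb-f__*.zip are moved.
  zips <- list.files(
    project_dir,
    pattern = utils::glob2rx("benchmark-gtdb-f__*.zip"),
    full.names = TRUE
  )
  moved <- file.rename(zips, file.path(data_dir, basename(zips)))

  # The csv files are left at the project root and only listed.
  csvs <- list.files(project_dir, pattern = "\\.csv$")

  list(moved = basename(zips)[moved], csv_at_root = csvs)
}
```

Calling arrange_benchmark_files() from the project root returns the names of the moved archives and of the csv files found at the root, so it is easy to confirm that two csv files are listed.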
If you skipped the first workflow, you need to download the cleaned and formatted POCP/POCPu values and metadata tables for analysis from https://doi.org/10.5281/zenodo.14975029. These are the files you would have generated with the previous section.
- Run the workflow with the following command:
Sys.setenv(TAR_PROJECT = "analyze_pocpbenchmark_data")
targets::tar_make()
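Since both subworkflows are selected through the TAR_PROJECT environment variable, a typo in the project name is easy to make. One possible guard is sketched below; the function select_subworkflow is hypothetical, and the two project names are the ones used in the commands of this README.

```r
# Hypothetical helper (not part of the repository): set TAR_PROJECT only if
# the name is one of the two subworkflows named in this README.
select_subworkflow <- function(project = c("prepare_pocpbenchmark_data",
                                           "analyze_pocpbenchmark_data")) {
  project <- match.arg(project)  # errors on an unknown name
  Sys.setenv(TAR_PROJECT = project)
  invisible(project)
}
```

With this helper, select_subworkflow("analyze_pocpbenchmark_data") followed by targets::tar_make() is equivalent to the two commands above.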
The manuscript is then available in the _manuscript folder, both as an HTML document (index.html) and a docx document. The figures are generated in the figures folder.
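To confirm that the analysis workflow produced its outputs, the expected locations can be checked from the Console. This is a sketch under assumptions: the function check_outputs is hypothetical, and because the docx file name is not stated here, any .docx file in _manuscript is accepted.

```r
# Hypothetical helper (not part of the repository): look for the rendered
# manuscript and the generated figures in their expected folders.
check_outputs <- function(project_dir = ".") {
  manuscript_dir <- file.path(project_dir, "_manuscript")
  list(
    html = file.exists(file.path(manuscript_dir, "index.html")),
    # The docx file name is not assumed; any .docx file is listed.
    docx = list.files(manuscript_dir, pattern = "\\.docx$"),
    figures = list.files(file.path(project_dir, "figures"))
  )
}
```

Calling check_outputs() from the project root returns TRUE for html, one docx file name, and a non-empty figures vector when the run completed.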