Skip to content

hurwitzlab/phage_detection_benchmarks

Repository files navigation

challenging-phage-finders

A benchmarking study comparing various phage finding tools through carefully crafted mock communities.

Restoring R environment:

Rscript -e "install.packages('renv')"
Rscript -e "renv::restore()"

Overview of Experiments

Below are descriptions of the experiments and a broad overview of the pipelines. More detailed graphs of the pipelines can be found in src/.

Fragmented Genomes

Complete genomes from RefSeq are fragmented into lengths of 500, 1000, 3000, and 5000 nt. These fragments are passed to the contig classifiers, and their performance is assessed.

graph LR
    refseq[(RefSeq)] -- genomes --> chopper;
    subgraph fragmentation
        chopper
    end
    chopper -- fragments --> tools{classifiers};
    subgraph classification
        tools
    end
    tools -- predictions --> assess{analysis}
    subgraph assessment
        assess --> results(results)
    end
Loading

The pipelines that constitute this pipeline are:

  • src/chop_refseq/Snakefile
  • src/select_frags/Snakefile
  • src/classify_chopped/Snakefile

Simulated Metagenomes

Abundance profiles created by Bracken from various samples are used to create simulated metagenomes with InSilicoSeq. These are assembled with MegaHit, and binned using MetaBAT2. The contigs from MegaHit are given to the contig classifiers and the bins from MetaBAT2 are passed to MARVEL for predictions. For assessment, the contigs are mapped to the input genomes using BLAST.

graph LR
    profiles(abundance profiles) --> iss{InSilicoSeq};
    genomes[(genomes)] --> iss;
    subgraph simulation
        iss -- reads --> megahit{MegaHit};
        megahit -- contigs --> blast;
        megahit -- contigs --> metabat{MetaBAT2};
    end
    genomes --> blast{BLAST};
    subgraph classification
        metabat -- bins --> marvel{MARVEL};
        megahit -- contigs --> tools{classifiers};
    end
    subgraph assessment
        tools -- predictions --> assess{analysis}
        marvel -- predictions --> assess
        blast -- mappings --> assess
        assess --> results(results)
    end
Loading

The pipelines that constitute this pipeline are:

  • src/simulate_metagenomes/Snakefile
  • src/classify_simulated_metagenomes/Snakefile

About

Code used to benchmark metagenomic phage detection tools.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published