Phylogenomic inference for viruses using whole-genome distance metrics.
ViPhySlim is an open-source Python package for viral phylogenetic inference based on whole-genome comparisons. It is inspired by the VICTOR tool, providing a lightweight, scalable, and locally deployable alternative. ViPhySlim is designed to handle a large number of viral genomes, leveraging parallel computation via MPI.
Inferring the evolutionary relationships of viruses is a challenging task. Unlike cellular organisms, viral evolution is heavily influenced by horizontal gene transfer, making traditional gene-based phylogenetics less reliable. Genome-wide approaches offer better accuracy, but existing tools have limitations:
- Closed-source
- Web-only interface
- Limited to 100 genomes per analysis
ViPhySlim addresses these challenges by offering:
- Local and scalable phylogenetic analysis
- Parallel processing with MPI
- Open-source Python implementation
- Conda-packaged for easy installation and reproducibility
- Pairwise genome distance computation
- MPI-based parallel processing for high-throughput scalability
- Output compatible with phylogenetic tree building tools
- Easily deployable via `venv` or Conda (BioConda channel coming soon)
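The MPI-based parallelism listed above can be pictured as dividing the set of genome pairs among processes. The snippet below is a minimal sketch of that idea, not ViPhySlim's actual code: the function name `pairs_for_rank`, the genome IDs, and the round-robin scheme are all assumptions made for illustration. In a real MPI program, `rank` and `size` would typically come from mpi4py (`MPI.COMM_WORLD.Get_rank()` and `MPI.COMM_WORLD.Get_size()`); here they are passed in directly so the example runs without MPI.

```python
from itertools import combinations


def pairs_for_rank(genome_ids, rank, size):
    """Return the genome pairs one process would handle (round-robin split).

    Hypothetical helper: every unordered pair is assigned to exactly one rank.
    """
    all_pairs = combinations(genome_ids, 2)
    return [pair for i, pair in enumerate(all_pairs) if i % size == rank]


if __name__ == "__main__":
    genomes = ["phageA", "phageB", "phageC", "phageD"]  # hypothetical IDs
    size = 3  # stands in for the number of MPI processes
    for rank in range(size):
        print(rank, pairs_for_rank(genomes, rank, size))
```

A round-robin split keeps the per-process workload roughly even when pairwise comparisons cost about the same; other schemes (for example, contiguous blocks) may suit uneven genome sizes better.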
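ViPhySlim implements whole-genome distance estimation using the Genome BLAST Distance Phylogeny (GBDP) approach. After computing BLASTp alignments between translated viral genomes, distances are calculated using one of three formulas. Each formula converts one of the following similarity measures into a distance, so a higher similarity yields a smaller distance: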
- d0: Proportion of the genome covered by high-scoring segment pairs (HSPs).
- d4: Fraction of identical amino acid pairs within HSPs relative to the total aligned length. This measure is more robust when working with incomplete genomes.
- d6: Fraction of identical amino acid pairs relative to the entire genome size. This metric preserves the highest amount of evolutionary information.
The choice of distance depends on the dataset characteristics and the research focus.
By default, d6 is recommended for complete genomes, while d4 is more reliable for fragmented or incomplete assemblies.
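To make the three formulas concrete, here is a small sketch that turns the quantities described above into distances by subtracting each ratio from 1. It is one plausible reading of those descriptions, not ViPhySlim's implementation: the function name `gbdp_style_distances`, its parameters, the choice to sum lengths over both genomes, and the handling of pairs without any HSPs are all assumptions.

```python
def gbdp_style_distances(hsp_len_total, identities_total, len_x, len_y):
    """Compute d0, d4 and d6 style distances for one genome pair.

    Assumed inputs (hypothetical, summed over both BLAST directions):
      hsp_len_total    -- total length covered by HSPs
      identities_total -- identical amino acid positions within those HSPs
      len_x, len_y     -- sizes of the two genomes
    """
    genome_len_total = len_x + len_y
    if genome_len_total <= 0:
        raise ValueError("genome lengths must be positive")
    if hsp_len_total == 0:
        # Assumption: no HSPs at all is treated as maximal distance.
        return {"d0": 1.0, "d4": 1.0, "d6": 1.0}
    return {
        "d0": 1.0 - hsp_len_total / genome_len_total,     # HSP coverage of the genomes
        "d4": 1.0 - identities_total / hsp_len_total,     # identities within HSPs
        "d6": 1.0 - identities_total / genome_len_total,  # identities over whole genomes
    }


if __name__ == "__main__":
    # Made-up numbers, purely for illustration.
    print(gbdp_style_distances(1100, 880, 1000, 1200))
```

Because d4 is normalized by the aligned length rather than the genome size, missing regions in an incomplete assembly do not inflate it, which matches the guidance above about fragmented genomes.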
    git clone https://github.com/ErillLab/ViPhySlim.git
    cd ViPhySlim
    conda env create -f viphyslim_environment.yml
    conda activate viphyslim

Type of execution: "parallel":

    mpiexec -np <num_processes> python main.py

For example, with 5 processes:

    mpiexec -np 5 python main.py

Type of execution: "serial":

    python main.py

Note: When running the code, make sure the execution mode (parallel or serial) corresponds to the setting chosen in the configuration file.
This project is licensed under the MIT License. See the LICENSE file for details.
If you use ViPhySlim in your research, please cite this repository.