EVclades is an Enterovirus-adapted version of Art Poon’s fluclades pipeline for automated clade assignment. This version has been customized for Enterovirus D68 (EV-D68), using phylogenies and metadata to define robust, reproducible clades based on tree structure and sequence divergence.
Sequence and metadata files (sequences.fasta, metadata.tsv, reference.gbk, etc.) were loaded using the Nextclade D68 ingest pipeline.
- `Snakefile` – Manages the workflow, calling all relevant scripts and tools. Integrates Nextstrain commands including `augur index`, `augur filter`, `nextclade3 run` (for alignment), `augur tree`, and `augur refine`, alongside the custom Python and R scripts below.
- `relabel-fasta.py` – Replaces FASTA headers using a CSV generated by the filtering step and RIVM subgenotype annotations.
- `compress-seqs.py` – Removes exact duplicate sequences from FASTA input, retaining the first occurrence and writing duplicates to a CSV for traceability.
- `subtyping.py` – Implements nodewise clustering by calculating divergence and patristic distances at internal nodes to assign sequences to clades.
- `chainsaw.py` – Python script for edgewise clustering based on internal branch lengths. Requires Biopython.
  - Run with no arguments to print a histogram of branch lengths.
  - Use `CUTOFF` to define a threshold for subtree cutting.
  - Use `FORMAT` to select output format: `summary` (default), `tree` (a set of Newick tree strings) or `labels` (CSV listing tip-to-subtree assignments). Also computes normalized mutual information between subtree assignments and known subtype labels.
- `auto-chainsaw.py` – Automates `chainsaw.py` runs across a range of cutoffs to explore clustering behavior. Used to generate data for Figures 2A and 3A. Input trees are reconstructed with FastTree2; outputs are written to stdout in CSV format.
- `plot-trees.R` – Uses `ggfree` to visualize full EV-D68 phylogenies, with branch coloring based on clade/subtype assignments.
- `chainsaw-plot.R` – Plots the number of subtrees produced by `chainsaw.py` as a function of the internal branch length cutoff. Helps visualize parameter sensitivity for EV-D68 protein phylogenies.
- `coldates.R` – Generates a barplot of EV-D68 sequence deposition by year.
- `subtree-grid.R` – Produces grid-based summary figures to visually compare subtree clustering results across different parameters.
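
To make the edgewise clustering idea concrete, here is a minimal, standard-library-only sketch. It is not the code in `chainsaw.py`: the toy tree, the `KNOWN` subtype labels, and the functions `cut_tree`, `entropy` and `nmi` are all hypothetical names chosen for illustration. It assumes that only internal branches longer than the cutoff are cut, and that mutual information is normalized by the arithmetic mean of the two entropies; the actual scripts may differ on both points.

```python
from collections import Counter
from math import log

# Hypothetical toy tree as nested tuples: (name, branch_length, children).
# Tips have an empty list of children.
TREE = ("root", 0.0, [
    ("n1", 0.30, [("A1", 0.01, []), ("A2", 0.02, [])]),
    ("n2", 0.02, [
        ("B1", 0.01, []),
        ("n3", 0.25, [("C1", 0.01, []), ("C2", 0.03, [])]),
    ]),
])

# Hypothetical known subtype labels for the tips.
KNOWN = {"A1": "A", "A2": "A", "B1": "B", "C1": "C", "C2": "C"}


def cut_tree(tree, cutoff):
    """Map each tip name to a subtree index.

    A new subtree starts at every internal branch longer than `cutoff`
    (assumption: terminal branches are never cut).
    """
    labels = {}
    next_id = 1
    stack = [(tree, 0)]  # (node, index of the subtree the node belongs to)
    while stack:
        (name, _length, children), cluster = stack.pop()
        if not children:
            labels[name] = cluster
            continue
        for child in children:
            child_cluster = cluster
            if child[2] and child[1] > cutoff:  # internal branch above cutoff
                child_cluster = next_id
                next_id += 1
            stack.append((child, child_cluster))
    return labels


def entropy(labels):
    """Shannon entropy (natural log) of a list of labels."""
    n = len(labels)
    return -sum(c / n * log(c / n) for c in Counter(labels).values())


def nmi(x, y):
    """Mutual information of two labelings, divided by their mean entropy."""
    n = len(x)
    px, py = Counter(x), Counter(y)
    mi = sum(c / n * log(c * n / (px[a] * py[b]))
             for (a, b), c in Counter(zip(x, y)).items())
    denom = (entropy(x) + entropy(y)) / 2
    return mi / denom if denom > 0 else 1.0


if __name__ == "__main__":
    # Scan a few cutoffs, in the spirit of auto-chainsaw.py, and emit CSV rows.
    tips = sorted(KNOWN)
    print("cutoff,nsubtrees,nmi")
    for cutoff in (0.01, 0.1, 0.28, 1.0):
        labels = cut_tree(TREE, cutoff)
        score = nmi([labels[t] for t in tips], [KNOWN[t] for t in tips])
        print(f"{cutoff},{len(set(labels.values()))},{score:.3f}")
```

Scanning cutoffs this way shows the trade-off the plotting scripts are meant to visualize: a low cutoff splits the tree into many small subtrees, a high cutoff merges everything into one, and the mutual information score indicates how well each partition agrees with the known labels.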