afwdist

An implementation of the pairwise distance metric between groups of genetic variants, based on differences in fixed and non-fixed allele frequencies, described in Álvarez-Herrera & Sevilla et al. (2024) (see also CITATION.cff).

Briefly, each sample is described at every site $i$ by a vector of $J$ allele frequencies. The distance between two samples $M$ and $N$ is the sum, over all $I$ polymorphic sites, of the squared differences between the frequencies of each allele $j$ in both samples, normalized at each site by a term that depends on the summed frequencies:

$$d (M,N) = \sum_{i = 1}^{I} \frac{\sum_{j = 1}^{J} {({{M_{ij}} - {N_{ij}}})}^2} {4 - \sum_{j = 1}^{J} {({{M_{ij}} + {N_{ij}}})}^2}$$
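As an illustration only, the formula above can be sketched in a few lines of Python. This is not the program's own code: the function names and the nested-list layout (one list of $J$ frequencies per site) are hypothetical, and treating a site with identical frequency vectors as contributing zero is an assumption made here to avoid a 0/0 term when both samples are fixed for the same allele.

```python
from typing import Sequence


def site_term(m: Sequence[float], n: Sequence[float]) -> float:
    """Contribution of a single site, given the J allele frequencies of M and N."""
    numerator = sum((mj - nj) ** 2 for mj, nj in zip(m, n))
    # Assumption of this sketch: identical frequency vectors contribute nothing.
    # This also covers the 0/0 case where both samples are fixed for one allele.
    if numerator == 0:
        return 0.0
    denominator = 4 - sum((mj + nj) ** 2 for mj, nj in zip(m, n))
    return numerator / denominator


def distance(M: Sequence[Sequence[float]], N: Sequence[Sequence[float]]) -> float:
    """Sum the per-site terms over all sites (M and N list the same sites in the same order)."""
    return sum(site_term(m, n) for m, n in zip(M, N))


# One site with two alleles, fixed for different alleles in each sample:
# numerator = 1 + 1 = 2, denominator = 4 - (1 + 1) = 2
print(distance([[1.0, 0.0]], [[0.0, 1.0]]))  # 1.0
```

With this definition, a site where the two samples are fixed for different alleles adds exactly 1 to the distance, while partially shared frequencies add less.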

Usage

Quick reference

Usage: afwdist [OPTIONS] --input <INPUT> --reference <REFERENCE> --output <OUTPUT>

Options:
  -i, --input <INPUT>          Input tree in CSV format (mandatory CSV columns are 'sample', 'position', 'sequence' and 'frequency')
  -r, --reference <REFERENCE>  Reference sequence in FASTA format
  -o, --output <OUTPUT>        Output CSV file with distances between each pair of samples
  -s, --include-reference      Include reference as a sample with 100% fixed alleles
  -v, --verbose                Enable debug messages
  -h, --help                   Print help
  -V, --version                Print version
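For scripted pipelines, the command line above could be assembled from Python as sketched below. The flags are the ones listed in the quick reference; the helper names and the file names (variants.csv, reference.fasta, distances.csv) are hypothetical, and the sketch assumes the afwdist executable is available on the PATH.

```python
import subprocess
from typing import List


def build_afwdist_command(
    variants_csv: str,
    reference_fasta: str,
    output_csv: str,
    include_reference: bool = False,
    verbose: bool = False,
) -> List[str]:
    """Build the argument list for one afwdist run (nothing is executed here)."""
    command = [
        "afwdist",
        "--input", variants_csv,
        "--reference", reference_fasta,
        "--output", output_csv,
    ]
    if include_reference:
        command.append("--include-reference")
    if verbose:
        command.append("--verbose")
    return command


def run_afwdist(command: List[str]) -> None:
    """Run the command and raise if it exits with an error (needs afwdist on PATH)."""
    subprocess.run(command, check=True)


# Hypothetical file names, for illustration only.
command = build_afwdist_command(
    "variants.csv", "reference.fasta", "distances.csv", include_reference=True
)
print(" ".join(command))
```

Passing the arguments as a list avoids shell quoting problems with unusual file names.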

Inputs and outputs

The program takes as input a table in CSV format (possibly derived from a VCF file) where each row represents a single genetic variant. The input table must contain four columns:

sample (a string): a unique identifier for the group of variants used in pairwise comparisons.
position (an integer): the site of the variant.
sequence (a string): the sequence of the variant (i.e. the alternate allele).
frequency (a real number from 0 to 1): the relative frequency of the variant within the sample.
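A minimal sketch of how such a table could be written with Python's csv module is shown below. The sample names, positions and frequencies are made up for illustration, and the column order simply follows the list above; whether the program requires a particular column order is not stated here.

```python
import csv
import io

COLUMNS = ["sample", "position", "sequence", "frequency"]


def write_variant_table(rows, handle):
    """Write variant rows (dicts keyed by COLUMNS) as CSV with a header line."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        # Frequencies are relative, so they must lie between 0 and 1.
        if not 0.0 <= float(row["frequency"]) <= 1.0:
            raise ValueError(f"frequency out of range in row: {row}")
        writer.writerow(row)


# Hypothetical variants: one row per variant and sample.
example_rows = [
    {"sample": "sampleA", "position": 241, "sequence": "T", "frequency": 1.0},
    {"sample": "sampleA", "position": 3037, "sequence": "T", "frequency": 0.35},
    {"sample": "sampleB", "position": 241, "sequence": "T", "frequency": 0.8},
]

buffer = io.StringIO()
write_variant_table(example_rows, buffer)
print(buffer.getvalue())
```

In practice the handle would be a file opened with open(path, "w", newline="") rather than an in-memory buffer.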

In addition to the variant table, the program requires a reference sequence in FASTA format. This should be the same sequence that was used for variant calling. The reference is used to infer the frequencies of reference alleles, assuming that any frequency not taken up by the listed variants at a site belongs to the reference allele.

Besides the pairwise distances between samples, the distance between each sample and the reference sequence can also be calculated on request (option --include-reference). In that case, the reference is treated as a regular sample with no variant alleles, i.e. the reference allele has a frequency of 1 at every site.
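The inference of reference allele frequencies described above amounts to a subtraction per site and sample. The following sketch illustrates the idea; the function name and the rounding tolerance are assumptions of this example, not part of the program.

```python
from typing import Iterable

# Assumed tolerance for rounding noise in the input frequencies.
TOLERANCE = 1e-9


def reference_frequency(variant_frequencies: Iterable[float]) -> float:
    """Frequency left for the reference allele at one site of one sample."""
    remainder = 1.0 - sum(variant_frequencies)
    if remainder < -TOLERANCE:
        raise ValueError("variant frequencies at this site add up to more than 1")
    # Clamp tiny negative values caused by floating-point rounding.
    return max(remainder, 0.0)


print(reference_frequency([]))      # no variants listed: the reference allele is fixed
print(reference_frequency([0.35]))  # one variant at 35%: the rest is reference
```

A site with no listed variants in a sample therefore gets a reference allele frequency of 1, which is also how every site of the reference sample is represented.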

The program writes its results to a table in CSV format. This table contains three columns:

sample_m and sample_n (strings): the identifiers of the two samples being compared.
distance (a real number): the calculated pairwise distance between the two samples.
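The output table can be loaded for downstream analysis with a few lines of Python, as sketched below. The example rows are invented, and keying the result by an unordered pair is a choice of this sketch: it does not assume anything about which ordering of each pair, or whether self-comparisons, appear in the actual output.

```python
import csv
import io


def read_distances(handle):
    """Map each unordered pair of sample identifiers to its distance."""
    distances = {}
    for row in csv.DictReader(handle):
        pair = frozenset((row["sample_m"], row["sample_n"]))
        distances[pair] = float(row["distance"])
    return distances


# Hypothetical output, for illustration only.
example_output = (
    "sample_m,sample_n,distance\n"
    "sampleA,sampleB,0.25\n"
    "sampleA,sampleC,1.5\n"
)

table = read_distances(io.StringIO(example_output))
# The lookup works regardless of the order in which the pair is given.
print(table[frozenset(("sampleB", "sampleA"))])
```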

Citation

Álvarez-Herrera, M. & Sevilla, J., Ruiz-Rodriguez, P., Vergara, A., Vila, J., Cano-Jiménez, P., González-Candelas, F., Comas, I., & Coscollá, M. (2024). VIPERA: Viral Intra-Patient Evolution Reporting and Analysis. Virus Evolution, 10(1), veae018. https://doi.org/10.1093/ve/veae018

Contributors

Thanks goes to these wonderful people (emoji key):

Miguel Álvarez Herrera: 💻

Jordi Sevilla Fortuny: 🐛 📓

This project follows the all-contributors specification.
