-
Notifications
You must be signed in to change notification settings - Fork 53
Description
Hello
I am somewhat confused by the output by geneitc_diff()
, in particular the number of alleles in each population.
Given the example data below:
library(vcfR)
data(vcfR_example)
myPops <- as.factor(rep(c('a','b'), each = 9))
myDiff <- genetic_diff(vcf, myPops, method = "nei")
head(myDiff)
Output:
CHROM | POS | Hs_a | Hs_b | Ht | n_a | n_b | Gst | Htmax | Gstmax | Gprimest |
---|---|---|---|---|---|---|---|---|---|---|
Supercontig_1.50 | 2 | 0.0000000 | 0.1800000 | 0.07986111 | 14 | 10 | 0.06086957 | 0.5173611 | 0.8550336 | 0.07118968 |
Supercontig_1.50 | 246 | 0.4200000 | 0.2777778 | 0.35123967 | 10 | 12 | 0.02509804 | 0.6652893 | 0.4853002 | 0.05171652 |
Supercontig_1.50 | 549 | 0.5000000 | 0.0000000 | 0.44444444 | 4 | 2 | 0.25000000 | 0.6666667 | 0.5000000 | 0.50000000 |
Supercontig_1.50 | 668 | 0.4444444 | 0.0000000 | 0.50000000 | 12 | 4 | 0.33333333 | 0.6250000 | 0.4666667 | 0.71428571 |
Supercontig_1.50 | 765 | 0.0000000 | 0.2187500 | 0.11072664 | 18 | 16 | 0.07031250 | 0.5467128 | 0.8117089 | 0.08662281 |
Supercontig_1.50 | 780 | 0.0000000 | 0.2448980 | 0.12444444 | 16 | 14 | 0.08163265 | 0.5511111 | 0.7926267 | 0.10299003 |
To get more information about the oubject vcf
I converted to a genInd object.
GenInd <- vcfR2genind(vcf)
GenInd
/// GENIND OBJECT /////////
// 18 individuals; 2,533 loci; 5,083 alleles; size: 1.9 Mb
// Basic content
@tab: 18 x 5083 matrix of allele counts
@loc.n.all: number of alleles per locus (range: 1-5)
@loc.fac: locus factor for the 5083 columns of @tab
@all.names: list of allele names for each locus
@ploidy: ploidy of each individual (range: 2-2)
@type: codom
@call: adegenet::df2genind(X = t(x), sep = sep)
// Optional content
- empty -
# What is range of number of alleles
n_perPop <- as.data.frame(t(myDiff[,c(6:7)]))
n_perPop$Site <- gsub("n_", "", rownames(n_perPop))
nPerpop.m <- reshape2::melt(n_perPop, id.vars = "Site")
colnames(nPerpop.m) <- c("Site", "SNP_number", "n_alleles")
head(nPerpop.m)
range(nPerpop.m$n_alleles)
#0-18
So from this additional information I can see that there is a range of 1-5 alleles per locus and that the data is diploid.
Therefore, from my understanding Supercontig_1.50:549 (CHROM:POS) has 2 alleles and it is 4 because (2x2(diploid)=4). Is this description correct?
If so, I am confused with my vcf output
GenInd
/// GENIND OBJECT /////////
// 761 individuals; 3,857 loci; 7,714 alleles; size: 24.8 Mb
// Basic content
@tab: 761 x 7714 matrix of allele counts
@loc.n.all: number of alleles per locus (range: 2-2)
@loc.fac: locus factor for the 7714 columns of @tab
@all.names: list of allele names for each locus
@ploidy: ploidy of each individual (range: 2-2)
@type: codom
@call: adegenet::df2genind(X = t(x), sep = sep)
// Optional content
@pop: population of each individual (group size range: 5-20)
# Get range of n for my vcf
n_perPop <- as.data.frame(t(myDiff[,c(49:93)]))
n_perPop$Site <- gsub("n_", "", rownames(n_perPop))
nPerpop.m <- melt(n_perPop, id.vars = "Site")
colnames(nPerpop.m) <- c("Site", "SNP_number", "n_alleles")
range(nPerpop.m$n_alleles)
# 0 40
If my vcf has a max range of 2 alleles per locus and is a diploid organism then surely the max number of alleles should be 4?? However the range of number of alleles is 0-40??
My appologies if this is a simplistic question, I am new to this sort of analysis.
Any advice to shed light on this would be greatly appreciated.
Thank you