Understanding genetic_diff() number of alleles output

Hello

I am somewhat confused by the output by `geneitc_diff()`, in particular the number of alleles in each population.

Given the example data below:

```
library(vcfR)
data(vcfR_example)

myPops <- as.factor(rep(c('a','b'), each = 9))
myDiff <- genetic_diff(vcf, myPops, method = "nei")

head(myDiff)
```

Output:
|            CHROM | POS |      Hs_a |      Hs_b |         Ht|  n_a|  n_b |        Gst |     Htmax|     Gstmax|    Gprimest| 
| :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: |
| Supercontig_1.50 |   2|  0.0000000 | 0.1800000 |  0.07986111 |  14 |  10 | 0.06086957 | 0.5173611  | 0.8550336 | 0.07118968 | 
| Supercontig_1.50 | 246 |  0.4200000 |  0.2777778 |  0.35123967 |  10|   12 | 0.02509804 | 0.6652893 | 0.4853002 |  0.05171652 | 
| Supercontig_1.50 | 549 | 0.5000000 | 0.0000000 | 0.44444444   | 4   | 2 | 0.25000000 | 0.6666667 | 0.5000000 | 0.50000000 | 
| Supercontig_1.50 | 668 | 0.4444444 | 0.0000000 | 0.50000000 |  12  |  4 | 0.33333333 |  0.6250000 | 0.4666667 | 0.71428571 | 
| Supercontig_1.50 | 765 | 0.0000000 | 0.2187500 | 0.11072664  | 18  | 16 | 0.07031250 |  0.5467128 | 0.8117089 | 0.08662281 | 
| Supercontig_1.50 | 780 | 0.0000000 | 0.2448980 | 0.12444444  | 16  | 14 | 0.08163265 | 0.5511111 |  0.7926267 | 0.10299003 | 

To get more information about the oubject `vcf` I converted to a genInd object.

```
GenInd <- vcfR2genind(vcf)
GenInd

/// GENIND OBJECT /////////

 // 18 individuals; 2,533 loci; 5,083 alleles; size: 1.9 Mb

 // Basic content
   @tab:  18 x 5083 matrix of allele counts
   @loc.n.all: number of alleles per locus (range: 1-5)
   @loc.fac: locus factor for the 5083 columns of @tab
   @all.names: list of allele names for each locus
   @ploidy: ploidy of each individual  (range: 2-2)
   @type:  codom
   @call: adegenet::df2genind(X = t(x), sep = sep)

 // Optional content
   - empty -

# What is range of number of alleles
n_perPop <- as.data.frame(t(myDiff[,c(6:7)]))
n_perPop$Site <- gsub("n_", "", rownames(n_perPop))
nPerpop.m <- reshape2::melt(n_perPop, id.vars = "Site")
colnames(nPerpop.m) <- c("Site", "SNP_number", "n_alleles")
head(nPerpop.m)
range(nPerpop.m$n_alleles)
#0-18
```

So from this additional information I can see that there is a range of 1-5 alleles per locus and that the data is diploid.
Therefore, from my understanding Supercontig_1.50:549 (CHROM:POS) has 2 alleles and it is 4 because (2x2(diploid)=4). Is this description correct?

If so, I am confused with my vcf output

```
GenInd
/// GENIND OBJECT /////////

 // 761 individuals; 3,857 loci; 7,714 alleles; size: 24.8 Mb

 // Basic content
   @tab:  761 x 7714 matrix of allele counts
   @loc.n.all: number of alleles per locus (range: 2-2)
   @loc.fac: locus factor for the 7714 columns of @tab
   @all.names: list of allele names for each locus
   @ploidy: ploidy of each individual  (range: 2-2)
   @type:  codom
   @call: adegenet::df2genind(X = t(x), sep = sep)

 // Optional content
   @pop: population of each individual (group size range: 5-20)

# Get range of n for my vcf
n_perPop <- as.data.frame(t(myDiff[,c(49:93)]))
n_perPop$Site <- gsub("n_", "", rownames(n_perPop))
nPerpop.m <- melt(n_perPop, id.vars = "Site")
colnames(nPerpop.m) <- c("Site", "SNP_number", "n_alleles")

range(nPerpop.m$n_alleles)
# 0 40

```

If my vcf has a max range of 2 alleles per locus and is a diploid organism then surely the max number of alleles should be 4?? However the range of number of alleles is 0-40??

My appologies if this is a simplistic question, I am new to this sort of analysis.
Any advice to shed light on this would be greatly appreciated.

Thank you

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Understanding genetic_diff() number of alleles output #185

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

CHROM	POS	Hs_a	Hs_b	Ht	n_a	n_b	Gst	Htmax	Gstmax	Gprimest
Supercontig_1.50	2	0.0000000	0.1800000	0.07986111	14	10	0.06086957	0.5173611	0.8550336	0.07118968
Supercontig_1.50	246	0.4200000	0.2777778	0.35123967	10	12	0.02509804	0.6652893	0.4853002	0.05171652
Supercontig_1.50	549	0.5000000	0.0000000	0.44444444	4	2	0.25000000	0.6666667	0.5000000	0.50000000
Supercontig_1.50	668	0.4444444	0.0000000	0.50000000	12	4	0.33333333	0.6250000	0.4666667	0.71428571
Supercontig_1.50	765	0.0000000	0.2187500	0.11072664	18	16	0.07031250	0.5467128	0.8117089	0.08662281
Supercontig_1.50	780	0.0000000	0.2448980	0.12444444	16	14	0.08163265	0.5511111	0.7926267	0.10299003

Understanding genetic_diff() number of alleles output #185

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions