Description
Hi there!
I work with ecological data, in particular microbial ecology, and we cannot use Euclidean distance to compare community dissimilarities (whether via cluster analysis, PCoA, or NMDS). Euclidean distance performs poorly when a dataset has many zeroes, which is almost always the case with microbial sequencing data. We tend to use Bray-Curtis dissimilarity (also known as percentage difference), which is semi-metric and does not obey the triangle inequality. Would genieclust not work for this type of dissimilarity matrix?
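To make both points concrete, here is a minimal pure-Python sketch on invented toy counts (not real data). The first part shows the zero problem: two sites that share no species come out closer under Euclidean distance than two sites that share every species at different abundances, while Bray-Curtis, computed as sum|x_i - y_i| / sum(x_i + y_i), orders them the other way. The second part shows a concrete triple of sites for which Bray-Curtis breaks the triangle inequality. The helper names `euclidean` and `bray_curtis` are hypothetical, written only for this illustration.

```python
import math

def euclidean(x, y):
    """Plain Euclidean distance between two abundance vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def bray_curtis(x, y):
    """Bray-Curtis dissimilarity: sum|x_i - y_i| / sum(x_i + y_i).

    Assumes non-negative abundances with at least one non-zero entry
    across the two vectors (otherwise the denominator is zero).
    """
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(a + b for a, b in zip(x, y))
    return num / den

# Hypothetical site-by-species counts (invented for illustration).
site1 = (0, 1, 1)
site2 = (1, 0, 0)  # shares no species with site1
site3 = (0, 4, 4)  # same species as site1, higher abundances

# Euclidean: site2 looks closer to site1 (about 1.73) than site3 does (about 4.24).
print(euclidean(site1, site2), euclidean(site1, site3))
# Bray-Curtis: site3 is closer to site1 (0.6) than site2 is (1.0).
print(bray_curtis(site1, site3), bray_curtis(site1, site2))

# Triangle inequality d(a, c) <= d(a, b) + d(b, c) fails for this triple.
a, b, c = (1, 0), (1, 1), (0, 1)
lhs = bray_curtis(a, c)                      # 2 / 2 = 1.0
rhs = bray_curtis(a, b) + bray_curtis(b, c)  # 1/3 + 1/3 = 2/3
print(lhs > rhs)  # True, so Bray-Curtis is only semi-metric
```

Whether genieclust accepts such a non-metric matrix directly is exactly the open question here; the sketch only shows why the distinction matters.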
Also, when I used genieclust on my environmental data (for which Euclidean distance is fine, since it has no double zeroes), the branch heights were very different from the original pairwise Euclidean distances in the input matrix. That is, the dendrogram suggested the groups were more similar (closer in Euclidean terms) than the input matrix shows, while hierarchical clustering with "average" linkage reproduced the original values more accurately. See below:
Standard hierarchical clustering with average linkage
Snapshot of the original Euclidean dissimilarity matrix (notice that most pairwise dissimilarities are greater than 1, but the Genie dendrogram shows most of the branch heights at around 1)
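One possible explanation, offered as a guess rather than a confirmed account of genieclust's internals: Genie is built on the minimum spanning tree used by single linkage. If its dendrogram heights are, as with single linkage, the dissimilarity of the closest pair of points between the two merged groups, they will tend to sit below average-linkage heights, which report the mean of all between-group dissimilarities. The sketch below is a naive pure-Python implementation of single and average linkage (not genieclust itself) on a made-up 4x4 matrix; the function `agglomerate` and the matrix `D` are hypothetical.

```python
from itertools import combinations

def agglomerate(d, linkage="average"):
    """Naive O(n^3) agglomerative clustering on a full dissimilarity matrix d.

    Returns merge heights in the order the merges happen.
    linkage="single":  height = smallest dissimilarity between the two groups
    linkage="average": height = mean of all between-group dissimilarities (UPGMA)
    """
    if linkage not in ("single", "average"):
        raise ValueError("linkage must be 'single' or 'average'")
    clusters = [[i] for i in range(len(d))]
    heights = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            pairs = [d[i][j] for i in clusters[a] for j in clusters[b]]
            h = min(pairs) if linkage == "single" else sum(pairs) / len(pairs)
            if best is None or h < best[0]:
                best = (h, a, b)
        h, a, b = best
        heights.append(h)
        clusters[a] = clusters[a] + clusters[b]  # a < b, so index a stays valid
        del clusters[b]
    return heights

# Hypothetical dissimilarity matrix: two tight pairs, {0, 1} and {2, 3}.
D = [
    [0.0, 1.0, 4.0, 5.0],
    [1.0, 0.0, 3.0, 4.5],
    [4.0, 3.0, 0.0, 1.5],
    [5.0, 4.5, 1.5, 0.0],
]

print(agglomerate(D, "single"))   # [1.0, 1.5, 3.0]
print(agglomerate(D, "average"))  # [1.0, 1.5, 4.125]
```

The final merge sits at 3.0 under single linkage (the closest cross-pair) but at 4.125 under average linkage (the mean of 4.0, 5.0, 3.0 and 4.5), which is the same direction as the gap described above. Whether genieclust's reported heights are defined this way should be checked against its documentation.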
Thank you for your help,
Mike