A new correlation coefficient (Chatterjee) #314

vincentarelbundock · 2024-04-11T14:31:16Z

Have not read yet, but this looks fun: https://arxiv.org/pdf/1909.10140.pdf

bwiernik · 2024-04-12T02:12:34Z

Neat! Looks straightforward!

mattansb · 2024-04-14T10:42:09Z

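ksaai <- function(X, Y, ties = TRUE){
  n <- length(X)
  # Seed *before* ranking so the random tie-breaking below is reproducible
  set.seed(42)
  # Ranks of Y after sorting the pairs by X
  r <- rank(Y[order(X)], ties.method = "random")
  if(ties){
    # General formula, allowing for ties in Y
    l <- rank(Y[order(X)], ties.method = "max")
    return( 1 - n*sum( abs(r[-1] - r[-n]) ) / (2*sum(l*(n - l))) )
  } else {
    # Simplified formula, valid when Y has no ties
    return( 1 - 3 * sum( abs(r[-1] - r[-n]) ) / (n^2 - 1) )
  }
}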

I don't like that it's not symmetrical - shouldn't correlation coefficients be symmetrical?

x <- rnorm(100, sd = 4)
y <- sin(x) + rnorm(100, sd = 0.2)

plot(x, y)

ksaai(x, y)
#> [1] 0.6306631
ksaai(y, x)
#> [1] -0.1710171

Also the maximal value isn't 1 and seems to depend on the sample size?

z10 <- runif(10)
z100 <- runif(100)
z1000 <- runif(1000)

ksaai(z10, z10)
#> [1] 0.7272727
ksaai(z100, z100)
#> [1] 0.970297
ksaai(z1000, z1000)
#> [1] 0.997003

Created on 2024-04-14 with reprex v2.1.0
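
The three values above follow a closed form. With no ties and Y strictly increasing in X, every adjacent rank difference equals 1, so the no-ties formula reduces to 1 - 3 / (n + 1), which approaches 1 only as n grows. A minimal sketch of that check (`xi_no_ties` is a hypothetical helper name, reusing the no-ties branch of `ksaai()`):

```r
# No-ties version of the coefficient (same formula as the `else` branch of ksaai())
xi_no_ties <- function(x, y) {
  n <- length(x)
  r <- rank(y[order(x)])               # ranks of y after sorting the pairs by x
  1 - 3 * sum(abs(diff(r))) / (n^2 - 1)
}

for (n in c(10, 100, 1000)) {
  z <- runif(n)
  # Perfect dependence: print the estimate next to the closed form 1 - 3 / (n + 1)
  cat(n, xi_no_ties(z, z), 1 - 3 / (n + 1), "\n")
}
```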

vincentarelbundock · 2024-04-14T15:50:03Z

Your note about sample size is presumably what he means by "converges to a limit" in point 4 of the screenshot in my original post. Since there's theory to provide confidence intervals, maybe that's not a big deal? Maybe even good?

And on symmetry:

> (1) Unlike most coefficients, ξn is not symmetric in X and Y.
> But that is intentional. We would like to keep it that way because we may
> want to understand if Y is a function of X, and not just if one of the variables
> is a function of the other. If we want to understand whether X is a function
> of Y, we should use ξn(Y, X) instead of ξn(X, Y). A symmetric measure
> of dependence, if required, can be easily obtained by taking the maximum
> of ξn(X, Y) and ξn(Y, X).
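
The symmetric variant described in that passage can be sketched as follows. This is only an illustration: `xi_no_ties` and `xi_symmetric` are hypothetical names, and the no-ties formula is assumed for brevity.

```r
# No-ties version of the coefficient (assumes continuous data)
xi_no_ties <- function(x, y) {
  n <- length(x)
  r <- rank(y[order(x)])
  1 - 3 * sum(abs(diff(r))) / (n^2 - 1)
}

# Symmetric measure: the larger of the two directional coefficients
xi_symmetric <- function(x, y) {
  max(xi_no_ties(x, y), xi_no_ties(y, x))
}

set.seed(1)
x <- rnorm(100, sd = 4)
y <- sin(x) + rnorm(100, sd = 0.2)

xi_no_ties(x, y)    # y as a function of x
xi_no_ties(y, x)    # x as a function of y
xi_symmetric(x, y)  # same value whichever argument comes first
```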

mattansb · 2024-04-15T07:27:00Z

Cool 👍

I don't see any mention of a confidence interval - should we just use Fisher's Z?
In theory, xi is non-negative, but the estimate is sometimes negative - should we return 0 in such cases?

vincentarelbundock · 2024-04-15T10:57:40Z

> I don’t see any mention of a confidence interval

Sorry, I misread the part about the CI. The XICOR package does provide an SD, but it feels wrong to just compute a symmetric interval using that.

> should we just use Fisher’s Z?

I’ve only really skimmed the paper, and don’t truly understand it. Until I grok this better (realistically: never), I would be reluctant to report a quantity not explicitly endorsed by the author.

> In theory, xi is non-negative, but the estimate is sometimes negative - should we return 0 in such cases?

“In the limit” != “In theory”. I’d say report the actual output of the equation, rather than an ad hoc hack.
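
A small deterministic case shows how a finite-sample value can fall below zero even though the limiting quantity cannot. The sketch below uses the no-ties formula; `xi_no_ties` is a hypothetical helper name, and the truncated value is shown only to illustrate the ad hoc alternative being discussed.

```r
# No-ties version of the coefficient
xi_no_ties <- function(x, y) {
  n <- length(x)
  r <- rank(y[order(x)])
  1 - 3 * sum(abs(diff(r))) / (n^2 - 1)
}

x <- 1:4
y <- c(2, 4, 1, 3)          # ranks jump up and down as x increases

raw <- xi_no_ties(x, y)     # adjacent rank differences 2, 3, 2: 1 - 3 * 7 / 15 = -0.4
truncated <- max(0, raw)    # the ad hoc alternative: clamp negative estimates to 0

raw
truncated
```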

I ran into some errors with your ksaai() function with large N. However, the paper's author has published a XICOR package on CRAN. It seems fast and is published under the Apache License, which, I believe, is compatible with GPL-3.

library(XICOR)
N <- 100
x <- rnorm(N, sd = 4)
y <- sin(x) + rnorm(N, sd = 0.2)
xicor(y, x, pvalue = TRUE)

    $xi
    [1] 0.03840384

    $sd
    [1] 0.06325978

    $pval
    [1] 0.2718984

mattansb · 2024-04-15T12:02:11Z

In theory == I mean the estimand is non-negative.

I'll run some simulations to see if the Fisher Z CIs work well enough.

bwiernik · 2024-04-15T12:39:09Z

The author did a small simulation in section 4.2 and concluded that sqrt(n) * xi is asymptotically normal (when n = 1000). That's not unexpected, but also not very helpful for more realistic sample sizes.
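
For reference, the null result in the paper is that, when X and Y are independent and Y is continuous, sqrt(n) * xi converges in distribution to N(0, 2/5). A minimal sketch of the one-sided p-value this implies (`xi_pvalue_normal` is a hypothetical name; ties are ignored, and this is a test of independence rather than an interval around a non-zero xi):

```r
# One-sided normal-theory p-value under the null of independence,
# assuming continuous Y so that Var(sqrt(n) * xi) is approximately 2/5
xi_pvalue_normal <- function(xi, n) {
  se <- sqrt(2 / (5 * n))              # asymptotic SD of xi under independence
  pnorm(xi / se, lower.tail = FALSE)   # large xi is evidence against independence
}

# Using the xi reported by xicor() earlier in this thread (n = 100).
# XICOR reports a slightly different SD (0.06325978 versus sqrt(2 / 500)),
# so the p-value need not match its $pval exactly.
xi_pvalue_normal(0.03840384, 100)
```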

The author's XICOR package defaults to using the specified mean and SD values with a normal distribution. They also offer a permutation test.
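
A permutation test of this kind can be sketched generically as below. This is not taken from XICOR's source: `xi_no_ties` and `xi_perm_test` are hypothetical names, and the no-ties formula is assumed.

```r
# No-ties version of the coefficient
xi_no_ties <- function(x, y) {
  n <- length(x)
  r <- rank(y[order(x)])
  1 - 3 * sum(abs(diff(r))) / (n^2 - 1)
}

xi_perm_test <- function(x, y, nperm = 1000) {
  observed <- xi_no_ties(x, y)
  # Shuffling y breaks any dependence on x while keeping both marginals
  permuted <- replicate(nperm, xi_no_ties(x, sample(y)))
  # One-sided p-value: share of permuted statistics at least as large as observed
  list(xi = observed, pval = mean(permuted >= observed))
}

set.seed(123)
x <- rnorm(100, sd = 4)
y <- sin(x) + rnorm(100, sd = 0.2)
xi_perm_test(x, y)
```

Unlike the normal approximation, this makes no use of the asymptotic variance, at the cost of `nperm` extra evaluations of the coefficient.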

I'd be okay with reporting normal-theory intervals and p values to start given that's what the author does, but we should ideally do some simulations to confirm good performance of the intervals at smaller n (or use a z transform if that works nicely).

I didn't compare the code above (from the blog post) with the XICOR package to be sure they align, but we should follow XICOR: https://github.com/cran/XICOR/blob/master/R/xicor.R
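
As a starting point for that comparison, here is a sketch of the general (ties-allowed) definition as given in the paper, where r_i counts the j with Y_j <= Y_i and l_i counts the j with Y_j >= Y_i, and ties in X are broken at random. `xi_ties` is a hypothetical name, and this has not been checked line by line against XICOR.

```r
xi_ties <- function(x, y) {
  n <- length(x)
  # Sort by x, breaking ties in x uniformly at random
  ord <- order(x, sample.int(n))
  y <- y[ord]
  r <- rank(y, ties.method = "max")    # r_i = #{j : y_j <= y_i}
  l <- rank(-y, ties.method = "max")   # l_i = #{j : y_j >= y_i}
  1 - n * sum(abs(diff(r))) / (2 * sum(l * (n - l)))
}

# Tied y values: r = 2,2,4,4,6,6 and l = 6,6,4,4,2,2
xi_ties(1:6, c(1, 1, 2, 2, 3, 3))  # 1 - 6 * 4 / (2 * 32) = 0.625
```

Note that `ksaai()` above computes `l` from `rank(Y, ties.method = "max")`, that is, from the `<=` counts. The two versions coincide when there are no ties but can differ when Y has ties.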

TarandeepKang · 2024-04-16T10:30:22Z

Hi All,

Just to mention that this preprint has now been published:

Chatterjee, S. (2021). A New Coefficient of Correlation. Journal of the American Statistical Association, 116(536), 2009–2022. https://doi.org/10.1080/01621459.2020.1758115

IndrajeetPatil transferred this issue from easystats/easystats Apr 11, 2024

IndrajeetPatil added the feature idea 🔥 New feature or request label Apr 11, 2024

mattansb mentioned this issue Apr 30, 2024

Support the new "Chatterjee" (?) correlation? #251

Closed
