A new correlation coefficient (Chatterjee) #314
Comments
Neat! Looks straightforward! |
From here:

ksaai <- function(X, Y, ties = TRUE){
  n <- length(X)
  # Seed before the random tie-breaking so that it is reproducible
  set.seed(42)
  r <- rank(Y[order(X)], ties.method = "random")
  if(ties){
    l <- rank(Y[order(X)], ties.method = "max")
    return( 1 - n*sum( abs(r[-1] - r[-n]) ) / (2*sum(l*(n - l))) )
  } else {
    return( 1 - 3 * sum( abs(r[-1] - r[-n]) ) / (n^2 - 1) )
  }
}

I don't like that it's not symmetrical - shouldn't correlation coefficients be symmetrical?

x <- rnorm(100, sd = 4)
y <- sin(x) + rnorm(100, sd = 0.2)
plot(x, y)

ksaai(x, y)
#> [1] 0.6306631
ksaai(y, x)
#> [1] -0.1710171

Also the maximal value isn't 1 and seems to depend on the sample size?

z10 <- runif(10)
z100 <- runif(100)
z1000 <- runif(1000)
ksaai(z10, z10)
#> [1] 0.7272727
ksaai(z100, z100)
#> [1] 0.970297
ksaai(z1000, z1000)
#> [1] 0.997003

Created on 2024-04-14 with reprex v2.1.0 |
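A minimal sketch of why the maximum depends on n, assuming no ties: when Y is a strictly increasing function of X, the ordered ranks are 1, ..., n, so the sum of absolute rank differences is n - 1 and the no-ties formula reduces to 1 - 3(n - 1)/(n^2 - 1) = 1 - 3/(n + 1). The helper name `xi_no_ties` is hypothetical and only mirrors the `else` branch of the function above.

```r
# Hypothetical helper: the no-ties formula, as in the `else` branch above.
xi_no_ties <- function(x, y) {
  n <- length(x)
  r <- rank(y[order(x)])
  1 - 3 * sum(abs(diff(r))) / (n^2 - 1)
}

n <- c(10, 100, 1000)

# Perfectly monotone data without ties: z compared with itself.
observed <- sapply(n, function(k) {
  z <- seq_len(k) / k
  xi_no_ties(z, z)
})

# Closed form for the attainable maximum at each sample size.
closed_form <- 1 - 3 / (n + 1)

all.equal(observed, closed_form)
```

The closed form gives 8/11, 98/101 and 998/1001 for n = 10, 100 and 1000, which matches the three values printed above, and it approaches 1 as n grows.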
Your note about sample size is presumably what he means by "converges to a limit" in point 4 of the screenshot in my original post. Since there's theory to provide confidence intervals, maybe that's not a big deal? Maybe even good? And on symmetry:
|
Cool (👍
|
Sorry, I misread about the CI. The
I’ve only really skimmed the paper, and don’t truly understand it. Until I grok this better (realistically: never), I would be reluctant to report a quantity not explicitly endorsed by the author.
“In the limit” != “In theory”. I’d say report the actual output of the equation, rather than an ad hoc hack.

I ran into some errors with your

library(XICOR)
N <- 100
x <- rnorm(N, sd = 4)
y <- sin(x) + rnorm(N, sd = 0.2)
xicor(y, x, pvalue = TRUE)
$xi
[1] 0.03840384
$sd
[1] 0.06325978
$pval
[1] 0.2718984 |
In theory == I mean the estimand is non-negative. I'll run some simulations to see if the Fisher Z CIs work well enough. |
The author did a small simulation in section 4.2 and concluded that

The author's XICOR package defaults to using the specified mean and SD values with a normal distribution. They also offer a permutation test. I'd be okay with reporting normal-theory intervals and p values to start, given that's what the author does, but we should ideally run some simulations to confirm that the intervals perform well at smaller n (or use a z transform if that works nicely). I didn't compare the code above from the blog post with the XICOR package to be sure they align, but we should follow XICOR: https://github.com/cran/XICOR/blob/master/R/xicor.R |
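As a rough sketch of the two testing options mentioned here, assuming continuous data with no ties: under independence, sqrt(n) times the coefficient is asymptotically N(0, 2/5), so the null standard error is about sqrt(2 / (5 * n)). The helper names below are hypothetical, and this is not the XICOR implementation, which also handles ties.

```r
# Hypothetical helpers, assuming no ties in y; not the XICOR code.
xi_no_ties <- function(x, y) {
  n <- length(x)
  r <- rank(y[order(x)])
  1 - 3 * sum(abs(diff(r))) / (n^2 - 1)
}

# Normal-theory p value: under independence, sqrt(n) * xi is
# approximately N(0, 2/5), so the null SE is sqrt(2 / (5 * n)).
xi_pvalue_normal <- function(x, y) {
  n <- length(x)
  xi <- xi_no_ties(x, y)
  se <- sqrt(2 / (5 * n))
  list(xi = xi, sd = se, pval = pnorm(xi / se, lower.tail = FALSE))
}

# Permutation p value: shuffling y breaks any dependence on x.
# The add-one correction is a choice made for this sketch.
xi_pvalue_perm <- function(x, y, nperm = 1000) {
  xi <- xi_no_ties(x, y)
  perm <- replicate(nperm, xi_no_ties(x, sample(y)))
  list(xi = xi, pval = (1 + sum(perm >= xi)) / (nperm + 1))
}

N <- 100
x <- rnorm(N, sd = 4)
y <- sin(x) + rnorm(N, sd = 0.2)
xi_pvalue_normal(x, y)
xi_pvalue_perm(x, y)
```

For n = 100 the null SE from this approximation is about 0.0632, close to the `$sd` reported by `xicor()` above. Whether the approximation holds up at smaller n is exactly what the proposed simulations would need to check.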
Hi All, Just to mention that this preprint has now been published: Chatterjee, S. (2021). A New Coefficient of Correlation. Journal of the American Statistical Association, 116(536), 2009–2022. https://doi.org/10.1080/01621459.2020.1758115 |
Have not read yet, but this looks fun: https://arxiv.org/pdf/1909.10140.pdf