Description
Dear team, first of all, thank you for maintaining this useful package. I was trying to run the independence test by Shen et al. (2022) and got unexpected results: independence was rejected even though the two one-dimensional vectors I was testing were independent by construction. The problem only shows up with large samples (roughly 70,000 observations or more).
I have not yet identified the source of the problem in the code, but the script below reproduces it. Please note that I have modified the statistic method in dcorr.py to return both "stat" and "covar".
Reproducing code example:
import numpy as np
import hyppo.independence

# _r_distance_corr (definition not shown here) follows the R/original paper implementation.
# Dcorr.statistic is the locally modified version that returns (stat, covar).
t = hyppo.independence.Dcorr()
t.is_fast = True
n_samples = [100, 1000, 10000, 50000, 70000, 100000]
for n in n_samples:
    # Box-Muller transform: S1 and S2 are independent standard normal samples
    U1 = np.random.rand(n, 1)
    U2 = np.random.rand(n, 1)
    S1 = np.sqrt(-2 * np.log(U1)) * np.cos(2 * np.pi * U2)
    S2 = np.sqrt(-2 * np.log(U1)) * np.sin(2 * np.pi * U2)
    stat, covar = t.statistic(S1, S2)
    print(f'current n is: {n}')
    print(f'Implementation according to R/original paper (unbiased squared distance covariance): {_r_distance_corr(S1, S2, mode="squared_cov", unbiased=True)}')
    print(f'Implementation according to hyppo (unbiased squared distance covariance, covar in dcorr.py): {covar}')
    print(f'Implementation according to hyppo (unbiased distance correlation, stat in dcorr.py): {stat}\n')
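Note that for the larger sample sizes (n = 70000 and n = 100000) the hyppo unbiased squared distance covariance jumps to about 4.23 and 15.13, while the R/original paper implementation stays near zero. The hyppo unbiased distance correlation likewise exceeds 0.9, even though the data are independent.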
Results
current n is: 100
Implementation according to R/original paper (unbiased squared distance covariance): 0.017224788665771484
Implementation according to hyppo (unbiased squared distance covariance, covar in dcorr.py): 0.017225187792305974
Implementation according to hyppo (unbiased distance correlation, stat in dcorr.py): 0.052031725347361675
current n is: 1000
Implementation according to R/original paper (unbiased squared distance covariance): 3.993511199951172e-05
Implementation according to hyppo (unbiased squared distance covariance, covar in dcorr.py): 4.020496229051318e-05
Implementation according to hyppo (unbiased distance correlation, stat in dcorr.py): 9.702530109483742e-05
current n is: 10000
Implementation according to R/original paper (unbiased squared distance covariance): 8.52346420288086e-05
Implementation according to hyppo (unbiased squared distance covariance, covar in dcorr.py): 8.585070210065382e-05
Implementation according to hyppo (unbiased distance correlation, stat in dcorr.py): 0.000211508104872705
current n is: 50000
Implementation according to R/original paper (unbiased squared distance covariance): -7.987022399902344e-06
Implementation according to hyppo (unbiased squared distance covariance, covar in dcorr.py): -8.067143002499222e-06
Implementation according to hyppo (unbiased distance correlation, stat in dcorr.py): -1.9942861479842804e-05
current n is: 70000
Implementation according to R/original paper (unbiased squared distance covariance): -6.318092346191406e-06
Implementation according to hyppo (unbiased squared distance covariance, covar in dcorr.py): 4.23422066658965
Implementation according to hyppo (unbiased distance correlation, stat in dcorr.py): 0.9132199516986869
current n is: 100000
Implementation according to R/original paper (unbiased squared distance covariance): 1.430511474609375e-06
Implementation according to hyppo (unbiased squared distance covariance, covar in dcorr.py): 15.134514035440011
Implementation according to hyppo (unbiased distance correlation, stat in dcorr.py): 0.9742237257727073
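As an independent cross-check of the values above, here is a minimal sketch of a brute-force reference for the unbiased squared distance covariance of two one-dimensional samples. It is not hyppo's code and not the R implementation: it is a direct O(n^2) evaluation of the U-centered estimator, processed in row blocks so that it still fits in memory at n = 70000 (where it takes a while to run). The function name `unbiased_dcov_sq` and the `chunk` parameter are my own choices for illustration. All normalizing constants are computed in floating point because n(n-1)(n-2)(n-3) no longer fits in a 64-bit integer once n exceeds roughly 55,000; I have not confirmed whether that is related to the behaviour reported here.

```python
import numpy as np


def unbiased_dcov_sq(x, y, chunk=256):
    """Unbiased squared distance covariance for two 1-D samples.

    Brute-force reference: O(n^2) time, O(n * chunk) memory.
    Hypothetical helper for cross-checking, not part of hyppo.
    """
    x = np.asarray(x, dtype=np.float64).ravel()
    y = np.asarray(y, dtype=np.float64).ravel()
    if x.size != y.size:
        raise ValueError("x and y must have the same number of samples")
    if x.size < 4:
        raise ValueError("the unbiased estimator needs at least 4 samples")

    n_int = x.size
    n = float(n_int)  # float on purpose: avoids integer overflow in n^4-sized products

    row_a = np.zeros(n_int)  # row sums of the distance matrix of x
    row_b = np.zeros(n_int)  # row sums of the distance matrix of y
    sum_ab = 0.0             # sum over all pairs of a_ij * b_ij

    # Build the pairwise distance matrices one block of rows at a time.
    for start in range(0, n_int, chunk):
        stop = min(start + chunk, n_int)
        a = np.abs(x[start:stop, None] - x[None, :])
        b = np.abs(y[start:stop, None] - y[None, :])
        row_a[start:stop] = a.sum(axis=1)
        row_b[start:stop] = b.sum(axis=1)
        sum_ab += float(np.sum(a * b))

    # Expanded form of (1 / (n (n - 3))) * sum_{i != j} A~_ij B~_ij,
    # where A~ and B~ are the U-centered distance matrices.
    term1 = sum_ab / (n * (n - 3.0))
    term2 = 2.0 * float(np.dot(row_a, row_b)) / (n * (n - 2.0) * (n - 3.0))
    term3 = float(row_a.sum()) * float(row_b.sum()) / (
        n * (n - 1.0) * (n - 2.0) * (n - 3.0)
    )
    return term1 - term2 + term3


# Small usage example on independent samples (the value should be close to 0).
rng = np.random.default_rng(0)
s1 = rng.standard_normal(1000)
s2 = rng.standard_normal(1000)
print(unbiased_dcov_sq(s1, s2))
```

Under the usual definition, the matching bias-corrected distance correlation would be `unbiased_dcov_sq(x, y) / np.sqrt(unbiased_dcov_sq(x, x) * unbiased_dcov_sq(y, y))`. For independent data both quantities should stay near zero at every sample size.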
Version information
- OS: Any
- Python Version: Any
- Package Version: 0.3.2 and 0.4.0