Unbiased squared distance covariance increases as sample size becomes large #353

@loremarchi

Dear team, first of all, thank you for maintaining this useful package. I was trying to run the independence test by Shen et al. (2022) and got unexpected results: the test rejected independence for two one-dimensional vectors that I knew to be independent. The problem only showed up with large samples (around 70,000 observations or more).

I have not yet identified the source of the problem in the code, but the script below reproduces it. Please note that I have modified the statistic method in dcorr.py to return both "stat" and "covar".

Reproducing code example:

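import numpy as np
import hyppo.independence

# _r_distance_corr is a helper that is not shown here (R/original-paper implementation).
# t.statistic returns (stat, covar) only because of the modification to dcorr.py noted above.
t = hyppo.independence.Dcorr()
t.is_fast = True
n_samples = [100, 1000, 10000, 50000, 70000, 100000]
for n in n_samples:
    # Box-Muller transform: S1 and S2 are independent standard normal samples
    U1 = np.random.rand(n, 1)
    U2 = np.random.rand(n, 1)
    S1 = np.sqrt(-2 * np.log(U1)) * np.cos(2 * np.pi * U2)
    S2 = np.sqrt(-2 * np.log(U1)) * np.sin(2 * np.pi * U2)
    stat, covar = t.statistic(S1, S2)  # compute once, reuse for both prints
    print(f'current n is: {n}')
    print(f'Implementation according to R/original paper (unbiased squared distance covariance): {_r_distance_corr(S1, S2, mode = "squared_cov", unbiased = True)}')
    print(f'Implementation according to hyppo (unbiased squared distance covariance, covar in dcorr.py): {covar}')
    print(f'Implementation according to hyppo (unbiased distance correlation, stat in dcorr.py): {stat}\n')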
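Since the helper `_r_distance_corr` is not included in the script, here is a minimal sketch of an independent reference for the unbiased squared distance covariance, assuming the standard U-centering definition. The function names `pairwise_distances`, `u_center`, and `unbiased_dcov_sq` are hypothetical and are not part of hyppo or of the original script. It needs O(n^2) memory, so it is only practical for cross-checking at small n (a few thousand points), not at the sample sizes where the problem appears.

```python
import numpy as np


def pairwise_distances(x):
    """Euclidean distance matrix of the rows of x (accepts shape (n,) or (n, d))."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    return np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)


def u_center(d):
    """U-center a symmetric distance matrix; the diagonal is set to zero."""
    n = d.shape[0]
    row = d.sum(axis=1, keepdims=True)
    col = d.sum(axis=0, keepdims=True)
    total = d.sum()
    u = d - row / (n - 2) - col / (n - 2) + total / ((n - 1) * (n - 2))
    np.fill_diagonal(u, 0.0)
    return u


def unbiased_dcov_sq(x, y):
    """Unbiased squared distance covariance via U-centered distance matrices."""
    a = u_center(pairwise_distances(x))
    b = u_center(pairwise_distances(y))
    n = a.shape[0]
    if n < 4:
        raise ValueError("the unbiased estimator needs at least 4 samples")
    return float((a * b).sum() / (n * (n - 3)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    s1 = rng.standard_normal((500, 1))
    s2 = rng.standard_normal((500, 1))
    # Independent inputs: the estimate should be close to zero (it can be negative).
    print(unbiased_dcov_sq(s1, s2))
```

Because all arithmetic here is done in floating point, this sketch does not depend on any integer normalizing constants, which makes it a useful sanity check against both implementations at small n.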

Note that for the larger sample sizes (n = 70000 and n = 100000), the hyppo unbiased squared distance covariance jumps to about 4.23 and 15.13, while the R/original-paper implementation stays near zero, as expected for independent data.

Results

current n is: 100
Implementation according to R/original paper (unbiased squared distance covariance): 0.017224788665771484
Implementation according to hyppo (unbiased squared distance covariance, covar in dcorr.py): 0.017225187792305974
Implementation according to hyppo (unbiased distance correlation, stat in dcorr.py): 0.052031725347361675

current n is: 1000
Implementation according to R/original paper (unbiased squared distance covariance): 3.993511199951172e-05
Implementation according to hyppo (unbiased squared distance covariance, covar in dcorr.py): 4.020496229051318e-05
Implementation according to hyppo (unbiased distance correlation, stat in dcorr.py): 9.702530109483742e-05

current n is: 10000
Implementation according to R/original paper (unbiased squared distance covariance): 8.52346420288086e-05
Implementation according to hyppo (unbiased squared distance covariance, covar in dcorr.py): 8.585070210065382e-05
Implementation according to hyppo (unbiased distance correlation, stat in dcorr.py): 0.000211508104872705

current n is: 50000
Implementation according to R/original paper (unbiased squared distance covariance): -7.987022399902344e-06
Implementation according to hyppo (unbiased squared distance covariance, covar in dcorr.py): -8.067143002499222e-06
Implementation according to hyppo (unbiased distance correlation, stat in dcorr.py): -1.9942861479842804e-05

current n is: 70000
Implementation according to R/original paper (unbiased squared distance covariance): -6.318092346191406e-06
Implementation according to hyppo (unbiased squared distance covariance, covar in dcorr.py): 4.23422066658965
Implementation according to hyppo (unbiased distance correlation, stat in dcorr.py): 0.9132199516986869

current n is: 100000
Implementation according to R/original paper (unbiased squared distance covariance): 1.430511474609375e-06
Implementation according to hyppo (unbiased squared distance covariance, covar in dcorr.py): 15.134514035440011
Implementation according to hyppo (unbiased distance correlation, stat in dcorr.py): 0.9742237257727073
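A possible lead, not a confirmed diagnosis: the results above agree up to n = 50000 and diverge at n = 70000, and a product of four factors near n, such as n(n-1)(n-2)(n-3), first exceeds the signed 64-bit integer range between those two sizes. Closed-form expressions of the unbiased estimator commonly include such a normalizer, but whether dcorr.py computes it in fixed-width integers is purely an assumption here. The sketch below only checks the arithmetic; `four_factor_product` and `wrapped_int64_product` are hypothetical helper names.

```python
import numpy as np

INT64_MAX = np.iinfo(np.int64).max  # 2**63 - 1


def four_factor_product(n):
    """Exact n(n-1)(n-2)(n-3) using Python's arbitrary-precision integers."""
    return n * (n - 1) * (n - 2) * (n - 3)


def wrapped_int64_product(n):
    """Same product computed in int64, where overflow wraps around silently."""
    factors = np.array([n, n - 1, n - 2, n - 3], dtype=np.int64)
    return int(np.prod(factors, dtype=np.int64))


if __name__ == "__main__":
    for n in [100, 1000, 10000, 50000, 70000, 100000]:
        exact = four_factor_product(n)
        wrapped = wrapped_int64_product(n)
        # overflows is True only when the exact value does not fit in int64
        print(n, "overflows:", exact > INT64_MAX, "int64 matches exact:", wrapped == exact)
```

If the fast path does use a normalizer like this, casting n to float before forming the product would avoid the wraparound; if it does not, this check simply rules out one candidate cause.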

Version information

  • OS: Any
  • Python Version: Any
  • Package Version: 0.3.2 and 0.4.0

Labels: bug (Something isn't working)
