partial r hypothesis testing does not account for uncertainty in residualizing #63

Open
@mattansb

Note the following examples:

```r
res <- correlation::correlation(mtcars, partial = TRUE)
res[res$Parameter1 == "mpg", ]
#> Parameter1 | Parameter2 |     r |     t | df |     p |         95% CI |  Method | n_Obs
#> ---------------------------------------------------------------------------------------
#> mpg        |        cyl | -0.02 | -0.13 | 30 | 1.000 | [-0.37,  0.33] | Pearson |    32
#> mpg        |       disp |  0.16 |  0.89 | 30 | 1.000 | [-0.20,  0.48] | Pearson |    32
#> mpg        |         hp | -0.21 | -1.18 | 30 | 1.000 | [-0.52,  0.15] | Pearson |    32
#> mpg        |       drat |  0.10 |  0.58 | 30 | 1.000 | [-0.25,  0.44] | Pearson |    32
#> mpg        |         wt | -0.39 | -2.34 | 30 | 1.000 | [-0.65, -0.05] | Pearson |    32
#> mpg        |       qsec |  0.24 |  1.34 | 30 | 1.000 | [-0.12,  0.54] | Pearson |    32
#> mpg        |         vs |  0.03 |  0.18 | 30 | 1.000 | [-0.32,  0.38] | Pearson |    32
#> mpg        |         am |  0.26 |  1.46 | 30 | 1.000 | [-0.10,  0.56] | Pearson |    32
#> mpg        |       gear |  0.10 |  0.52 | 30 | 1.000 | [-0.26,  0.43] | Pearson |    32
#> mpg        |       carb | -0.05 | -0.29 | 30 | 1.000 | [-0.39,  0.30] | Pearson |    32
```

```r
res <- ppcor::pcor(mtcars)
data.frame(r = res$estimate[-1, 1],
           t = res$statistic[-1, 1],
           p = res$p.value[-1, 1])
#>                r          t          p
#> cyl  -0.02326429 -0.1066392 0.91608738
#> disp  0.16083460  0.7467585 0.46348865
#> hp   -0.21052027 -0.9868407 0.33495531
#> drat  0.10445452  0.4813036 0.63527790
#> wt   -0.39344938 -1.9611887 0.06325215
#> qsec  0.23809863  1.1234133 0.27394127
#> vs    0.03293117  0.1509915 0.88142347
#> am    0.25832849  1.2254035 0.23398971
#> gear  0.09534261  0.4389142 0.66520643
#> carb -0.05243662 -0.2406258 0.81217871
```

Created on 2020-04-06 by the reprex package (v0.3.0)

The resulting partial correlations are identical, but the t values are not (and by extension, neither are the CIs or the unadjusted p values). Why?

Because correlation() computes partial correlations by residualizing the variables and then computing the correlations between the residuals. But the df spent in the residualizing process (that is, the uncertainty in estimating the residuals) is not accounted for: the t values from correlation() use df = 30 (n - 2), whereas the ppcor t values are consistent with df = 21 (n - 2 - 9, one fewer for each of the 9 variables partialled out). (Note that this should be true for Bayesian partial correlations as well: the priors and likelihood of the residualizing process are not accounted for.)
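To make the df difference concrete, here is a minimal base-R sketch for the mpg/wt pair. It does not call either package; it only assumes that residualizing means regressing each variable on the remaining 9 columns with `lm()`. The helper `t_from_r()` is a hypothetical name defined here for illustration.

```r
# Base R only; mtcars ships with R.
n <- nrow(mtcars)                                  # 32 observations
covars <- setdiff(names(mtcars), c("mpg", "wt"))   # variables to partial out
k <- length(covars)                                # 9 covariates

# Residualize mpg and wt on the remaining variables.
res_mpg <- resid(lm(reformulate(covars, response = "mpg"), data = mtcars))
res_wt  <- resid(lm(reformulate(covars, response = "wt"),  data = mtcars))

# The correlation between the residuals is the partial correlation.
r <- cor(res_mpg, res_wt)

# t statistic for a correlation r on a given df (hypothetical helper).
t_from_r <- function(r, df) r * sqrt(df / (1 - r^2))

t_naive    <- t_from_r(r, n - 2)       # treats the residuals as raw data
t_adjusted <- t_from_r(r, n - 2 - k)   # gives up one df per covariate

p_naive    <- 2 * pt(-abs(t_naive),    df = n - 2)
p_adjusted <- 2 * pt(-abs(t_adjusted), df = n - 2 - k)

round(c(r = r, t_naive = t_naive, t_adjusted = t_adjusted), 2)
```

With these inputs, `t_naive` should reproduce the -2.34 in the correlation() output and `t_adjusted` the -1.96 reported by ppcor, while `r` is the same in both cases.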

Solutions:

  • Account for the uncertainty of the residualizing step in the test statistics, p values and CIs. [HARD]
  • Update the docs to state explicitly that inference and CIs are conditional on the estimated residuals and do not account for the uncertainty in estimating them. [EASY]
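For the frequentist Pearson case only, a rough sketch of what the first option might look like for CIs: a Fisher z interval whose standard error uses n - 3 - k instead of n - 3. This is an assumption for illustration, not a description of how correlation() builds its intervals, and `fisher_ci()` is a hypothetical helper, not part of either package. It says nothing about the Bayesian case.

```r
r <- -0.39344938   # mpg/wt partial r, copied from the ppcor output above
n <- 32            # observations in mtcars
k <- 9             # variables partialled out (11 columns minus the pair)

# Fisher z confidence interval for a (partial) correlation.
# k = 0 gives the usual interval for a zero-order correlation.
fisher_ci <- function(r, n, k = 0, level = 0.95) {
  z  <- atanh(r)                      # Fisher z transform
  se <- 1 / sqrt(n - 3 - k)           # SE shrinks the sample size by k
  q  <- qnorm(1 - (1 - level) / 2)    # normal quantile for the level
  tanh(z + c(-1, 1) * q * se)         # back-transform to the r scale
}

ci_naive    <- fisher_ci(r, n)        # ignores the residualizing step
ci_adjusted <- fisher_ci(r, n, k)     # accounts for the 9 covariates

round(rbind(naive = ci_naive, adjusted = ci_adjusted), 2)
```

Under these assumptions the naive interval excludes zero and the adjusted one does not, which is the same disagreement seen in the t values.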

Labels: bug 🐛 (Something isn't working), docs 📚 (Something to be addressed in docs and/or vignettes)
