Partial Correlation gives unexpected output for toy example #435

@PascalIversen

Hi, thanks for this great library!
I am getting a perfect correlation (r = 1.0) from pg.partial_corr for the following toy example. I expected a correlation of approximately zero, which is what the regression (residual) approach below gives.

import numpy as np
import pandas as pd
import statsmodels.api as sm
import pingouin as pg
from scipy.stats import pearsonr

n = 10000
y = list(range(1, n+1))
x = y + np.random.normal(size=n)*0.1
z = y 
df = pd.DataFrame({'x': x, 'y': y, 'z': z})
print(pg.partial_corr(data=df, x='x', y='y', covar=['z']))


# Regress x on z and get residuals
X_with_const = sm.add_constant(np.column_stack([z]))  # Add a constant; z is the only covariate
model_X = sm.OLS(x, X_with_const).fit()
residuals_X = model_X.resid

# Regress y on z and get residuals
model_Y = sm.OLS(y, X_with_const).fit()
residuals_Y = model_Y.resid

# Compute the correlation of the two residual series
residual_corr, p = pearsonr(residuals_X, residuals_Y)
print(f'Partial correlation using statsmodels: {residual_corr}, {p}')

Output:

             n    r       CI95%  p-val
pearson  10000  1.0  [1.0, 1.0]    0.0
Partial correlation using statsmodels: 0.0012024407422241278, 0.9043016773480718

Version:

pingouin.__version__
'0.5.4'
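
For reference, here is a minimal NumPy-only sketch of the residual approach on the same kind of toy data. The `residuals` helper and the fixed seed are assumptions made for this illustration and are not part of the report above. It does not call pingouin, so it does not establish why `pg.partial_corr` returns 1.0. It only shows that, because `y` is an exact copy of `z`, the residuals of `y` on `z` are floating-point round-off and the zero-order correlation between `y` and `z` is 1. In the textbook formula r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2)) this makes the denominator zero, so the partial correlation may simply be ill-defined for this example.

```python
import numpy as np


def residuals(target, covar):
    """Residuals of an OLS fit of `target` on `covar` plus an intercept.

    Hypothetical helper for this sketch; not a pingouin or statsmodels function.
    """
    design = np.column_stack([np.ones(len(covar)), covar])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return target - design @ coef


rng = np.random.default_rng(0)  # fixed seed: an assumption, the report used no seed
n = 10000
y = np.arange(1, n + 1, dtype=float)
x = y + rng.normal(size=n) * 0.1
z = y.copy()  # the covariate is an exact copy of y, as in the toy example

res_x = residuals(x, z)
res_y = residuals(y, z)

# y is perfectly explained by z, so its residuals are only numerical noise.
print("max |residual of y on z|:", np.abs(res_y).max())
# The residuals of x on z are essentially the added Gaussian noise.
print("std of residual of x on z:", res_x.std())

# Zero-order correlation between y and z; a value of 1 makes the
# denominator of the textbook partial-correlation formula zero.
r_yz = np.corrcoef(y, z)[0, 1]
print("r_yz:", r_yz)
print("1 - r_yz**2:", 1 - r_yz**2)
```

If this reading is right, any correlation computed from `res_y` is a correlation with round-off noise, which would explain why the regression approach reports a value near zero with a large p-value.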

Labels

bug 💥 (Something isn't working)
