Hello wonderful infer authors! I am reaching out about something that has come up as I've been teaching simulation-based inference and comparing it to the CLT-based approach.
Currently, as described in #509, the permutation test does not allow us to test the intercept, and so the results of get_p_value() and shade_p_value() in this context do not entirely make sense:
library(infer)
obs_fit <- gss %>%
specify(hours ~ age + college) %>%
fit()
null_dist <- gss %>%
specify(hours ~ age + college) %>%
hypothesize(null = "independence") %>%
generate(reps = 1000, type = "permute") %>%
fit()
visualize(null_dist) +
shade_p_value(obs_stat = obs_fit, direction = "two-sided")
null_dist %>%
get_p_value(obs_stat = obs_fit, direction = "two-sided")
# A tibble: 3 × 2
term p_value
<chr> <dbl>
1 age 0.898
2 collegedegree 0.27
3 intercept 0.68
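As a quick aside on why the intercept's p-value comes out this way: permuting the response doesn't change its mean, so the permuted intercepts should cluster near the observed estimate rather than near 0. A small sketch to check that (it assumes dplyr is loaded for the verbs, as the snippets below do as well):
library(dplyr)
null_dist %>%
  ungroup() %>%
  filter(term == "intercept") %>%
  summarize(center = mean(estimate), spread = sd(estimate))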
Comparing the get_p_value() output above to the result of summary.lm(),
summary(lm(hours ~ age + college, gss))
Call:
lm(formula = hours ~ age + college, data = gss)
Residuals:
Min 1Q Median 3Q Max
-37.78 -4.93 -0.89 7.26 48.12
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 40.60859 2.15468 18.85 <2e-16 ***
age 0.00596 0.04988 0.12 0.90
collegedegree 1.53283 1.39325 1.10 0.27
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 15 on 497 degrees of freedom
Multiple R-squared: 0.00248, Adjusted R-squared: -0.00154
F-statistic: 0.617 on 2 and 497 DF, p-value: 0.54
we see that the CLT-based approach yields sensible results for the p-values for the slopes (testing that each slope is 0).
Is this the preferred behavior? I understand that the permutation test does not help us test the "standard" hypothesis test for the intercept (that the intercept is 0).
If it would be preferred to be able to carry out this test (which I can see the argument for, especially if infer wants to be able to do the tests that the CLT-based methods do by default), I wonder if an implementation analogous to bootstrapping for a single mean with stat = "mean" would be helpful. For example, code analogous to:
gss %>%
specify(response = hours) %>%
hypothesize(null = "point", mu = 0) %>%
generate(reps = 1000, type = "bootstrap") %>%
calculate(stat = "mean")
for regression would be
gss %>%
specify(hours ~ age + college) %>%
hypothesize(null = "point", beta0 = 0) %>%
generate(reps = 1000, type = "bootstrap") %>%
fit()
Another possible solution would be to shift the null distribution for the intercept to be centered at 0 after the permutation occurs (or to have an option for that, maybe it wouldn't be the default behavior?), e.g., with null_dist from above:
null_dist %>%
ungroup() %>%
filter(term == "intercept") %>%
mutate(estimate = estimate - mean(estimate))
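To make that concrete, here is a rough sketch of how a p-value could then be computed from the recentered distribution. It reuses obs_fit and null_dist from above and computes the two-sided p-value by hand (rather than through get_p_value()), just for illustration:
obs_intercept <- obs_fit %>%
  filter(term == "intercept") %>%
  pull(estimate)
null_dist %>%
  ungroup() %>%
  filter(term == "intercept") %>%
  mutate(estimate = estimate - mean(estimate)) %>%  # recenter at 0
  summarize(p_value = mean(abs(estimate) >= abs(obs_intercept)))
Presumably that would land much closer to the near-zero p-value that summary.lm reports for the intercept.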
Surely, neither of these is the most elegant solution, but I imagine folks working on this package would have some excellent ideas for an elegant integration of something similar.
This issue was a fun one to present to my students, and I actually think it got them thinking more deeply about what is happening in a permutation test. But I worry that opting to keep results for the intercept here may lead folks astray.