Model-averaged estimates/intervals/distributions #768

bwiernik · 2022-09-03T12:22:09Z

There's now a convenient package for computing frequentist model-averaged CIs/CDs https://cran.r-project.org/web/packages/MATA/index.html

As well as for Bayesian model-averaging/stacking https://mc-stan.org/loo/reference/loo_model_weights.html

We should have a function to compute average estimates/intervals/densities. It should permit choice of weighting method, interval method, and be passable to {see} for plotting.

For frequentist averaging, that's usually done based on information criteria, so this would entail a soft dependency of {parameters} on {performance}.
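For reference, the usual information-criterion weights (Akaike weights) can be written as below. The notation ($K$ candidate models, $\Delta_i$, $w_i$) is mine for illustration, not taken from {MATA} or {performance}, and the same form would apply to BIC or AICc if those were chosen instead:

$$
\Delta_i = \mathrm{AIC}_i - \min_{j} \mathrm{AIC}_j,
\qquad
w_i = \frac{\exp(-\Delta_i / 2)}{\sum_{j=1}^{K} \exp(-\Delta_j / 2)}
$$

The weights sum to 1, and the model with the smallest AIC gets the largest weight.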

vincentarelbundock · 2022-09-03T12:29:19Z

Do you see us being able to "extract" info from objects produced by these two packages (probably easy), or do you imagine us writing "wrappers" around those two packages (more maintenance)?

bwiernik · 2022-09-03T13:46:09Z

Both take lists of models and return results, so wrapping isn't likely to be much effort. We already wrap around loo for a bunch of functions in performance, so we can probably create some unified code for easystats generally.

mattansb · 2022-09-03T19:02:21Z

For Bayes, I wrote this function that does this exactly a while back (years???)

https://easystats.github.io/bayestestR/reference/weighted_posteriors.html

It currently uses BFs for weights (personally, that's what makes the most sense to me, as we are mixing posteriors from the same data but different priors), but it should be expandable (:

bwiernik · 2022-09-03T21:48:25Z

There's a Vehtari, Gelman, et al paper discussing that the "stacking" method used in {loo} by default apparently has some important advantages over traditional Bayesian model averaging
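As I understand the contrast (notation is mine, so treat this as a sketch rather than a statement of what either package computes internally): traditional Bayesian model averaging weights each model by its posterior model probability, which depends on the marginal likelihoods (equivalently, Bayes factors against a reference model $M_0$ times prior model probabilities):

$$
w_k^{\mathrm{BMA}} = p(M_k \mid y)
= \frac{p(y \mid M_k)\, p(M_k)}{\sum_{j=1}^{K} p(y \mid M_j)\, p(M_j)}
\;\propto\; \mathrm{BF}_{k0}\, p(M_k)
$$

Stacking instead chooses the weights on the simplex that maximize the leave-one-out log predictive density of the combined model:

$$
\hat{w}^{\mathrm{stack}} = \arg\max_{w \ge 0,\ \sum_k w_k = 1}
\sum_{i=1}^{n} \log \sum_{k=1}^{K} w_k\, p(y_i \mid y_{-i}, M_k)
$$

So stacking targets predictive performance of the mixture, whereas BMA weights depend on the marginal likelihoods and therefore on the priors.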

strengejacke · 2022-09-05T14:16:19Z

What would be the model-averaged estimate? Here's an example for the intervals; I think it would be easy to re-implement mata().

library(insight)
library(MATA)

set.seed(0)
n <- 20 # 'n' is assumed to be even
x1 <- c(rep(0, n / 2), rep(1, n / 2)) # two groups: x1=0, and x1=1
x2 <- rnorm(n, mean = 10, sd = 3)
y <- rnorm(n, mean = 3 * x1 + 0.1 * x2) # data generation

x1 <- factor(x1)
# both models are fitted with glm(), which stores the AIC in $aic
# (objects returned by lm() do not have this element)
m1 <- glm(y ~ x1)
m2 <- glm(y ~ x1 + x2)
aic <- c(m1$aic, m2$aic)
delta.aic <- aic - min(aic)
model.weights <- exp(-0.5 * delta.aic) / sum(exp(-0.5 * delta.aic))

# see also
# performance::compare_performance(m1, m2)

residual.dfs <- c(insight::get_df(m1), insight::get_df(m2))


g1 <- insight::get_datagrid(m1)
g2 <- insight::get_datagrid(m2)

nd1 <- as.data.frame(lapply(g1, function(i) {
  if (is.factor(i)) {
    as.factor(levels(i)[1])
  } else {
    unique(i)[1]
  }
}))


nd2 <- as.data.frame(lapply(g2, function(i) {
  if (is.factor(i)) {
    as.factor(levels(i)[1])
  } else {
    unique(i)[1]
  }
}))

p1 <- get_predicted(m1, data = nd1, ci = .95)
p2 <- get_predicted(m2, data = nd2, ci = .95)

theta.hats <- c(p1, p2)
se.theta.hats <- c(attributes(p1)$ci_data$SE, attributes(p2)$ci_data$SE)

# 95% MATA-Wald confidence interval for theta:
mata.wald(theta.hats, se.theta.hats, model.weights,
  mata.t = TRUE,
  residual.dfs = residual.dfs
)
#> [1] -0.3756852  1.5499646

Created on 2022-09-05 with reprex v2.0.2

strengejacke · 2022-09-05T14:31:38Z

I think it should be theta.hats, right?
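If the question is about the point estimate, one common choice is the weighted average of the per-model estimates, i.e. `sum(model.weights * theta.hats)`. Below is a minimal base-R sketch of that plus a MATA-Wald-style interval. It assumes the interval limits are found by setting the weighted average of the per-model t-distribution tail areas to alpha/2, which is my understanding of the construction, not a copy of the {MATA} source. The function name `model_average_sketch()` and its arguments are hypothetical.

```r
# Hypothetical helper (not from {MATA} or any easystats package): model-averaged
# point estimate plus a MATA-Wald-style interval, using base R only.
model_average_sketch <- function(theta_hats, se, weights, dfs, level = 0.95) {
  stopifnot(
    length(theta_hats) == length(se),
    length(theta_hats) == length(weights),
    length(theta_hats) == length(dfs),
    abs(sum(weights) - 1) < 1e-8
  )
  alpha <- 1 - level

  # Weighted average of the per-model lower-tail areas at a candidate value
  # 'theta'; this increases monotonically from 0 to 1 as 'theta' grows.
  tail_area <- function(theta) {
    sum(weights * pt((theta - theta_hats) / se, df = dfs))
  }

  # Find the value of theta at which the averaged tail area equals 'p'.
  solve_limit <- function(p) {
    uniroot(
      function(theta) tail_area(theta) - p,
      interval = range(theta_hats) + c(-1, 1) * max(se),
      extendInt = "upX", # widen the search interval if it misses the root
      tol = 1e-9
    )$root
  }

  list(
    estimate = sum(weights * theta_hats),
    ci = c(solve_limit(alpha / 2), solve_limit(1 - alpha / 2))
  )
}

# Usage with made-up numbers (not the values from the reprex above)
res <- model_average_sketch(
  theta_hats = c(0.5, 0.7), se = c(0.4, 0.5),
  weights = c(0.6, 0.4), dfs = c(18, 17)
)
res$estimate
res$ci
```

With a single model and weight 1 this reduces to the ordinary t-based Wald interval, which is a quick sanity check for the approach.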

DrJerryTAO · 2023-03-12T23:40:33Z

I have used {glmulti} for multimodel inference. I believe it implements frequentist model averaging by default (and by design???). It searches through millions of possible model formula specifications and retains a top set of arbitrary size, optionally using a genetic algorithm, which I have not seen provided in other packages. It also provides model-averaged summary statistics, including coefficients, SEs, and CIs. The {MuMIn} package was mentioned along with {glmulti} in https://www.metafor-project.org/doku.php/tips:model_selection_with_glmulti_and_mumin and https://www.r-bloggers.com/2013/02/model-selection-and-multi-model-inference, but I have not tried it.

> Do you see us being able to "extract" info from objects produced by these two packages (probably easy), or do you imagine us writing "wrappers" around those two packages (more maintenance)?

@vincentarelbundock It would be nice to have model-averaged marginal effects. It appears very few have done so, except this post: https://www.ajordannafa.com/blog/2022/being-less-wrong, where {marginaleffects} was useful. However, I am not sure how to implement model-averaged marginal effects with frequentist rather than Bayesian models. I assume one would extract the coefficient estimates and vcov() from the model-averaged summary statistics and supply them to {marginaleffects} just like a single model. Otherwise, many single models would need to be supplied, which sounds like a lot of predictions for {marginaleffects} to compute.

strengejacke added the Feature idea 🔥 New feature or request label Sep 5, 2022

strengejacke mentioned this issue Sep 5, 2022

Model-averaged estimates/intervals/distributions #771

Draft
