Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

get_datagrid() for {emmeans} and {marginaleffects} #921

Closed
mattansb opened this issue Sep 1, 2024 · 7 comments · Fixed by #925
Closed

get_datagrid() for {emmeans} and {marginaleffects} #921

mattansb opened this issue Sep 1, 2024 · 7 comments · Fixed by #925

Comments

@mattansb
Copy link
Member

mattansb commented Sep 1, 2024

Following easystats/bayestestR#661, this could be useful for making the outputs more functional.

@strengejacke wdyt?

get_datagrid.emm_list <- get_datagrid.emmGrid <- function(x, ...) {
  suppressWarnings(s <- as.data.frame(x))
  data.frame(s)[,1:(which(colnames(s) == attr(s, "estName"))-1), drop = FALSE] 
}

myfit <- lm(mpg ~ factor(gear) + cyl, data = mtcars)

myemms <- emmeans::emmeans(myfit, pairwise ~ gear | cyl)
insight::get_datagrid(myemms)
#>      cyl gear      contrast
#> 1 6.1875    3             .
#> 2 6.1875    4             .
#> 3 6.1875    5             .
#> 4 6.1875    . gear3 - gear4
#> 5 6.1875    . gear3 - gear5
#> 6 6.1875    . gear4 - gear5

@vincentarelbundock is there perhaps a better way to extract this information?

get_datagrid.predictions <- get_datagrid.comparisons <- get_datagrid.slopes <- function(x, ...) {
  data.frame(x)[,1:(which(colnames(x) == "estimate")-1), drop = FALSE] 
}

myme <- marginaleffects::avg_slopes(myfit, variables = "cyl", by = "gear")
insight::get_datagrid(myme)
#>   term    contrast gear
#> 1  cyl mean(dY/dX)    3
#> 2  cyl mean(dY/dX)    4
#> 3  cyl mean(dY/dX)    5
myme <- marginaleffects::avg_predictions(myfit, variables = "cyl", by = "gear")
insight::get_datagrid(myme)
#>   gear
#> 1    4
#> 2    3
#> 3    5
myme <- marginaleffects::avg_comparisons(myfit, variables = "cyl", by = "gear")
insight::get_datagrid(myme)
#>   term contrast gear
#> 1  cyl mean(+1)    3
#> 2  cyl mean(+1)    4
#> 3  cyl mean(+1)    5

Created on 2024-09-01 with reprex v2.1.0

@vincentarelbundock
Copy link
Contributor

Sorry, but I don't understand the use case or the desired feature.

@mattansb
Copy link
Member Author

mattansb commented Sep 1, 2024

Here is an example:

library(marginaleffects)
library(rstanarm)
library(insight)
library(bayestestR)

get_datagrid.predictions <- get_datagrid.comparisons <- get_datagrid.slopes <- function(x, ...) {
  data.frame(x)[,1:(which(colnames(x) == "estimate")-1), drop = FALSE] 
}

fit <- stan_glm(mpg ~ factor(gear) + factor(cyl), 
                data = mtcars, 
                refresh = 0)



(my_avg_diffs <- avg_comparisons(fit, variables = "gear", by = "cyl"))
#> 
#>  Term          Contrast cyl Estimate 2.5 % 97.5 %
#>  gear mean(4) - mean(3)   6     1.44 -2.54   5.30
#>  gear mean(4) - mean(3)   4     1.44 -2.54   5.30
#>  gear mean(4) - mean(3)   8     1.44 -2.54   5.30
#>  gear mean(5) - mean(3)   6     1.57 -2.28   5.48
#>  gear mean(5) - mean(3)   4     1.57 -2.28   5.48
#>  gear mean(5) - mean(3)   8     1.57 -2.28   5.48
#> 
#> Columns: term, contrast, cyl, estimate, conf.low, conf.high, predicted_lo, predicted_hi, predicted, tmp_idx 
#> Type:  response
#> 



# Compute some indicies, keep grid info:
cbind(
  get_datagrid(my_avg_diffs),
  
  p_direction(posterior_draws(my_avg_diffs, shape = "rvar")[["rvar"]])
)
#>   term          contrast cyl Parameter     pd
#> 1 gear mean(4) - mean(3)   6      x[1] 0.7595
#> 2 gear mean(4) - mean(3)   4      x[2] 0.7595
#> 3 gear mean(4) - mean(3)   8      x[4] 0.7595
#> 4 gear mean(5) - mean(3)   6      x[9] 0.7900
#> 5 gear mean(5) - mean(3)   4     x[10] 0.7900
#> 6 gear mean(5) - mean(3)   8     x[12] 0.7900

(So essentially, the info from get_datagrid() will replace that Parameter column.)

@vincentarelbundock
Copy link
Contributor

IIUC, I think this is not easy because the relevant columns to a grid vary in names, and are not always positioned at the start of the data frame.

In this example, I imagine that you'd want all of the contrast_hp, contrast_wt, and qsec, mpg, and wt columns as part of your grid:

library(marginaleffects)

mod <- lm(mpg ~ wt + hp + qsec, data = mtcars)

comparisons(mod,
  variables = c("wt", "hp"),
  cross = TRUE,
  newdata = datagrid(qsec = range)) |>
  data.frame()
#   rowid  term contrast_hp contrast_wt  estimate std.error statistic
# 1     1 cross          +1          +1 -4.376619 0.7412317 -5.904522
# 2     2 cross          +1          +1 -4.376619 0.7412320 -5.904520
#        p.value  s.value  conf.low conf.high qsec predicted_lo predicted_hi
# 1 3.536697e-09 28.07495 -5.829407 -2.923832 14.5     18.37997     14.00335
# 2 3.536744e-09 28.07493 -5.829408 -2.923831 22.9     22.67097     18.29435
#   predicted       hp mpg      wt
# 1  18.37997 146.6875  21 3.21725
# 2  22.67097 146.6875  21 3.21725

@mattansb
Copy link
Member Author

mattansb commented Sep 1, 2024

Ah.
Okay, how about this - I know "group" is a forbidden name - is there a list of such names? If so, omitting all of those columns should just leave the "grid" info, right?

@mattansb
Copy link
Member Author

mattansb commented Sep 1, 2024

Oh, no - because some names aren't protected - like s.value or predicted_lo...

@mattansb
Copy link
Member Author

mattansb commented Sep 1, 2024

wdyt of this?

A combination of column names from the underlying data frame + columns that start with "contrast"

library(marginaleffects)

mod <- lm(mpg ~ wt + hp + qsec, data = mtcars)

myme <- comparisons(mod,
                    variables = c("wt", "hp"),
                    cross = TRUE,
                    newdata = datagrid(qsec = range))


get_datagrid.predictions <- get_datagrid.comparisons <- get_datagrid.slopes <- function(x, ...) {
  cols_newdata <- intersect(colnames(x), colnames(attr(x, "newdata")))
  cols_contrast <- colnames(x)[grep("^contrast_?", colnames(x))]
  cols_grid <- union(cols_newdata, cols_contrast)
  
  data.frame(x)[, cols_grid, drop = FALSE] 
}

insight::get_datagrid(myme)
#>   rowid qsec       hp mpg      wt contrast_hp contrast_wt
#> 1     1 14.5 146.6875  21 3.21725          +1          +1
#> 2     2 22.9 146.6875  21 3.21725          +1          +1

@vincentarelbundock
Copy link
Contributor

Maybe that works. Not sure I'm thinking of all the corner cases, but hypothesis and by might also be useful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants