Description
Hi, I noticed an issue when using get_data() with mira objects from mice::with(). I'm not sure what the expected behavior should be (i.e., whether it should return the original unimputed dataset, the mids object, a list of imputed data frames, or the stacked datasets). Fortunately, once you have one of these, the others follow.
Using get_data() directly on mira objects fails. Fine, maybe that's not a supported object. But it can also return the wrong data when used on the regression models fit to each imputed dataset, which are stored in the analyses component of the mira object. get_data() falls back on the model.frame method, which works for simple fits, but the model frame often does not correspond to the original data, especially when the formula contains transformations.
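The model-frame mismatch can be seen without mice at all. The following is a minimal base-R sketch; mtcars is only a stand-in dataset chosen for illustration and is not part of the original report.

```r
# Stand-in example (assumption): mtcars replaces an imputed dataset.
fit <- lm(mpg ~ poly(hp, 2) + wt, data = mtcars)

# The model frame stores the *transformed* term, not the raw variable.
mf <- model.frame(fit)
print(names(mf))

# The raw predictor "hp" is absent; a two-column poly() matrix takes its place.
hp_missing <- !("hp" %in% names(mf))
poly_term <- mf[["poly(hp, 2)"]]
print(hp_missing)
print(dim(poly_term))
```

Any method that rebuilds the data from the model frame has to undo or rename such columns, which is where the original values can be lost.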
I developed an alternative solution that might work, which is to evaluate the model's variables themselves within the environment of each model's formula. This was inspired by your approach of evaluating the data component of the model call in that environment, but with mira objects there is no data argument. I'm hoping something like this can make it into the package.
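The idea can be sketched in base R as follows. This is an illustration under assumptions, not the package's implementation: the eval() call only mimics a model fitted without a data argument (which is assumed here to resemble what with() does for each completed dataset), mtcars is a stand-in dataset, and all.vars() is used in place of insight::find_variables().

```r
# Fit a model with no `data` argument by evaluating the call inside a data
# frame; the formula's environment then holds the raw columns.
fit <- eval(quote(lm(mpg ~ poly(hp, 2) + wt)), envir = mtcars)

# Names of the untransformed variables used in the formula.
vars <- all.vars(formula(fit))

# Look those names up in the formula's environment and rebuild a data frame.
form_env <- environment(formula(fit))
raw <- as.data.frame(mget(vars, envir = form_env))

str(raw)
```

Note that all.vars() returns every symbol in the formula, so a formula that refers to a non-data object (for example a degree stored in a variable) would need extra filtering.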
Below is a reprex showing that get_data() works with a simple model but returns the wrong dataset when the model contains transformations. This can be dangerous because the transformed columns may have the same names as the variables of interest but hold different values.
data("lalonde_mis", package = "cobalt")
library(mice)
imp <- mice(lalonde_mis, m = 3, printFlag = FALSE, seed = 1124)
fits <- with(imp, lm(re78 ~ re74 + treat))
# get_data() doesn't work directly on mira
insight::get_data(fits)
#> Warning: Could not recover model data from environment. Please make sure your
#> data is available in your workspace.
#> Trying to retrieve data from the model frame now.
#> Warning: Could not get model data.
#> NULL
# it does work on individual models, using model.frame
data_list <- lapply(fits$analyses, insight::get_data)
#> Warning: Could not recover model data from environment. Please make sure your
#> data is available in your workspace.
#> Trying to retrieve data from the model frame now.
str(data_list)
#> List of 3
#> $ :'data.frame': 614 obs. of 3 variables:
#> ..$ re78 : num [1:614] 9930 3596 24909 7506 290 ...
#> ..$ re74 : num [1:614] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ treat: int [1:614] 1 1 1 1 1 1 1 1 1 1 ...
#> ..- attr(*, "terms")=Classes 'terms', 'formula' language re78 ~ re74 + treat
#> .. .. ..- attr(*, "variables")= language list(re78, re74, treat)
#> .. .. ..- attr(*, "factors")= int [1:3, 1:2] 0 1 0 0 0 1
#> .. .. .. ..- attr(*, "dimnames")=List of 2
#> .. .. .. .. ..$ : chr [1:3] "re78" "re74" "treat"
#> .. .. .. .. ..$ : chr [1:2] "re74" "treat"
#> .. .. ..- attr(*, "term.labels")= chr [1:2] "re74" "treat"
#> .. .. ..- attr(*, "order")= int [1:2] 1 1
#> .. .. ..- attr(*, "intercept")= int 1
#> .. .. ..- attr(*, "response")= int 1
#> .. .. ..- attr(*, ".Environment")=<environment: 0x7f81af678e00>
#> .. .. ..- attr(*, "predvars")= language list(re78, re74, treat)
#> .. .. ..- attr(*, "dataClasses")= Named chr [1:3] "numeric" "numeric" "numeric"
#> .. .. .. ..- attr(*, "names")= chr [1:3] "re78" "re74" "treat"
#> ..- attr(*, "is_subset")= logi FALSE
#> $ :'data.frame': 614 obs. of 3 variables:
#> ..$ re78 : num [1:614] 9930 3596 24909 7506 290 ...
#> ..$ re74 : num [1:614] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ treat: int [1:614] 1 1 1 1 1 1 1 1 1 1 ...
#> ..- attr(*, "terms")=Classes 'terms', 'formula' language re78 ~ re74 + treat
#> .. .. ..- attr(*, "variables")= language list(re78, re74, treat)
#> .. .. ..- attr(*, "factors")= int [1:3, 1:2] 0 1 0 0 0 1
#> .. .. .. ..- attr(*, "dimnames")=List of 2
#> .. .. .. .. ..$ : chr [1:3] "re78" "re74" "treat"
#> .. .. .. .. ..$ : chr [1:2] "re74" "treat"
#> .. .. ..- attr(*, "term.labels")= chr [1:2] "re74" "treat"
#> .. .. ..- attr(*, "order")= int [1:2] 1 1
#> .. .. ..- attr(*, "intercept")= int 1
#> .. .. ..- attr(*, "response")= int 1
#> .. .. ..- attr(*, ".Environment")=<environment: 0x7f81af71ff20>
#> .. .. ..- attr(*, "predvars")= language list(re78, re74, treat)
#> .. .. ..- attr(*, "dataClasses")= Named chr [1:3] "numeric" "numeric" "numeric"
#> .. .. .. ..- attr(*, "names")= chr [1:3] "re78" "re74" "treat"
#> ..- attr(*, "is_subset")= logi FALSE
#> $ :'data.frame': 614 obs. of 3 variables:
#> ..$ re78 : num [1:614] 9930 3596 24909 7506 290 ...
#> ..$ re74 : num [1:614] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ treat: int [1:614] 1 1 1 1 1 1 1 1 1 1 ...
#> ..- attr(*, "terms")=Classes 'terms', 'formula' language re78 ~ re74 + treat
#> .. .. ..- attr(*, "variables")= language list(re78, re74, treat)
#> .. .. ..- attr(*, "factors")= int [1:3, 1:2] 0 1 0 0 0 1
#> .. .. .. ..- attr(*, "dimnames")=List of 2
#> .. .. .. .. ..$ : chr [1:3] "re78" "re74" "treat"
#> .. .. .. .. ..$ : chr [1:2] "re74" "treat"
#> .. .. ..- attr(*, "term.labels")= chr [1:2] "re74" "treat"
#> .. .. ..- attr(*, "order")= int [1:2] 1 1
#> .. .. ..- attr(*, "intercept")= int 1
#> .. .. ..- attr(*, "response")= int 1
#> .. .. ..- attr(*, ".Environment")=<environment: 0x7f81af78d158>
#> .. .. ..- attr(*, "predvars")= language list(re78, re74, treat)
#> .. .. ..- attr(*, "dataClasses")= Named chr [1:3] "numeric" "numeric" "numeric"
#> .. .. .. ..- attr(*, "names")= chr [1:3] "re78" "re74" "treat"
#> ..- attr(*, "is_subset")= logi FALSE
# model with transformations
fits2 <- with(imp, lm(re78 ~ poly(re74, 2) + treat))
# the model.frame fallback returns transformed columns, not the original data
data_list2 <- lapply(fits2$analyses, insight::get_data)
#> Warning: Could not recover model data from environment. Please make sure your
#> data is available in your workspace.
#> Trying to retrieve data from the model frame now.
#> Warning: Some model terms could not be found in model data.
#> You probably need to load the data into the environment.
str(data_list2)
#> List of 3
#> $ :'data.frame': 614 obs. of 4 variables:
#> ..$ re78 : num [1:614] 9930 3596 24909 7506 290 ...
#> ..$ treat : int [1:614] 1 1 1 1 1 1 1 1 1 1 ...
#> ..$ re74 : num [1:614] -0.0287 -0.0287 -0.0287 -0.0287 -0.0287 ...
#> ..$ re74.1: num [1:614] 0.0209 0.0209 0.0209 0.0209 0.0209 ...
#> ..- attr(*, "is_subset")= logi FALSE
#> $ :'data.frame': 614 obs. of 4 variables:
#> ..$ re78 : num [1:614] 9930 3596 24909 7506 290 ...
#> ..$ treat : int [1:614] 1 1 1 1 1 1 1 1 1 1 ...
#> ..$ re74 : num [1:614] -0.0285 -0.0285 -0.0285 -0.0285 -0.0285 ...
#> ..$ re74.1: num [1:614] 0.0207 0.0207 0.0207 0.0207 0.0207 ...
#> ..- attr(*, "is_subset")= logi FALSE
#> $ :'data.frame': 614 obs. of 4 variables:
#> ..$ re78 : num [1:614] 9930 3596 24909 7506 290 ...
#> ..$ treat : int [1:614] 1 1 1 1 1 1 1 1 1 1 ...
#> ..$ re74 : num [1:614] -0.0283 -0.0283 -0.0283 -0.0283 -0.0283 ...
#> ..$ re74.1: num [1:614] 0.0211 0.0211 0.0211 0.0211 0.0211 ...
#> ..- attr(*, "is_subset")= logi FALSE
# alternative solution: evaluate the model variables in each formula's environment
v <- insight::find_variables(fits, flatten = TRUE)
l <- str2lang(sprintf("list(%s)", toString(v)))
data_list3 <- lapply(fits$analyses, function(fit) {
  eval(l, environment(insight::find_formula(fit)$conditional)) |>
    list2DF() |>
    setNames(v)
})
str(data_list3)
#> List of 3
#> $ :'data.frame': 614 obs. of 3 variables:
#> ..$ re78 : num [1:614] 9930 3596 24909 7506 290 ...
#> ..$ re74 : num [1:614] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ treat: int [1:614] 1 1 1 1 1 1 1 1 1 1 ...
#> $ :'data.frame': 614 obs. of 3 variables:
#> ..$ re78 : num [1:614] 9930 3596 24909 7506 290 ...
#> ..$ re74 : num [1:614] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ treat: int [1:614] 1 1 1 1 1 1 1 1 1 1 ...
#> $ :'data.frame': 614 obs. of 3 variables:
#> ..$ re78 : num [1:614] 9930 3596 24909 7506 290 ...
#> ..$ re74 : num [1:614] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ treat: int [1:614] 1 1 1 1 1 1 1 1 1 1 ...

Created on 2025-11-24 with reprex v2.1.1