The problem
`augment.glm()` seems to fail when the model is fitted to data in summary-table form (one row per combination of the variables, with the counts passed via `weights`). Supplying a `data` or `newdata` argument to `augment()` makes no difference.
Reproducible example
library(broom)
set.seed(47)
# Load the dataset
data(mtcars)
# Create a logical response variable: TRUE if mpg > 20, FALSE otherwise
mtcars$high_mpg <- mtcars$mpg > 20
# Categorise by number of cylinders, hp and disp
mtcars$over4cyl <- factor(ifelse(mtcars$cyl > 6, "bigcyl", "smallcyl"),
                          levels = c("smallcyl", "bigcyl"))
mtcars$hpgroup <- factor(
  ifelse(
    mtcars$hp > 110,
    ifelse(mtcars$hp > 175, "highhp", "mediumhp"),
    "lowhp"
  ),
  levels = c("lowhp", "mediumhp", "highhp")
)
mtcars$displgroup <- factor(
  ifelse(
    mtcars$disp > 150,
    ifelse(mtcars$disp > 290, "highdisp", "mediumdisp"),
    "lowdisp"
  ),
  levels = c("lowdisp", "mediumdisp", "highdisp")
)
tbl <- xtabs( ~ high_mpg + over4cyl + hpgroup + displgroup, data = mtcars) |> as.data.frame()
# Fit a GLM with binomial (logit) family:
# to the individual cars data
mod1 <- glm(
  high_mpg ~ over4cyl + hpgroup + displgroup,
  data = mtcars,
  family = binomial(link = "logit")
)
# to the table
mod2 <- glm(
  high_mpg ~ over4cyl + hpgroup + displgroup,
  data = tbl,
  family = binomial(link = "logit"),
  weights = Freq
)
# No problem with mod1
try(class(augment(mod1)))
#> [1] "tbl_df" "tbl" "data.frame"
# Error with mod2
try(class(augment(mod2)))
#> Error in tryCatchOne(expr, names, parentenv, handlers[[1L]]) :
#> Must specify either `data` or `newdata` argument.
Created on 2025-07-15 with reprex v2.1.1
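One possible explanation, not confirmed in this report: the table built by `xtabs()` contains every combination of the four variables, including empty cells with `Freq == 0`. R's `influence()` method for `glm` objects drops zero-weight cases, so the standardized residuals that `augment()` adds may come back shorter than the data, and the failure may then surface as the generic `data`/`newdata` message. The sketch below rebuilds the same table (using `cut()`, which gives the same intervals as the nested `ifelse()` calls in the reprex), counts the empty cells, and refits on the non-empty rows only. The names `cars`, `n_empty`, `tbl_nz` and `mod2_nz` are hypothetical, and treating this as a workaround is an assumption.

```r
library(broom)

# Rebuild the summary table from the reprex (cut() with right = TRUE gives
# the same intervals as the nested ifelse() calls)
cars <- mtcars
cars$high_mpg <- cars$mpg > 20
cars$over4cyl <- factor(ifelse(cars$cyl > 6, "bigcyl", "smallcyl"),
                        levels = c("smallcyl", "bigcyl"))
cars$hpgroup <- cut(cars$hp, breaks = c(-Inf, 110, 175, Inf),
                    labels = c("lowhp", "mediumhp", "highhp"))
cars$displgroup <- cut(cars$disp, breaks = c(-Inf, 150, 290, Inf),
                       labels = c("lowdisp", "mediumdisp", "highdisp"))
tbl <- as.data.frame(
  xtabs(~ high_mpg + over4cyl + hpgroup + displgroup, data = cars)
)

# xtabs() keeps every combination of levels, so some cells are empty
n_empty <- sum(tbl$Freq == 0)

# Hypothesised workaround: drop the zero-weight rows before fitting
tbl_nz <- tbl[tbl$Freq > 0, ]
mod2_nz <- glm(
  high_mpg ~ over4cyl + hpgroup + displgroup,
  data = tbl_nz,
  family = binomial(link = "logit"),
  weights = Freq
)
aug <- augment(mod2_nz)
```

Since 36 cells hold only 32 cars, at least four rows of the table must be empty; if `augment(mod2_nz)` succeeds where `augment(mod2)` fails, that points at the zero weights rather than at the table format itself.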
Session info
sessionInfo()
#> R version 4.5.1 (2025-06-13 ucrt)
#> Platform: x86_64-w64-mingw32/x64
#> Running under: Windows 11 x64 (build 26100)
#>
#> Matrix products: default
#> LAPACK version 3.12.1
#>
#> locale:
#> [1] LC_COLLATE=Welsh_United Kingdom.utf8 LC_CTYPE=Welsh_United Kingdom.utf8
#> [3] LC_MONETARY=Welsh_United Kingdom.utf8 LC_NUMERIC=C
#> [5] LC_TIME=Welsh_United Kingdom.utf8
#>
#> time zone: Europe/London
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] broom_1.0.8
#>
#> loaded via a namespace (and not attached):
#> [1] vctrs_0.6.5 cli_3.6.5 knitr_1.50 rlang_1.1.6
#> [5] xfun_0.52 purrr_1.0.4 generics_0.1.4 glue_1.8.0
#> [9] backports_1.5.0 htmltools_0.5.8.1 rmarkdown_2.29 evaluate_1.0.3
#> [13] tibble_3.2.1 fastmap_1.2.0 yaml_2.3.10 lifecycle_1.0.4
#> [17] compiler_4.5.1 dplyr_1.1.4 fs_1.6.6 pkgconfig_2.0.3
#> [21] tidyr_1.3.1 rstudioapi_0.17.1 digest_0.6.37 R6_2.6.1
#> [25] reprex_2.1.1 tidyselect_1.2.1 pillar_1.10.2 magrittr_2.0.3
#> [29] tools_4.5.1 withr_3.0.2