augment.glm fails with binomial(link=logit) fitted to data using weights #1250

@HywelMJ

The problem

augment.glm appears to fail when the glm is fitted to data in summary-table form (one row per cell of a cross-tabulation, with the cell counts supplied through weights). Passing a data or newdata argument to augment doesn't seem to make any difference.

Reproducible example

library(broom)
set.seed(47)

# Load the dataset
data(mtcars)

# Create a logical response variable: TRUE if mpg > 20, FALSE otherwise
mtcars$high_mpg <- mtcars$mpg > 20

# Categorise by number of cylinders, hp and disp
# (despite its name, over4cyl is "bigcyl" only for cars with more than 6 cylinders)
mtcars$over4cyl <- factor(ifelse(mtcars$cyl > 6, "bigcyl", "smallcyl"),
                          levels = c("smallcyl", "bigcyl"))

mtcars$hpgroup <- factor(
  ifelse(
    mtcars$hp > 110,
    ifelse(mtcars$hp > 175, "highhp", "mediumhp"),
    "lowhp"
  ),
  levels = c("lowhp", "mediumhp", "highhp")
)

mtcars$displgroup <- factor(
  ifelse(
    mtcars$disp > 150,
    ifelse(mtcars$disp > 290, "highdisp", "mediumdisp"),
    "lowdisp"
  ),
  levels = c("lowdisp", "mediumdisp", "highdisp")
)

tbl <- xtabs( ~ high_mpg + over4cyl + hpgroup + displgroup, data = mtcars) |> as.data.frame()

# Fit a GLM with binomial (logit) family:
#    to the individual cars data
mod1 <- glm(
  high_mpg ~ over4cyl + hpgroup + displgroup,
  data = mtcars,
  family = binomial(link = "logit")
)

#    to the table
mod2 <- glm(
  high_mpg ~ over4cyl + hpgroup + displgroup,
  data = tbl,
  family = binomial(link = "logit"),
  weights = Freq
)

# No problem with mod1
try(class(augment(mod1)))
#> [1] "tbl_df"     "tbl"        "data.frame"
    
# Error with mod2    
try(class(augment(mod2)))
#> Error in tryCatchOne(expr, names, parentenv, handlers[[1L]]) : 
#>   Must specify either `data` or `newdata` argument.

Created on 2025-07-15 with reprex v2.1.1
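
Until augment handles this case, a possible stopgap is to attach predictions to the table by hand with base R's predict(). The sketch below is only an assumption about what a workaround could look like, not broom's implementation: it rebuilds the same table with cut() (equivalent to the nested ifelse calls above), and the names manual_aug, .fitted_link and .fitted_prob are made up for illustration. It covers fitted values only, not residuals or influence measures.

```r
# Rebuild the summary table from the reprex on a copy of mtcars.
# cut() with right-closed intervals gives the same groups as the nested ifelse calls.
cars <- mtcars
cars$high_mpg <- cars$mpg > 20
cars$over4cyl <- factor(ifelse(cars$cyl > 6, "bigcyl", "smallcyl"),
                        levels = c("smallcyl", "bigcyl"))
cars$hpgroup <- cut(cars$hp, breaks = c(-Inf, 110, 175, Inf),
                    labels = c("lowhp", "mediumhp", "highhp"))
cars$displgroup <- cut(cars$disp, breaks = c(-Inf, 150, 290, Inf),
                       labels = c("lowdisp", "mediumdisp", "highdisp"))
tbl <- as.data.frame(
  xtabs(~ high_mpg + over4cyl + hpgroup + displgroup, data = cars)
)

# Same weighted fit as mod2 above; warnings about the fit are silenced here
mod2 <- suppressWarnings(glm(
  high_mpg ~ over4cyl + hpgroup + displgroup,
  data = tbl,
  family = binomial(link = "logit"),
  weights = Freq
))

# Hand-rolled stand-in for augment(): one prediction per table row,
# on the link (log-odds) scale and on the response (probability) scale.
# The column names are hypothetical, not broom's.
manual_aug <- tbl
manual_aug$.fitted_link <- suppressWarnings(
  predict(mod2, newdata = tbl, type = "link")
)
manual_aug$.fitted_prob <- suppressWarnings(
  predict(mod2, newdata = tbl, type = "response")
)

# How many cells of the table are empty (zero weight in the fit)
n_empty_cells <- sum(tbl$Freq == 0)
```

Passing newdata explicitly means every row of the table gets a prediction, including cells with Freq == 0 that contribute nothing to the fit.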

Session info

sessionInfo()
#> R version 4.5.1 (2025-06-13 ucrt)
#> Platform: x86_64-w64-mingw32/x64
#> Running under: Windows 11 x64 (build 26100)
#> 
#> Matrix products: default
#>   LAPACK version 3.12.1
#> 
#> locale:
#> [1] LC_COLLATE=Welsh_United Kingdom.utf8  LC_CTYPE=Welsh_United Kingdom.utf8   
#> [3] LC_MONETARY=Welsh_United Kingdom.utf8 LC_NUMERIC=C                         
#> [5] LC_TIME=Welsh_United Kingdom.utf8    
#> 
#> time zone: Europe/London
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] broom_1.0.8
#> 
#> loaded via a namespace (and not attached):
#>  [1] vctrs_0.6.5       cli_3.6.5         knitr_1.50        rlang_1.1.6      
#>  [5] xfun_0.52         purrr_1.0.4       generics_0.1.4    glue_1.8.0       
#>  [9] backports_1.5.0   htmltools_0.5.8.1 rmarkdown_2.29    evaluate_1.0.3   
#> [13] tibble_3.2.1      fastmap_1.2.0     yaml_2.3.10       lifecycle_1.0.4  
#> [17] compiler_4.5.1    dplyr_1.1.4       fs_1.6.6          pkgconfig_2.0.3  
#> [21] tidyr_1.3.1       rstudioapi_0.17.1 digest_0.6.37     R6_2.6.1         
#> [25] reprex_2.1.1      tidyselect_1.2.1  pillar_1.10.2     magrittr_2.0.3   
#> [29] tools_4.5.1       withr_3.0.2
