Julian Faraway 2024-10-14
See the introduction for an overview.
See a mostly Bayesian analysis analysis of the same data.
This example is discussed in more detail in the book Bayesian Regression Modeling with INLA
Packages used:
library(ggplot2)
In Davison and Hinkley,
1997, the results of a study
on Nitrofen, a herbicide, are reported. Due to concern regarding the
effect on animal life, 50 female water fleas were divided into five
groups of ten each and treated with different concentrations of the
herbicide. The number of offspring in three subsequent broods for each
flea was recorded. We start by loading the data from the boot
package:
(the boot
package comes with base R distribution so there is no need
to download this)
data(nitrofen, package="boot")
head(nitrofen)
conc brood1 brood2 brood3 total
1 0 3 14 10 27
2 0 5 12 15 32
3 0 6 11 17 34
4 0 6 12 15 33
5 0 6 15 15 36
6 0 5 14 15 34
We need to rearrange the data to have one response value per line:
lnitrofen = data.frame(conc = rep(nitrofen$conc,each=3),
live = as.numeric(t(as.matrix(nitrofen[,2:4]))),
id = rep(1:50,each=3),
brood = rep(1:3,50))
head(lnitrofen)
conc live id brood
1 0 3 1 1
2 0 14 1 2
3 0 10 1 3
4 0 5 2 1
5 0 12 2 2
6 0 15 2 3
Make a plot of the data:
lnitrofen$jconc <- lnitrofen$conc + rep(c(-10,0,10),50)
lnitrofen$fbrood = factor(lnitrofen$brood)
ggplot(lnitrofen, aes(x=jconc,y=live, shape=fbrood, color=fbrood)) +
geom_point(position = position_jitter(w = 0, h = 0.5)) +
xlab("Concentration") + labs(shape = "Brood")
Since the response is a small count, a Poisson model is a natural choice. We expect the rate of the response to vary with the brood and concentration level. The plot of the data suggests these two predictors may have an interaction. The three observations for a single flea are likely to be correlated. We might expect a given flea to tend to produce more, or less, offspring over a lifetime. We can model this with an additive random effect. The linear predictor is:
where
We fit a model using penalized quasi-likelihood (PQL) using the lme4
package:
library(lme4)
lnitrofen$brood = factor(lnitrofen$brood)
glmod <- glmer(live ~ I(conc/300)*brood + (1|id), nAGQ=25,
family=poisson, data=lnitrofen)
summary(glmod, correlation = FALSE)
Generalized linear mixed model fit by maximum likelihood (Adaptive Gauss-Hermite Quadrature, nAGQ = 25) ['glmerMod']
Family: poisson ( log )
Formula: live ~ I(conc/300) * brood + (1 | id)
Data: lnitrofen
AIC BIC logLik deviance df.resid
313.9 335.0 -150.0 299.9 143
Scaled residuals:
Min 1Q Median 3Q Max
-2.208 -0.606 -0.008 0.618 3.565
Random effects:
Groups Name Variance Std.Dev.
id (Intercept) 0.0911 0.302
Number of obs: 150, groups: id, 50
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.6386 0.1367 11.99 < 2e-16
I(conc/300) -0.0437 0.2193 -0.20 0.84
brood2 1.1687 0.1377 8.48 < 2e-16
brood3 1.3512 0.1351 10.00 < 2e-16
I(conc/300):brood2 -1.6730 0.2487 -6.73 1.7e-11
I(conc/300):brood3 -1.8312 0.2451 -7.47 7.9e-14
We scaled the concentration by dividing by 300 (the maximum value is
310) to avoid scaling problems encountered with glmer()
. This is
helpful in any case since it puts all the parameter estimates on a
similar scale. The first brood is the reference level so the slope for
this group is estimated as
We would need to work harder to test the significance of the effects as
lmerTest
is of no help for GLMMs. We do have the z-tests but in this
example, they do not give us the tests we want.
Cannot fit GLMMs (other than normal response).
Cannot fit GLMMs (other than normal response).
See the discussion for the single random effect example for some introduction.
library(glmmTMB)
We can fit the model with:
gtmod <- glmmTMB(live ~ I(conc/300)*brood + (1|id),
family=poisson, data=lnitrofen)
summary(gtmod)
Family: poisson ( log )
Formula: live ~ I(conc/300) * brood + (1 | id)
Data: lnitrofen
AIC BIC logLik deviance df.resid
810.0 831.1 -398.0 796.0 143
Random effects:
Conditional model:
Groups Name Variance Std.Dev.
id (Intercept) 0.0906 0.301
Number of obs: 150, groups: id, 50
Conditional model:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.6386 0.1366 12.00 < 2e-16
I(conc/300) -0.0435 0.2191 -0.20 0.84
brood2 1.1685 0.1377 8.48 < 2e-16
brood3 1.3510 0.1351 10.00 < 2e-16
I(conc/300):brood2 -1.6724 0.2486 -6.73 1.7e-11
I(conc/300):brood3 -1.8305 0.2450 -7.47 7.9e-14
Almost the same output as with lme4
although a different optimizer has
been used for the fit.
Only lme4
and glmmTMB
can manage this.
sessionInfo()
R version 4.4.1 (2024-06-14)
Platform: x86_64-apple-darwin20
Running under: macOS Sonoma 14.7
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: Europe/London
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] glmmTMB_1.1.9 lme4_1.1-35.5 Matrix_1.7-0 ggplot2_3.5.1
loaded via a namespace (and not attached):
[1] utf8_1.2.4 generics_0.1.3 lattice_0.22-6 digest_0.6.37 magrittr_2.0.3
[6] evaluate_0.24.0 grid_4.4.1 estimability_1.5.1 mvtnorm_1.2-6 fastmap_1.2.0
[11] jsonlite_1.8.8 mgcv_1.9-1 fansi_1.0.6 scales_1.3.0 numDeriv_2016.8-1.1
[16] cli_3.6.3 rlang_1.1.4 munsell_0.5.1 splines_4.4.1 withr_3.0.1
[21] yaml_2.3.10 tools_4.4.1 nloptr_2.1.1 coda_0.19-4.1 minqa_1.2.8
[26] dplyr_1.1.4 colorspace_2.1-1 boot_1.3-31 vctrs_0.6.5 R6_2.5.1
[31] lifecycle_1.0.4 emmeans_1.10.4 MASS_7.3-61 pkgconfig_2.0.3 pillar_1.9.0
[36] gtable_0.3.5 glue_1.7.0 Rcpp_1.0.13 systemfonts_1.1.0 xfun_0.47
[41] tibble_3.2.1 tidyselect_1.2.1 rstudioapi_0.16.0 knitr_1.48 farver_2.1.2
[46] xtable_1.8-4 htmltools_0.5.8.1 nlme_3.1-166 rmarkdown_2.28 svglite_2.1.3
[51] labeling_0.4.3 TMB_1.9.14 compiler_4.4.1