Link together the check_* family (#570)
strengejacke authored Apr 4, 2023
1 parent fabd707 commit 6322ec4
Showing 24 changed files with 306 additions and 130 deletions.
2 changes: 2 additions & 0 deletions R/check_autocorrelation.R
@@ -11,6 +11,8 @@
#' @return Invisibly returns the p-value of the test statistic. A p-value < 0.05
#' indicates autocorrelated residuals.
#'
#' @family functions to check model assumptions and and assess model quality
#'
#' @details Performs a Durbin-Watson-Test to check for autocorrelated residuals.
#' In case of autocorrelation, robust standard errors return more accurate
#' results for the estimates, or maybe a mixed model with error term for the
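To make concrete what a Durbin-Watson check measures, here is a minimal base-R sketch. It only computes the test statistic from the residuals of a toy model; it is not the package's implementation and does not derive a p-value. The simulated data and the names `fit`, `res`, and `dw` are illustrative assumptions.

```r
# Toy data with AR(1) errors (illustrative assumption, not part of the commit)
set.seed(123)
n <- 100
x <- rnorm(n)
e <- as.numeric(stats::filter(rnorm(n), filter = 0.7, method = "recursive"))
y <- 1 + 2 * x + e

fit <- lm(y ~ x)
res <- residuals(fit)

# Durbin-Watson statistic: squared successive differences of the residuals
# divided by the residual sum of squares
dw <- sum(diff(res)^2) / sum(res^2)
dw
```

The statistic lies between 0 and 4. Values near 2 suggest no first-order autocorrelation, while values well below 2 point to positive autocorrelation.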
2 changes: 2 additions & 0 deletions R/check_collinearity.R
@@ -110,6 +110,8 @@
#' common statistical problems: Data exploration. Methods in Ecology and
#' Evolution (2010) 1:3–14.
#'
#' @family functions to check model assumptions and and assess model quality
#'
#' @note The code to compute the confidence intervals for the VIF and tolerance
#' values was adapted from the Appendix B from the Marcoulides et al. paper.
#' Thus, credits go to these authors for the original algorithm. There is also
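As a reminder of what the VIF and tolerance values mentioned in the note represent, the following base-R sketch computes them by hand for a small `mtcars` model. It is a simplified illustration under the textbook definition VIF = 1 / (1 - R²) and does not reproduce the confidence intervals adapted from Marcoulides et al.; the model and variable names are assumptions chosen for the example.

```r
fit <- lm(mpg ~ wt + cyl + gear + disp, data = mtcars)
X <- model.matrix(fit)[, -1, drop = FALSE]  # predictors without the intercept

# Regress each predictor on all the others and convert R^2 into a VIF
vif <- sapply(colnames(X), function(v) {
  others <- X[, colnames(X) != v, drop = FALSE]
  r2 <- summary(lm(X[, v] ~ others))$r.squared
  1 / (1 - r2)
})

# Tolerance is the reciprocal of the VIF
tolerance <- 1 / vif
round(cbind(VIF = vif, Tolerance = tolerance), 2)
```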
65 changes: 33 additions & 32 deletions R/check_convergence.R
@@ -12,38 +12,39 @@
#' @return `TRUE` if convergence is fine and `FALSE` if convergence
#' is suspicious. Additionally, the convergence value is returned as attribute.
#'
#' @details \subsection{Convergence and log-likelihood}{
#' Convergence problems typically arise when the model hasn't converged
#' to a solution where the log-likelihood has a true maximum. This may result
#' in unreliable and overly complex (or non-estimable) estimates and standard
#' errors.
#' }
#' \subsection{Inspect model convergence}{
#' **lme4** performs a convergence-check (see `?lme4::convergence`),
#' however, as discussed [here](https://github.com/lme4/lme4/issues/120)
#' and suggested by one of the lme4-authors in
#' [this comment](https://github.com/lme4/lme4/issues/120#issuecomment-39920269),
#' this check can be too strict. `check_convergence()` thus provides an
#' alternative convergence test for `merMod`-objects.
#' }
#' \subsection{Resolving convergence issues}{
#' Convergence issues are not easy to diagnose. The help page on
#' `?lme4::convergence` provides most of the current advice about
#' how to resolve convergence issues. Another clue might be large parameter
#' values, e.g. estimates (on the scale of the linear predictor) larger than
#' 10 in a (non-identity link) generalized linear model *might* indicate
#' [complete separation](https://stats.oarc.ucla.edu/other/mult-pkg/faq/general/faqwhat-is-complete-or-quasi-complete-separation-in-logisticprobit-regression-and-how-do-we-deal-with-them/).
#' Complete separation can be addressed by regularization, e.g. penalized
#' regression or Bayesian regression with appropriate priors on the fixed effects.
#' }
#' \subsection{Convergence versus Singularity}{
#' Note the different meaning between singularity and convergence: singularity
#' indicates an issue with the "true" best estimate, i.e. whether the maximum
#' likelihood estimation for the variance-covariance matrix of the random effects
#' is positive definite or only semi-definite. Convergence is a question of
#' whether we can assume that the numerical optimization has worked correctly
#' or not.
#' }
#' @section Convergence and log-likelihood:
#' Convergence problems typically arise when the model hasn't converged
#' to a solution where the log-likelihood has a true maximum. This may result
#' in unreliable and overly complex (or non-estimable) estimates and standard
#' errors.
#'
#' @section Inspect model convergence:
#' **lme4** performs a convergence-check (see `?lme4::convergence`),
#' however, as discussed [here](https://github.com/lme4/lme4/issues/120)
#' and suggested by one of the lme4-authors in
#' [this comment](https://github.com/lme4/lme4/issues/120#issuecomment-39920269),
#' this check can be too strict. `check_convergence()` thus provides an
#' alternative convergence test for `merMod`-objects.
#'
#' @section Resolving convergence issues:
#' Convergence issues are not easy to diagnose. The help page on
#' `?lme4::convergence` provides most of the current advice about
#' how to resolve convergence issues. Another clue might be large parameter
#' values, e.g. estimates (on the scale of the linear predictor) larger than
#' 10 in a (non-identity link) generalized linear model *might* indicate
#' [complete separation](https://stats.oarc.ucla.edu/other/mult-pkg/faq/general/faqwhat-is-complete-or-quasi-complete-separation-in-logisticprobit-regression-and-how-do-we-deal-with-them/).
#' Complete separation can be addressed by regularization, e.g. penalized
#' regression or Bayesian regression with appropriate priors on the fixed effects.
#'
#' @section Convergence versus Singularity:
#' Note the different meaning between singularity and convergence: singularity
#' indicates an issue with the "true" best estimate, i.e. whether the maximum
#' likelihood estimation for the variance-covariance matrix of the random effects
#' is positive definite or only semi-definite. Convergence is a question of
#' whether we can assume that the numerical optimization has worked correctly
#' or not.
#'
#' @family functions to check model assumptions and and assess model quality
#'
#' @examples
#' if (require("lme4")) {
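The documentation above describes an alternative convergence test without spelling it out. One plausible form of such a test is a scaled-gradient check: solve the Hessian against the gradient at the optimum and compare the largest absolute value with a tolerance. The sketch below applies that idea to a toy likelihood fitted with `optim()` rather than to a `merMod` object. The tolerance of 0.001, the helper `num_grad()`, and the assumption that the package's test is of this kind are all illustrative, not confirmed by the diff.

```r
set.seed(1)
y <- rnorm(200, mean = 3, sd = 2)

# Negative log-likelihood of a normal model; p = c(mean, log(sd))
negll <- function(p) -sum(dnorm(y, mean = p[1], sd = exp(p[2]), log = TRUE))

opt <- optim(c(0, 1), negll, method = "BFGS", hessian = TRUE)

# Hypothetical helper: central-difference gradient of f at p
num_grad <- function(f, p, h = 1e-6) {
  sapply(seq_along(p), function(i) {
    d <- replace(numeric(length(p)), i, h)
    (f(p + d) - f(p - d)) / (2 * h)
  })
}

grad <- num_grad(negll, opt$par)

# Gradient scaled by the curvature at the solution
rel_grad <- solve(opt$hessian, grad)

tolerance <- 0.001  # assumed threshold
converged <- max(abs(rel_grad)) < tolerance
converged
```

Scaling by the Hessian makes the criterion independent of the units of the parameters, which is one reason a check of this kind can be less strict than a raw gradient threshold.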
2 changes: 2 additions & 0 deletions R/check_heteroscedasticity.R
@@ -18,6 +18,8 @@
#'
#' @references Breusch, T. S., and Pagan, A. R. (1979) A simple test for heteroscedasticity and random coefficient variation. Econometrica 47, 1287-1294.
#'
#' @family functions to check model assumptions and and assess model quality
#'
#' @examples
#' m <<- lm(mpg ~ wt + cyl + gear + disp, data = mtcars)
#' check_heteroscedasticity(m)
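For orientation, the idea behind the Breusch-Pagan test cited above can be sketched in a few lines of base R. The version below is one textbook variant (the studentized form, which regresses squared residuals on the predictors) and may differ in detail from what `check_heteroscedasticity()` computes; the names `aux`, `bp_stat`, and `p_value` are assumptions.

```r
m <- lm(mpg ~ wt + cyl + gear + disp, data = mtcars)

# Auxiliary regression: do the predictors explain the squared residuals?
u2 <- residuals(m)^2
aux <- lm(u2 ~ wt + cyl + gear + disp, data = mtcars)

# LM statistic n * R^2, chi-squared with one df per auxiliary regressor
bp_stat <- nobs(aux) * summary(aux)$r.squared
df <- length(coef(aux)) - 1
p_value <- pchisq(bp_stat, df = df, lower.tail = FALSE)

c(statistic = bp_stat, df = df, p = p_value)
```

A small p-value indicates that the residual variance changes with the predictors, i.e. heteroscedasticity.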
2 changes: 2 additions & 0 deletions R/check_homogeneity.R
@@ -18,6 +18,8 @@
#'
#' @note There is also a [`plot()`-method](https://easystats.github.io/see/articles/performance.html) implemented in the \href{https://easystats.github.io/see/}{\pkg{see}-package}.
#'
#' @family functions to check model assumptions and and assess model quality
#'
#' @examples
#' model <<- lm(len ~ supp + dose, data = ToothGrowth)
#' check_homogeneity(model)
2 changes: 2 additions & 0 deletions R/check_model.R
@@ -131,6 +131,8 @@
#' look at the `check` argument and see if some of the model checks could be
#' skipped, which also increases performance.
#'
#' @family functions to check model assumptions and and assess model quality
#'
#' @examples
#' \dontrun{
#' m <- lm(mpg ~ wt + cyl + gear + disp, data = mtcars)
2 changes: 1 addition & 1 deletion R/check_multimodal.R
@@ -2,7 +2,7 @@
#'
#' For univariate distributions (one-dimensional vectors), this function
#' performs an Ameijeiras-Alonso et al. (2018) excess mass test. For multivariate
#' distributions (dataframes), it uses mixture modelling. However, it seems that
#' distributions (data frames), it uses mixture modelling. However, it seems that
#' it always returns a significant result (suggesting that the distribution is
#' multimodal). A better method might be needed here.
#'
2 changes: 2 additions & 0 deletions R/check_outliers.R
@@ -34,6 +34,8 @@
#' function. Note that the function will (silently) return a vector of `FALSE`
#' for non-supported data types such as character strings.
#'
#' @family functions to check model assumptions and and assess model quality
#'
#' @note There is also a
#' [`plot()`-method](https://easystats.github.io/see/articles/performance.html)
#' implemented in the
19 changes: 8 additions & 11 deletions R/check_overdispersion.R
@@ -16,36 +16,33 @@
#' with the mean and, therefore, variance usually (roughly) equals the mean
#' value. If the variance is much higher, the data are "overdispersed".
#'
#' \subsection{Interpretation of the Dispersion Ratio}{
#' @section Interpretation of the Dispersion Ratio:
#' If the dispersion ratio is close to one, a Poisson model fits well to the
#' data. Dispersion ratios larger than one indicate overdispersion, thus a
#' negative binomial model or similar might fit better to the data. A p-value <
#' .05 indicates overdispersion.
#' }
#'
#' \subsection{Overdispersion in Poisson Models}{
#' @section Overdispersion in Poisson Models:
#' For Poisson models, the overdispersion test is based on the code from
#' \cite{Gelman and Hill (2007), page 115}.
#' }
#' _Gelman and Hill (2007), page 115_.
#'
#' \subsection{Overdispersion in Mixed Models}{
#' @section Overdispersion in Mixed Models:
#' For `merMod`- and `glmmTMB`-objects, `check_overdispersion()`
#' is based on the code in the
#' [GLMM FAQ](http://bbolker.github.io/mixedmodels-misc/glmmFAQ.html),
#' section *How can I deal with overdispersion in GLMMs?*. Note that this
#' function only returns an *approximate* estimate of an overdispersion
#' parameter, and is probably inaccurate for zero-inflated mixed models (fitted
#' with `glmmTMB`).
#' }
#'
#' \subsection{How to fix Overdispersion}{
#' @section How to fix Overdispersion:
#' Overdispersion can be fixed by either modeling the dispersion parameter, or
#' by choosing a different distributional family (like Quasi-Poisson, or
#' negative binomial, see \cite{Gelman and Hill (2007), pages 115-116}).
#' }
#' negative binomial, see _Gelman and Hill (2007), pages 115-116_).
#'
#' @references
#' @family functions to check model assumptions and and assess model quality
#'
#' @references
#' - Bolker B et al. (2017):
#' [GLMM FAQ.](http://bbolker.github.io/mixedmodels-misc/glmmFAQ.html)
#'
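The dispersion ratio described in these sections can be illustrated with a short base-R sketch for a plain Poisson `glm`. It follows the common Pearson chi-squared approach and is a simplified illustration, not the package's code path for mixed models; the simulated negative binomial data and all object names are assumptions.

```r
# Simulate counts that are overdispersed relative to a Poisson model
set.seed(42)
n <- 500
x <- rnorm(n)
y <- rnbinom(n, mu = exp(0.5 + 0.3 * x), size = 1)

fit <- glm(y ~ x, family = poisson)

# Pearson chi-squared statistic relative to the residual degrees of freedom
chisq <- sum(residuals(fit, type = "pearson")^2)
dispersion_ratio <- chisq / df.residual(fit)
p_value <- pchisq(chisq, df = df.residual(fit), lower.tail = FALSE)

c(ratio = dispersion_ratio, p = p_value)
```

A ratio close to one is what a well-fitting Poisson model would produce; here the data were generated with extra variance, so the ratio is expected to exceed one.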
13 changes: 8 additions & 5 deletions R/pp_check.R → R/check_predictions.R
@@ -36,10 +36,12 @@
#' similar to the observed outcome than the model in the left panel (a). Thus,
#' model (b) is likely to be preferred over model (a).
#'
#' @note Every model object that has a `simulate()`-method should work with
#' `check_predictions()`. On R 3.6.0 and higher, if **bayesplot** (or a
#' package that imports **bayesplot** such as **rstanarm** or **brms**)
#' is loaded, `pp_check()` is also available as an alias for `check_predictions()`.
#' @note Every model object that has a `simulate()`-method should work with
#' `check_predictions()`. On R 3.6.0 and higher, if **bayesplot** (or a
#' package that imports **bayesplot** such as **rstanarm** or **brms**)
#' is loaded, `pp_check()` is also available as an alias for `check_predictions()`.
#'
#' @family functions to check model assumptions and and assess model quality
#'
#' @references
#' - Gabry, J., Simpson, D., Vehtari, A., Betancourt, M., and Gelman, A. (2019).
@@ -51,6 +53,7 @@
#'
#' - Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., and
#' Rubin, D. B. (2014). Bayesian data analysis. (Third edition). CRC Press.
#'
#' - Gelman, A., Hill, J., and Vehtari, A. (2020). Regression and Other Stories.
#' Cambridge University Press.
#'
@@ -80,7 +83,7 @@ check_predictions.default <- function(object,
.is_model_valid(object)

if (isTRUE(insight::model_info(object, verbose = FALSE)$is_bayesian) &&
isFALSE(inherits(object, "BFBayesFactor"))) {
isFALSE(inherits(object, "BFBayesFactor"))) {
insight::check_if_installed(
"bayesplot",
"to create posterior prediction plots for Stan models"
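The note in this file says that any model with a `simulate()` method should work with `check_predictions()`. The following base-R sketch shows the underlying idea, assuming a simple Poisson `glm` on the built-in `InsectSprays` data: draw replicated outcomes from the fitted model and compare a summary of them with the observed data. It is not the function's implementation, and the choice of summary (the mean count) is an assumption made for illustration.

```r
fit <- glm(count ~ spray, family = poisson, data = InsectSprays)

# 50 replicated response vectors drawn from the fitted model
sims <- simulate(fit, nsim = 50, seed = 123)

# Compare an observed summary with its distribution across replicates
observed_mean <- mean(InsectSprays$count)
simulated_means <- colMeans(sims)

range(simulated_means)
observed_mean
```

If the observed summary falls far outside the range of the simulated ones, the model reproduces that feature of the data poorly.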
66 changes: 33 additions & 33 deletions R/check_singularity.R
@@ -12,39 +12,39 @@
#' @return `TRUE` if the model fit is singular.
#'
#' @details If a model is "singular", this means that some dimensions of the
#' variance-covariance matrix have been estimated as exactly zero. This
#' often occurs for mixed models with complex random effects structures.
#' \cr \cr
#' \dQuote{While singular models are statistically well defined (it is
#' theoretically sensible for the true maximum likelihood estimate to
#' correspond to a singular fit), there are real concerns that (1) singular
#' fits correspond to overfitted models that may have poor power; (2) chances
#' of numerical problems and mis-convergence are higher for singular models
#' (e.g. it may be computationally difficult to compute profile confidence
#' intervals for such models); (3) standard inferential procedures such as
#' Wald statistics and likelihood ratio tests may be inappropriate.}
#' (\cite{lme4 Reference Manual})
#' \cr \cr
#' There is no gold-standard about how to deal with singularity and which
#' random-effects specification to choose. Besides using fully Bayesian methods
#' (with informative priors), proposals in a frequentist framework are:
#' \itemize{
#' \item avoid fitting overly complex models, such that the
#' variance-covariance matrices can be estimated precisely enough
#' (\cite{Matuschek et al. 2017})
#' \item use some form of model selection to choose a model that balances
#' predictive accuracy and overfitting/type I error (\cite{Bates et al. 2015},
#' \cite{Matuschek et al. 2017})
#' \item \dQuote{keep it maximal}, i.e. fit the most complex model consistent
#' with the experimental design, removing only terms required to allow a
#' non-singular fit (\cite{Barr et al. 2013})
#' }
#' Note the different meaning between singularity and convergence: singularity
#' indicates an issue with the "true" best estimate, i.e. whether the maximum
#' likelihood estimation for the variance-covariance matrix of the random
#' effects is positive definite or only semi-definite. Convergence is a
#' question of whether we can assume that the numerical optimization has
#' worked correctly or not.
#' variance-covariance matrix have been estimated as exactly zero. This
#' often occurs for mixed models with complex random effects structures.
#'
#' "While singular models are statistically well defined (it is theoretically
#' sensible for the true maximum likelihood estimate to correspond to a singular
#' fit), there are real concerns that (1) singular fits correspond to overfitted
#' models that may have poor power; (2) chances of numerical problems and
#' mis-convergence are higher for singular models (e.g. it may be computationally
#' difficult to compute profile confidence intervals for such models); (3)
#' standard inferential procedures such as Wald statistics and likelihood ratio
#' tests may be inappropriate." (_lme4 Reference Manual_)
#'
#' There is no gold-standard about how to deal with singularity and which
#' random-effects specification to choose. Besides using fully Bayesian methods
#' (with informative priors), proposals in a frequentist framework are:
#'
#' - avoid fitting overly complex models, such that the variance-covariance
#' matrices can be estimated precisely enough (_Matuschek et al. 2017_)
#' - use some form of model selection to choose a model that balances
#' predictive accuracy and overfitting/type I error (_Bates et al. 2015_,
#' _Matuschek et al. 2017_)
#' - "keep it maximal", i.e. fit the most complex model consistent with the
#' experimental design, removing only terms required to allow a non-singular
#' fit (_Barr et al. 2013_)
#'
#' Note the different meaning between singularity and convergence: singularity
#' indicates an issue with the "true" best estimate, i.e. whether the maximum
#' likelihood estimation for the variance-covariance matrix of the random
#' effects is positive definite or only semi-definite. Convergence is a
#' question of whether we can assume that the numerical optimization has
#' worked correctly or not.
#'
#' @family functions to check model assumptions and and assess model quality
#'
#' @references
#' - Bates D, Kliegl R, Vasishth S, Baayen H. Parsimonious Mixed Models.
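The distinction between a positive definite and a merely semi-definite variance-covariance matrix can be made concrete with eigenvalues. The sketch below uses two hand-written 2x2 matrices, which are illustrative assumptions rather than output from a fitted model, and an assumed numerical threshold of 1e-8.

```r
# Random-effects covariance with correlation below 1: positive definite
vc_regular <- matrix(c(1.0, 0.3,
                       0.3, 0.5), nrow = 2, byrow = TRUE)

# Perfectly correlated random effects: one dimension collapses to zero
vc_singular <- matrix(c(1, 1,
                        1, 1), nrow = 2, byrow = TRUE)

# Hypothetical helper: singular if the smallest eigenvalue is (numerically) zero
is_singular <- function(vc, tol = 1e-8) {
  min(eigen(vc, symmetric = TRUE, only.values = TRUE)$values) < tol
}

is_singular(vc_regular)
is_singular(vc_singular)
```

A zero eigenvalue means the estimated random effects vary along fewer dimensions than the model specifies, which is a statement about the estimate itself and says nothing about whether the optimizer converged.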
22 changes: 12 additions & 10 deletions R/check_zeroinflation.R
@@ -2,22 +2,24 @@
#' @name check_zeroinflation
#'
#' @description `check_zeroinflation()` checks whether count models are
#' over- or underfitting zeros in the outcome.
#' over- or underfitting zeros in the outcome.
#'
#' @param x Fitted model of class `merMod`, `glmmTMB`, `glm`,
#' or `glm.nb` (package \pkg{MASS}).
#' @param x Fitted model of class `merMod`, `glmmTMB`, `glm`, or `glm.nb`
#' (package **MASS**).
#' @param tolerance The tolerance for the ratio of observed and predicted
#' zeros to be considered as over- or underfitting zeros. A ratio
#' between 1 +/- `tolerance` is considered as OK, while a ratio
#' beyond or below this threshold would indicate over- or underfitting.
#' zeros to be considered as over- or underfitting zeros. A ratio
#' between 1 +/- `tolerance` is considered as OK, while a ratio
#' beyond or below this threshold would indicate over- or underfitting.
#'
#' @return A list with information about the amount of predicted and observed
#' zeros in the outcome, as well as the ratio between these two values.
#' zeros in the outcome, as well as the ratio between these two values.
#'
#' @details If the amount of observed zeros is larger than the amount of
#' predicted zeros, the model is underfitting zeros, which indicates a
#' zero-inflation in the data. In such cases, it is recommended to use
#' negative binomial or zero-inflated models.
#' predicted zeros, the model is underfitting zeros, which indicates a
#' zero-inflation in the data. In such cases, it is recommended to use
#' negative binomial or zero-inflated models.
#'
#' @family functions to check model assumptions and and assess model quality
#'
#' @examples
#' if (require("glmmTMB")) {
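The comparison of observed and predicted zeros described in `@details` can be sketched with base R for a Poisson `glm`. This is a simplified illustration under assumed simulated data; the tolerance of 0.05 and the object names are assumptions, and the package may compute predicted zeros differently for other model classes.

```r
# Simulate Poisson counts and add extra zeros
set.seed(7)
n <- 500
x <- rnorm(n)
y <- rpois(n, lambda = exp(0.5 + 0.3 * x))
y[runif(n) < 0.3] <- 0

fit <- glm(y ~ x, family = poisson)
mu <- fitted(fit)

observed_zeros <- sum(y == 0)
# Expected number of zeros implied by the fitted Poisson means
predicted_zeros <- round(sum(dpois(0, lambda = mu)))
ratio <- predicted_zeros / observed_zeros

tolerance <- 0.05  # assumed threshold
underfitting_zeros <- ratio < 1 - tolerance
c(observed = observed_zeros, predicted = predicted_zeros, ratio = ratio)
```

Because zeros were added on top of the Poisson process, the model predicts fewer zeros than observed, so the ratio falls below one and the model underfits zeros.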
14 changes: 14 additions & 0 deletions man/check_autocorrelation.Rd


14 changes: 14 additions & 0 deletions man/check_collinearity.Rd
