Link together the check_* family (#570)

easystats · Apr 4, 2023 · 6322ec4
1 parent fabd707
commit 6322ec4
Showing 24 changed files with 306 additions and 130 deletions.
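
The `@family` tag added throughout this commit asks roxygen2 to cross-link every help page that carries the same family name, and the `@section` tag replaces the hand-written `\subsection{}` blocks. A minimal sketch of both tags follows; `check_a()` and `check_b()` are hypothetical functions used only for illustration and are not part of the package.

```r
# Hypothetical functions: only the roxygen tags matter here.

#' Check assumption A
#'
#' @param x A numeric vector.
#'
#' @section Interpretation:
#' A `@section` tag takes a title that ends in a colon, followed by free text.
#'
#' @family functions to check model assumptions
#' @export
check_a <- function(x) all(is.finite(x))

#' Check assumption B
#'
#' @param x A numeric vector.
#'
#' @family functions to check model assumptions
#' @export
check_b <- function(x) length(x) > 0
```

Because pages are grouped by family name, the text after `@family` has to be the same string in every file that should be linked.
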
diff --git a/R/check_autocorrelation.R b/R/check_autocorrelation.R
@@ -11,6 +11,8 @@
 #' @return Invisibly returns the p-value of the test statistic. A p-value < 0.05
 #' indicates autocorrelated residuals.
 #'
+#' @family functions to check model assumptions and and assess model quality
+#'
 #' @details Performs a Durbin-Watson-Test to check for autocorrelated residuals.
 #' In case of autocorrelation, robust standard errors return more accurate
 #' results for the estimates, or maybe a mixed model with error term for the
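
For readers unfamiliar with the Durbin-Watson test mentioned in the details above, the statistic itself can be sketched in base R. This is only the textbook formula, not the package's implementation, and it does not compute the p-value that `check_autocorrelation()` returns.

```r
# Durbin-Watson statistic: sum of squared successive residual differences
# divided by the residual sum of squares.
m <- lm(mpg ~ wt + cyl + gear + disp, data = mtcars)
e <- residuals(m)

dw <- sum(diff(e)^2) / sum(e^2)
dw
```

Values near 2 suggest little first-order autocorrelation; values towards 0 or 4 suggest positive or negative autocorrelation, respectively.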

diff --git a/R/check_collinearity.R b/R/check_collinearity.R
@@ -110,6 +110,8 @@
 #' common statistical problems: Data exploration. Methods in Ecology and
 #' Evolution (2010) 1:3–14.
 #'
+#' @family functions to check model assumptions and and assess model quality
+#'
 #' @note The code to compute the confidence intervals for the VIF and tolerance
 #' values was adapted from the Appendix B from the Marcoulides et al. paper.
 #' Thus, credits go to these authors for the original algorithm. There is also
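
The VIF and tolerance values referred to in this note can be illustrated with the standard definition, VIF_j = 1 / (1 - R²_j), where R²_j comes from regressing predictor j on the remaining predictors. The helper `vif_by_hand()` below is a hypothetical name and a plain base R sketch; it omits the confidence intervals discussed in the note.

```r
# Hypothetical helper: variance inflation factors from auxiliary regressions.
vif_by_hand <- function(model) {
  X <- model.matrix(model)
  X <- X[, colnames(X) != "(Intercept)", drop = FALSE]
  vapply(colnames(X), function(term) {
    others <- X[, colnames(X) != term, drop = FALSE]
    r2 <- summary(lm(X[, term] ~ others))$r.squared
    1 / (1 - r2)
  }, numeric(1))
}

m <- lm(mpg ~ wt + cyl + gear + disp, data = mtcars)
vif <- vif_by_hand(m)
tolerance <- 1 / vif  # tolerance is the reciprocal of the VIF
```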

diff --git a/R/check_convergence.R b/R/check_convergence.R
@@ -12,38 +12,39 @@
 #' @return `TRUE` if convergence is fine and `FALSE` if convergence
 #'   is suspicious. Additionally, the convergence value is returned as attribute.
 #'
-#' @details \subsection{Convergence and log-likelihood}{
-#'   Convergence problems typically arise when the model hasn't converged
-#'   to a solution where the log-likelihood has a true maximum. This may result
-#'   in unreliable and overly complex (or non-estimable) estimates and standard
-#'   errors.
-#'   }
-#'   \subsection{Inspect model convergence}{
-#'   **lme4** performs a convergence-check (see `?lme4::convergence`),
-#'   however, as as discussed [here](https://github.com/lme4/lme4/issues/120)
-#'   and suggested by one of the lme4-authors in
-#'   [this comment](https://github.com/lme4/lme4/issues/120#issuecomment-39920269),
-#'   this check can be too strict. `check_convergence()` thus provides an
-#'   alternative convergence test for `merMod`-objects.
-#'   }
-#'   \subsection{Resolving convergence issues}{
-#'   Convergence issues are not easy to diagnose. The help page on
-#'   `?lme4::convergence` provides most of the current advice about
-#'   how to resolve convergence issues. Another clue might be large parameter
-#'   values, e.g. estimates (on the scale of the linear predictor) larger than
-#'   10 in (non-identity link) generalized linear model *might* indicate
-#'   [complete separation](https://stats.oarc.ucla.edu/other/mult-pkg/faq/general/faqwhat-is-complete-or-quasi-complete-separation-in-logisticprobit-regression-and-how-do-we-deal-with-them/).
-#'   Complete separation can be addressed by regularization, e.g. penalized
-#'   regression or Bayesian regression with appropriate priors on the fixed effects.
-#'   }
-#'   \subsection{Convergence versus Singularity}{
-#'   Note the different meaning between singularity and convergence: singularity
-#'   indicates an issue with the "true" best estimate, i.e. whether the maximum
-#'   likelihood estimation for the variance-covariance matrix of the random effects
-#'   is positive definite or only semi-definite. Convergence is a question of
-#'   whether we can assume that the numerical optimization has worked correctly
-#'   or not.
-#'   }
+#' @section Convergence and log-likelihood:
+#' Convergence problems typically arise when the model hasn't converged
+#' to a solution where the log-likelihood has a true maximum. This may result
+#' in unreliable and overly complex (or non-estimable) estimates and standard
+#' errors.
+#'
+#' @section Inspect model convergence:
+#' **lme4** performs a convergence-check (see `?lme4::convergence`),
+#' however, as discussed [here](https://github.com/lme4/lme4/issues/120)
+#' and suggested by one of the lme4-authors in
+#' [this comment](https://github.com/lme4/lme4/issues/120#issuecomment-39920269),
+#' this check can be too strict. `check_convergence()` thus provides an
+#' alternative convergence test for `merMod`-objects.
+#'
+#' @section Resolving convergence issues:
+#' Convergence issues are not easy to diagnose. The help page on
+#' `?lme4::convergence` provides most of the current advice about
+#' how to resolve convergence issues. Another clue might be large parameter
+#' values: e.g., estimates (on the scale of the linear predictor) larger than
+#' 10 in a (non-identity link) generalized linear model *might* indicate
+#' [complete separation](https://stats.oarc.ucla.edu/other/mult-pkg/faq/general/faqwhat-is-complete-or-quasi-complete-separation-in-logisticprobit-regression-and-how-do-we-deal-with-them/).
+#' Complete separation can be addressed by regularization, e.g. penalized
+#' regression or Bayesian regression with appropriate priors on the fixed effects.
+#'
+#' @section Convergence versus Singularity:
+#' Note the different meaning between singularity and convergence: singularity
+#' indicates an issue with the "true" best estimate, i.e. whether the maximum
+#' likelihood estimation for the variance-covariance matrix of the random effects
+#' is positive definite or only semi-definite. Convergence is a question of
+#' whether we can assume that the numerical optimization has worked correctly
+#' or not.
+#'
+#' @family functions to check model assumptions and and assess model quality
 #'
 #' @examples
 #' if (require("lme4")) {
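
The "large parameter values" clue from the section on resolving convergence issues can be reproduced with a toy data set. The sketch below uses base R's `glm()` rather than a mixed model, and the data are made up so that `x` separates the outcome perfectly; the threshold of 10 is the rule of thumb quoted in the documentation.

```r
# Made-up data: y is 0 for x <= 5 and 1 for x > 5 (complete separation).
x <- 1:10
y <- rep(c(0, 1), each = 5)

# glm() warns about fitted probabilities of 0 or 1; silenced for the demo.
m <- suppressWarnings(glm(y ~ x, family = binomial()))

est <- coef(m)
suspicious <- abs(est) > 10  # estimates on the scale of the linear predictor
suspicious
```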

diff --git a/R/check_heteroscedasticity.R b/R/check_heteroscedasticity.R
@@ -18,6 +18,8 @@
 #'
 #' @references Breusch, T. S., and Pagan, A. R. (1979) A simple test for heteroscedasticity and random coefficient variation. Econometrica 47, 1287-1294.
 #'
+#' @family functions to check model assumptions and and assess model quality
+#'
 #' @examples
 #' m <<- lm(mpg ~ wt + cyl + gear + disp, data = mtcars)
 #' check_heteroscedasticity(m)
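
As background to the Breusch-Pagan reference, the idea of the test can be sketched in base R. Note the assumption: this is the studentized (Koenker) variant, n times the R² of an auxiliary regression of squared residuals on the predictors, which is not necessarily the variant `check_heteroscedasticity()` computes.

```r
m <- lm(mpg ~ wt + cyl + gear + disp, data = mtcars)

# Auxiliary regression of the squared residuals on the same predictors.
aux_data <- transform(mtcars, e2 = residuals(m)^2)
aux <- lm(e2 ~ wt + cyl + gear + disp, data = aux_data)

lm_stat <- nobs(m) * summary(aux)$r.squared
df <- length(coef(aux)) - 1  # number of predictors, excluding the intercept
p_value <- pchisq(lm_stat, df = df, lower.tail = FALSE)
p_value
```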

diff --git a/R/check_homogeneity.R b/R/check_homogeneity.R
@@ -18,6 +18,8 @@
 #'
 #' @note There is also a [`plot()`-method](https://easystats.github.io/see/articles/performance.html) implemented in the \href{https://easystats.github.io/see/}{\pkg{see}-package}.
 #'
+#' @family functions to check model assumptions and and assess model quality
+#'
 #' @examples
 #' model <<- lm(len ~ supp + dose, data = ToothGrowth)
 #' check_homogeneity(model)
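
A comparable check can be run with base R alone. The sketch below uses Bartlett's test across the `supp` by `dose` cells as one possible choice; which test `check_homogeneity()` applies by default is not shown in this hunk, so treat the choice as an assumption.

```r
# Test equality of variances of `len` across all supp x dose groups.
groups <- interaction(ToothGrowth$supp, ToothGrowth$dose)
bt <- bartlett.test(ToothGrowth$len, groups)
bt$p.value
```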

diff --git a/R/check_model.R b/R/check_model.R
@@ -131,6 +131,8 @@
 #' look at the `check` argument and see if some of the model checks could be
 #' skipped, which also increases performance.
 #'
+#' @family functions to check model assumptions and and assess model quality
+#'
 #' @examples
 #' \dontrun{
 #' m <- lm(mpg ~ wt + cyl + gear + disp, data = mtcars)

diff --git a/R/check_multimodal.R b/R/check_multimodal.R
@@ -2,7 +2,7 @@
 #'
 #' For univariate distributions (one-dimensional vectors), this function
 #' performs an Ameijeiras-Alonso et al. (2018) excess mass test. For multivariate
-#' distributions (dataframes), it uses mixture modelling. However, it seems that
+#' distributions (data frames), it uses mixture modelling. However, it seems that
 #' it always returns a significant result (suggesting that the distribution is
 #' multimodal). A better method might be needed here.
 #'

diff --git a/R/check_outliers.R b/R/check_outliers.R
@@ -34,6 +34,8 @@
 #'   function. Note that the function will (silently) return a vector of `FALSE`
 #'   for non-supported data types such as character strings.
 #'
+#' @family functions to check model assumptions and and assess model quality
+#'
 #' @note There is also a
 #'   [`plot()`-method](https://easystats.github.io/see/articles/performance.html)
 #'   implemented in the

diff --git a/R/check_overdispersion.R b/R/check_overdispersion.R
@@ -16,36 +16,33 @@
 #'   with the mean and, therefore, variance usually (roughly) equals the mean
 #'   value. If the variance is much higher, the data are "overdispersed".
 #'
-#' \subsection{Interpretation of the Dispersion Ratio}{
+#' @section Interpretation of the Dispersion Ratio:
 #' If the dispersion ratio is close to one, a Poisson model fits well to the
 #' data. Dispersion ratios larger than one indicate overdispersion, thus a
 #' negative binomial model or similar might fit better to the data. A p-value <
 #' .05 indicates overdispersion.
-#' }
 #'
-#' \subsection{Overdispersion in Poisson Models}{
+#' @section Overdispersion in Poisson Models:
 #' For Poisson models, the overdispersion test is based on the code from
-#' \cite{Gelman and Hill (2007), page 115}.
-#' }
+#' _Gelman and Hill (2007), page 115_.
 #'
-#' \subsection{Overdispersion in Mixed Models}{
+#' @section Overdispersion in Mixed Models:
 #' For `merMod`- and `glmmTMB`-objects, `check_overdispersion()`
 #' is based on the code in the
 #' [GLMM FAQ](http://bbolker.github.io/mixedmodels-misc/glmmFAQ.html),
 #' section *How can I deal with overdispersion in GLMMs?*. Note that this
 #' function only returns an *approximate* estimate of an overdispersion
 #' parameter, and is probably inaccurate for zero-inflated mixed models (fitted
 #' with `glmmTMB`).
-#' }
 #'
-#' \subsection{How to fix Overdispersion}{
+#' @section How to fix Overdispersion:
 #' Overdispersion can be fixed by either modeling the dispersion parameter, or
 #' by choosing a different distributional family (like Quasi-Poisson, or
-#' negative binomial, see \cite{Gelman and Hill (2007), pages 115-116}).
-#' }
+#' negative binomial, see _Gelman and Hill (2007), pages 115-116_).
 #'
-#' @references
+#' @family functions to check model assumptions and and assess model quality
 #'
+#' @references
 #' - Bolker B et al. (2017):
 #'  [GLMM FAQ.](http://bbolker.github.io/mixedmodels-misc/glmmFAQ.html)
 #'
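
The dispersion ratio described in the sections above can be sketched for a plain Poisson GLM as the Pearson chi-squared statistic divided by the residual degrees of freedom. The data are simulated from a negative binomial distribution purely for illustration, and the code is a textbook version rather than the package's implementation.

```r
set.seed(123)
x <- rnorm(500)
# Negative binomial counts: the variance exceeds the mean by construction.
y <- rnbinom(500, size = 1, mu = exp(1 + 0.3 * x))

m <- glm(y ~ x, family = poisson())

pearson_chisq <- sum(residuals(m, type = "pearson")^2)
ratio <- pearson_chisq / df.residual(m)
p_value <- pchisq(pearson_chisq, df = df.residual(m), lower.tail = FALSE)
c(ratio = ratio, p_value = p_value)
```

A ratio well above one, as expected here, points towards a negative binomial or similar model.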

diff --git a/R/pp_check.R b/R/check_predictions.R
@@ -36,10 +36,12 @@
 #'   similar to the observed outcome than the model in the left panel (a). Thus,
 #'   model (b) is likely to be preferred over model (a).
 #'
-#' @note  Every model object that has a `simulate()`-method should work with
-#'   `check_predictions()`. On R 3.6.0 and higher, if **bayesplot** (or a
-#'   package that imports **bayesplot** such as **rstanarm** or **brms**)
-#'   is loaded, `pp_check()` is also available as an alias for `check_predictions()`.
+#' @note Every model object that has a `simulate()`-method should work with
+#' `check_predictions()`. On R 3.6.0 and higher, if **bayesplot** (or a
+#' package that imports **bayesplot** such as **rstanarm** or **brms**)
+#' is loaded, `pp_check()` is also available as an alias for `check_predictions()`.
+#'
+#' @family functions to check model assumptions and and assess model quality
 #'
 #' @references
 #' - Gabry, J., Simpson, D., Vehtari, A., Betancourt, M., and Gelman, A. (2019).
@@ -51,6 +53,7 @@
 #'
 #' - Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., and
 #'   Rubin, D. B. (2014). Bayesian data analysis. (Third edition). CRC Press.
+#'
 #' - Gelman, A., Hill, J., and Vehtari, A. (2020). Regression and Other Stories.
 #'   Cambridge University Press.
 #'
@@ -80,7 +83,7 @@ check_predictions.default <- function(object,
   .is_model_valid(object)
 
   if (isTRUE(insight::model_info(object, verbose = FALSE)$is_bayesian) &&
-    isFALSE(inherits(object, "BFBayesFactor"))) {
+        isFALSE(inherits(object, "BFBayesFactor"))) {
     insight::check_if_installed(
       "bayesplot",
       "to create posterior prediction plots for Stan models"

diff --git a/R/check_singularity.R b/R/check_singularity.R
@@ -12,39 +12,39 @@
 #' @return `TRUE` if the model fit is singular.
 #'
 #' @details If a model is "singular", this means that some dimensions of the
-#'   variance-covariance matrix have been estimated as exactly zero. This
-#'   often occurs for mixed models with complex random effects structures.
-#'   \cr \cr
-#'   \dQuote{While singular models are statistically well defined (it is
-#'   theoretically sensible for the true maximum likelihood estimate to
-#'   correspond to a singular fit), there are real concerns that (1) singular
-#'   fits correspond to overfitted models that may have poor power; (2) chances
-#'   of numerical problems and mis-convergence are higher for singular models
-#'   (e.g. it may be computationally difficult to compute profile confidence
-#'   intervals for such models); (3) standard inferential procedures such as
-#'   Wald statistics and likelihood ratio tests may be inappropriate.}
-#'   (\cite{lme4 Reference Manual})
-#'   \cr \cr
-#'   There is no gold-standard about how to deal with singularity and which
-#'   random-effects specification to choose. Beside using fully Bayesian methods
-#'   (with informative priors), proposals in a frequentist framework are:
-#'  \itemize{
-#'  \item avoid fitting overly complex models, such that the
-#'   variance-covariance matrices can be estimated precisely enough
-#'   (\cite{Matuschek et al. 2017})
-#'  \item use some form of model selection to choose a model that balances
-#'   predictive accuracy and overfitting/type I error (\cite{Bates et al. 2015},
-#'   \cite{Matuschek et al. 2017})
-#'  \item \dQuote{keep it maximal}, i.e. fit the most complex model consistent
-#'   with the experimental design, removing only terms required to allow a
-#'   non-singular fit (\cite{Barr et al. 2013})
-#'   }
-#'   Note the different meaning between singularity and convergence: singularity
-#'   indicates an issue with the "true" best estimate, i.e. whether the maximum
-#'   likelihood estimation for the variance-covariance matrix of the random
-#'   effects is positive definite or only semi-definite. Convergence is a
-#'   question of whether we can assume that the numerical optimization has
-#'   worked correctly or not.
+#' variance-covariance matrix have been estimated as exactly zero. This
+#' often occurs for mixed models with complex random effects structures.
+#'
+#' "While singular models are statistically well defined (it is theoretically
+#' sensible for the true maximum likelihood estimate to correspond to a singular
+#' fit), there are real concerns that (1) singular fits correspond to overfitted
+#' models that may have poor power; (2) chances of numerical problems and
+#' mis-convergence are higher for singular models (e.g. it may be computationally
+#' difficult to compute profile confidence intervals for such models); (3)
+#' standard inferential procedures such as Wald statistics and likelihood ratio
+#' tests may be inappropriate." (_lme4 Reference Manual_)
+#'
+#' There is no gold-standard about how to deal with singularity and which
+#' random-effects specification to choose. Besides using fully Bayesian methods
+#' (with informative priors), proposals in a frequentist framework are:
+#'
+#' - avoid fitting overly complex models, such that the variance-covariance
+#'   matrices can be estimated precisely enough (_Matuschek et al. 2017_)
+#' - use some form of model selection to choose a model that balances
+#'   predictive accuracy and overfitting/type I error (_Bates et al. 2015_,
+#'   _Matuschek et al. 2017_)
+#' - "keep it maximal", i.e. fit the most complex model consistent with the
+#'   experimental design, removing only terms required to allow a non-singular
+#'   fit (_Barr et al. 2013_)
+#'
+#' Note the different meaning between singularity and convergence: singularity
+#' indicates an issue with the "true" best estimate, i.e. whether the maximum
+#' likelihood estimation for the variance-covariance matrix of the random
+#' effects is positive definite or only semi-definite. Convergence is a
+#' question of whether we can assume that the numerical optimization has
+#' worked correctly or not.
+#'
+#' @family functions to check model assumptions and and assess model quality
 #'
 #' @references
 #' - Bates D, Kliegl R, Vasishth S, Baayen H. Parsimonious Mixed Models.

diff --git a/R/check_zeroinflation.R b/R/check_zeroinflation.R
@@ -2,22 +2,24 @@
 #' @name check_zeroinflation
 #'
 #' @description `check_zeroinflation()` checks whether count models are
-#'   over- or underfitting zeros in the outcome.
+#' over- or underfitting zeros in the outcome.
 #'
-#' @param x Fitted model of class `merMod`, `glmmTMB`, `glm`,
-#'    or `glm.nb` (package \pkg{MASS}).
+#' @param x Fitted model of class `merMod`, `glmmTMB`, `glm`, or `glm.nb`
+#' (package **MASS**).
 #' @param tolerance The tolerance for the ratio of observed and predicted
-#'    zeros to considered as over- or underfitting zeros. A ratio
-#'    between 1 +/- `tolerance` is considered as OK, while a ratio
-#'    beyond or below this threshold would indicate over- or underfitting.
+#'  zeros to be considered as over- or underfitting zeros. A ratio
+#'  between 1 +/- `tolerance` is considered as OK, while a ratio
+#'  beyond or below this threshold would indicate over- or underfitting.
 #'
 #' @return A list with information about the amount of predicted and observed
-#'    zeros in the outcome, as well as the ratio between these two values.
+#'  zeros in the outcome, as well as the ratio between these two values.
 #'
 #' @details If the amount of observed zeros is larger than the amount of
-#'   predicted zeros, the model is underfitting zeros, which indicates a
-#'   zero-inflation in the data. In such cases, it is recommended to use
-#'   negative binomial or zero-inflated models.
+#' predicted zeros, the model is underfitting zeros, which indicates a
+#' zero-inflation in the data. In such cases, it is recommended to use
+#' negative binomial or zero-inflated models.
+#'
+#' @family functions to check model assumptions and and assess model quality
 #'
 #' @examples
 #' if (require("glmmTMB")) {
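
The comparison of observed and predicted zeros described in the details can be sketched for a Poisson GLM in base R. The data are simulated with extra zeros, the direction of the ratio (predicted over observed) and the tolerance of 0.05 are assumptions made for this illustration, and the code is not the package's implementation.

```r
set.seed(123)
n <- 500
# 40% structural zeros mixed with Poisson(3) counts.
structural_zero <- rbinom(n, 1, 0.4) == 1
y <- ifelse(structural_zero, 0, rpois(n, 3))

m <- glm(y ~ 1, family = poisson())

observed_zeros <- sum(y == 0)
# Expected number of zeros under the fitted Poisson model.
predicted_zeros <- round(sum(dpois(0, lambda = fitted(m))))

ratio <- predicted_zeros / observed_zeros
tolerance <- 0.05
underfitting <- ratio < 1 - tolerance
c(observed = observed_zeros, predicted = predicted_zeros, ratio = ratio)
```

Here the Poisson model predicts far fewer zeros than were observed, which is the underfitting pattern the documentation associates with zero-inflation.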

diff --git a/man/check_autocorrelation.Rd b/man/check_autocorrelation.Rd
diff --git a/man/check_collinearity.Rd b/man/check_collinearity.Rd