-
-
Notifications
You must be signed in to change notification settings - Fork 19
/
Copy pathREADME.Rmd
408 lines (278 loc) · 16.2 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
---
output:
github_document:
toc: false
fig_width: 10.08
fig_height: 6
tags: [r, prediction, estimation, marginal]
vignette: >
%\VignetteIndexEntry{README}
\usepackage[utf8]{inputenc}
%\VignetteEngine{knitr::rmarkdown}
editor_options:
chunk_output_type: console
---
# modelbased <img src='man/figures/logo.png' align="right" height="139" />
```{r, echo = FALSE, warning=FALSE, message=FALSE}
library(ggplot2)
knitr::opts_chunk$set(
collapse = TRUE,
dpi = 150,
fig.path = "man/figures/",
warning = FALSE,
message = FALSE,
out.width = "100%"
)
```
[![publication](https://img.shields.io/badge/Cite-Unpublished-yellow)](https://github.com/easystats/modelbased/blob/master/inst/CITATION)
***Taking your models to new heights***
---
`modelbased` is a package helping with model-based estimations, to easily compute of marginal means, contrast analysis and model predictions.
## Installation
[![CRAN](http://www.r-pkg.org/badges/version/modelbased)](https://cran.r-project.org/package=modelbased) [![modelbased status badge](https://easystats.r-universe.dev/badges/modelbased)](https://easystats.r-universe.dev)
[![codecov](https://codecov.io/gh/easystats/modelbased/branch/master/graph/badge.svg)](https://app.codecov.io/gh/easystats/modelbased)
The *modelbased* package is available on CRAN, while its latest development version is available on R-universe (from _rOpenSci_).
Type | Source | Command
---|---|---
Release | CRAN | `install.packages("modelbased")`
Development | R-universe | `install.packages("modelbased", repos = "https://easystats.r-universe.dev")`
Once you have downloaded the package, you can then load it using:
```{r, eval=FALSE}
library("modelbased")
```
> **Tip**
>
> **Instead of `library(modelbased)`, use `library(easystats)`.**
> **This will make all features of the easystats-ecosystem available.**
>
> **To stay updated, use `easystats::install_latest()`.**
## Documentation
Access the package [**documentation**](https://easystats.github.io/modelbased/), and check-out these vignettes:
- [**Visualisation matrix**](https://easystats.github.io/modelbased/articles/visualisation_matrix.html)
- [**Marginal means**](https://easystats.github.io/modelbased/articles/estimate_means.html)
- [**Contrast analysis**](https://easystats.github.io/modelbased/articles/estimate_contrasts.html)
- [**Marginal effects**](https://easystats.github.io/modelbased/articles/estimate_slopes.html)
- [**Use a model to make predictions**](https://easystats.github.io/modelbased/articles/estimate_response.html)
- [**Describe non-linear curves**](https://easystats.github.io/modelbased/articles/describe_nonlinear.html)
- [**Interpret models using Effect Derivatives**](https://easystats.github.io/modelbased/articles/derivatives.html)
- [**Estimate and re-use random effects**](https://easystats.github.io/modelbased/articles/estimate_grouplevel.html)
- [**The modelisation approach**](https://easystats.github.io/modelbased/articles/modelisation_approach.html)
# Features
The package is built around 5 main functions:
- [`estimate_means()`](https://easystats.github.io/modelbased/reference/estimate_means.html): Estimates the average values at each factor levels
- [`estimate_contrasts()`](https://easystats.github.io/modelbased/reference/estimate_contrasts.html): Estimates and tests contrasts between different factor levels
- [`estimate_slopes()`](https://easystats.github.io/modelbased/reference/estimate_slopes.html): Estimates the slopes of numeric predictors at different factor levels or alongside a numeric predictor
- [`estimate_expectation()`](https://easystats.github.io/modelbased/articles/estimate_response.html): Predict the response variable using the model
These functions are powered by the [`insight::get_datagrid()`](https://easystats.github.io/insight/reference/get_datagrid.html) function, a smart tool for guessing the appropriate reference grid.
## Create smart grids to represent complex interactions
- **Problem**: I want to graphically represent the interaction between two continuous variable. On top of that, I would like to express one of them in terms of standardized change (i.e., standard deviation relative to the mean).
- **Solution**: Create a data grid following the desired specifications, and feed it to the model to obtain predictions. Format some of the columns for better readability, and plot using `ggplot`.
Check-out [**this vignette**](https://easystats.github.io/modelbased/articles/visualisation_matrix.html) for a detailed walkthrough on *visualisation matrices*.
```{r}
library(ggplot2)
library(see)
library(modelbased)
# 1. Fit model and get visualization matrix
model <- lm(Sepal.Length ~ Petal.Length * Petal.Width, data = iris)
# 2. Create a visualisation matrix with expected Z-score values of Petal.Width
vizdata <- insight::get_datagrid(model, by = c("Petal.Length", "Petal.Width = c(-1, 0, 1)"))
# 3. Revert from expected SD to actual values
vizdata <- unstandardize(vizdata, select = "Petal.Width")
# 4. Add predicted relationship from the model
vizdata <- modelbased::estimate_expectation(vizdata)
# 5. Express Petal.Width as z-score ("-1 SD", "+2 SD", etc.)
vizdata$Petal.Width <- effectsize::format_standardize(vizdata$Petal.Width, reference = iris$Petal.Width)
# 6. Plot
ggplot(iris, aes(x = Petal.Length, y = Sepal.Length)) +
# Add points from original dataset (only shapes 21-25 have a fill aesthetic)
geom_point2(aes(fill = Petal.Width), shape = 21, size = 5) +
# Add relationship lines
geom_line(data = vizdata, aes(y = Predicted, color = Petal.Width), linewidth = 1) +
# Improve colors / themes
scale_color_viridis_d(direction = -1) +
scale_fill_viridis_c(guide = "none") +
theme_modern()
```
## Estimate marginal means
- **Problem**: My model has a factor as a predictor, and the parameters only return the difference between levels and the intercept. I want to see the values *at* each factor level.
- **Solution**: Estimate model-based means ("marginal means"). You can visualize them by plotting their confidence interval and the original data.
Check-out [**this vignette**](https://easystats.github.io/modelbased/articles/estimate_means.html) for a detailed walkthrough on *marginal means*.
```{r}
# 1. The model
model <- lm(Sepal.Width ~ Species, data = iris)
# 2. Obtain estimated means
means <- estimate_means(model)
means
# 3. Plot
ggplot(iris, aes(x = Species, y = Sepal.Width)) +
# Add base data
geom_violin(aes(fill = Species), color = "white") +
geom_jitter2(width = 0.05, alpha = 0.5) +
# Add pointrange and line from means
geom_line(data = means, aes(y = Mean, group = 1), linewidth = 1) +
geom_pointrange(
data = means,
aes(y = Mean, ymin = CI_low, ymax = CI_high),
size = 1,
color = "white"
) +
# Improve colors
scale_fill_material() +
theme_modern()
```
## Contrast analysis
- **Problem**: The parameters of my model only return the difference between some of the factor levels and the intercept. I want to see the differences between each levels, as I would do with post-hoc comparison tests in ANOVAs.
- **Solution**: Estimate model-based contrasts ("marginal contrasts"). You can visualize them by plotting their confidence interval.
Check-out [**this vignette**](https://easystats.github.io/modelbased/articles/estimate_contrasts.html) for a detailed walkthrough on *contrast analysis*.
```{r}
# 1. The model
model <- lm(Sepal.Width ~ Species, data = iris)
# 2. Estimate marginal contrasts
contrasts <- estimate_contrasts(model)
contrasts
```
```{r, echo = FALSE}
library(see)
plot(contrasts, estimate_means(model)) +
theme_modern()
```
## Check the contrasts at different points of another linear predictor
- **Problem**: In the case of an interaction between a factor and a continuous variable, you might be interested in computing how the differences between the factor levels (the contrasts) change depending on the other continuous variable.
- **Solution**: You can estimate the marginal contrasts at different values of a continuous variable (the *modulator*), and plot these differences (they are significant if their 95\% CI doesn't cover 0).
```{r}
model <- lm(Sepal.Width ~ Species * Petal.Length, data = iris)
estimate_contrasts(model, by = "Petal.Length", length = 3)
```
```{r}
# Recompute contrasts with a higher precision (for a smoother plot)
contrasts <- estimate_contrasts(model, by = "Petal.Length", length = 20)
# Add Contrast column by concatenating
contrasts$Contrast <- paste(contrasts$Level1, "-", contrasts$Level2)
# Plot
ggplot(contrasts, aes(x = Petal.Length, y = Difference, )) +
# Add line and CI band
geom_line(aes(color = Contrast)) +
geom_ribbon(aes(ymin = CI_low, ymax = CI_high, fill = Contrast), alpha = 0.2) +
# Add line at 0, indicating no difference
geom_hline(yintercept = 0, linetype = "dashed") +
# Colors
theme_modern()
```
## Generate predictions from your model to compare it with original data
- **Problem**: You fitted different models, and you want to intuitively visualize how they compare in terms of fit quality and prediction accuracy, so that you don't only rely on abstract indices of performance.
- **Solution**: You can predict the response variable from different models and plot them against the original true response. The closest the points are on the identity line (the diagonal), the closest they are from a perfect fit.
Check-out [**this vignette**](https://easystats.github.io/modelbased/articles/estimate_response.html) for a detailed walkthrough on *predictions*.
```{r}
# Fit model 1 and predict the response variable
model1 <- lm(Petal.Length ~ Sepal.Length, data = iris)
pred1 <- estimate_expectation(model1)
pred1$Petal.Length <- iris$Petal.Length # Add true response
# Print first 5 lines of output
head(pred1, n = 5)
# Same for model 2
model2 <- lm(Petal.Length ~ Sepal.Length * Species, data = iris)
pred2 <- estimate_expectation(model2)
pred2$Petal.Length <- iris$Petal.Length
# Initialize plot for model 1
ggplot(data = pred1, aes(x = Petal.Length, y = Predicted)) +
# with identity line (diagonal) representing perfect predictions
geom_abline(linetype = "dashed") +
# Add the actual predicted points of the models
geom_point(aes(color = "Model 1")) +
geom_point(data = pred2, aes(color = "Model 2")) +
# Aesthetics changes
labs(y = "Petal.Length (predicted)", color = NULL) +
scale_color_manual(values = c("Model 1" = "blue", "Model 2" = "red")) +
theme_modern()
```
## Extract and format group-level random effects
- **Problem**: You have a mixed model and you would like to easily access the random part, i.e., the group-level effects (e.g., the individuals scores).
- **Solution**: You can apply `estimate_grouplevel` on a mixed model.
See [**this vignette**](https://easystats.github.io/modelbased/articles/estimate_grouplevel.html) for more information.
```{r}
library(lme4)
model <- lmer(mpg ~ drat + (1 + drat | cyl), data = mtcars)
random <- estimate_grouplevel(model)
random
plot(random)
```
<!-- TODO: add plotting example once 'see' on cran -->
## Estimate derivative of non-linear relationships (e.g., in GAMs)
- **Problem**: You model a non-linear relationship using polynomials, splines or GAMs. You want to know which parts of the curve are significant positive or negative trends.
- **Solution**: You can estimate the *derivative* of smooth using `estimate_slopes`.
The two plots below represent the modeled (non-linear) effect estimated by the model, i.e., the relationship between the outcome and the predictor, as well as the "trend" (or slope) of that relationship at any given point. You can see that whenever the slope is negative, the effect is below 0, and vice versa, with some regions of the effect being significant (i.e., positive or negative with enough confidence) while the others denote regions where the relationship is rather flat.
Check-out [**this vignette**](https://easystats.github.io/modelbased/articles/estimate_slopes.html) for a detailed walkthrough on *marginal effects*.
<!-- TODO: currently fails with emmeans 1.8.0 //-->
```{r, fig.height=10, fig.width=8, eval=FALSE}
# Fit a non-linear General Additive Model (GAM)
model <- mgcv::gam(Sepal.Width ~ s(Petal.Length), data = iris)
# 1. Compute derivatives
deriv <- estimate_slopes(model,
trend = "Petal.Length",
by = "Petal.Length",
length = 100
)
# 2. Visualize predictions and derivative
see::plots(
plot(estimate_relation(model)),
plot(deriv),
n_rows = 2
)
```
## Describe the smooth term by its linear parts
- **Problem**: You model a non-linear relationship using polynomials, splines or GAMs. You want to describe it in terms of linear parts: where does it decrease, how much, where does it increase, etc.
- **Solution**: You can apply `describe_nonlinear` on a predicted relationship that will return the different parts of increase and decrease.
See [**this vignette**](https://easystats.github.io/modelbased/articles/describe_nonlinear.html) for more information.
```{r}
model <- lm(Sepal.Width ~ poly(Petal.Length, 2), data = iris)
# 1. Visualize
vizdata <- estimate_relation(model, length = 30)
ggplot(vizdata, aes(x = Petal.Length, y = Predicted)) +
geom_ribbon(aes(ymin = CI_low, ymax = CI_high), alpha = 0.3) +
geom_line() +
# Add original data points
geom_point(data = iris, aes(x = Petal.Length, y = Sepal.Width)) +
# Aesthetics
theme_modern()
# 2. Describe smooth line
describe_nonlinear(vizdata, x = "Petal.Length")
```
## Plot all posterior draws for Bayesian models predictions
See [**this vignette**](https://easystats.github.io/modelbased/articles/estimate_response.html) for a walkthrough on how to do that.
```{r echo=FALSE, fig.align='center', out.width="80%"}
knitr::include_graphics("https://raw.githubusercontent.com/easystats/modelbased/refs/heads/main/man/figures/gganimate_figure.gif")
```
<!-- Needs to be re-implemented after revision of visualisation_recipe()
## Understand interactions between two continuous variables
Also referred to as **Johnson-Neyman intervals**, this plot shows how the effect (the "slope") of one variable varies depending on another variable. It is useful in the case of complex interactions between continuous variables.
For instance, the plot below shows that the effect of `hp` (the y-axis) is significantly negative only when `wt` is low (`< ~4`).
```{r}
model <- lm(mpg ~ hp * wt, data = mtcars)
slopes <- estimate_slopes(model, trend = "hp", by = "wt")
plot(slopes)
```
-->
## Visualize predictions with random effects
Aside from plotting the coefficient of each random effect (as done [here](https://github.com/easystats/modelbased#extract-and-format-group-level-random-effects)), we can also visualize the predictions of the model for each of these levels, which can be useful to diagnostic or see how they contribute to the fixed effects. We will do that by making predictions with `estimate_relation` and setting `include_random` to `TRUE`.
Let's model the reaction time with the number of days of sleep deprivation as fixed effect and the participants as random intercept.
```{r}
library(lme4)
model <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)
preds <- estimate_relation(model, include_random = TRUE)
plot(preds, ribbon = list(alpha = 0)) # Make CI ribbon transparent for clarity
```
As we can see, each participant has a different "intercept" (starting point on the y-axis), but all their slopes are the same: this is because the only slope is the "general" one estimated across all participants by the fixed effect. Let's address that and allow the slope to vary for each participant too.
```{r}
model <- lmer(Reaction ~ Days + (1 + Days | Subject), data = sleepstudy)
preds <- estimate_relation(model, include_random = TRUE)
plot(preds, ribbon = list(alpha = 0.1))
```
As we can see, the effect is now different for all participants. Let's plot, on top of that, the "fixed" effect estimated across all these individual effects.
```{r}
fixed_pred <- estimate_relation(model) # This time, include_random is FALSE (default)
plot(preds, ribbon = list(alpha = 0)) + # Previous plot
geom_ribbon(data = fixed_pred, aes(x = Days, ymin = CI_low, ymax = CI_high), alpha = 0.4) +
geom_line(data = fixed_pred, aes(x = Days, y = Predicted), linewidth = 2)
```
## Code of Conduct
Please note that the modelbased project is released with a [Contributor Code of Conduct](https://easystats.github.io/modelbased/CODE_OF_CONDUCT.html). By contributing to this project, you agree to abide by its terms.