-
Notifications
You must be signed in to change notification settings - Fork 9
/
Copy pathanova.Rmd
593 lines (453 loc) · 20.5 KB
/
anova.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
---
title: 'Anova'
---
# Anova
```{r, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, collapse=TRUE, cache=TRUE, message=F)
library(tidyverse)
library(pander)
library(ez)
library(lmerTest)
theme_set(theme_minimal())
```
[Be sure to read the [section on linear models in R](#linear-models-simple)
_before_ you read this section, and specifically the parts on
[specifying models with formulae](#formulae).]{.admonition}
This section attempts to cover in a high level way how to specify anova models
in R and some of the issues in interpreting the model output. If you need to
revise the basic idea of an Anova, the Howell textbook [@howell2016fundamental].
For a very quick reminder,
[this interactive/animated explanation of Anova is helpful](http://web.utah.edu/stat/introstats/anovaflash.html).
If you just want the 'answers' — i.e. the syntax to specify common Anova models
-- you could skip to the next section: [Anova cookbook](#anova-cookbook)
There are 4 rules for doing Anova in R and not wanting to cry:
1. Keep your data in 'long' format.
2. Know the
[differences between character, factor and numeric](#factors-and-numerics)
variables
3. Do not use the `aov()` or `anova()` functions to get an Anova table unless
you know what you are doing.
4. Learn about the types of sums of squares and always remember to specify
`type=3`, unless you know better.
### Rules for using Anova in R {-}
#### Rule 1: Use long format data {-}
In R, data are almost always most useful a long format where:
- each row of the dataframe corresponds to a single measurement occasion
- each column corresponds to a variable which is measured
```{r include=F}
df <- expand.grid(time=1:3, person=1:10) %>%
as_data_frame() %>%
mutate(
outcome = rpois(n(), 10)
) %>%
group_by(person) %>%
mutate(predictor = rpois(1, 3)) %>%
arrange(person, time) %>%
select(person, time, predictor, outcome)
df.wide <- df %>%
mutate(time = paste("Time", time)) %>%
group_by(person, predictor) %>%
spread(time, outcome)
df.wide
```
For example, in R we will have data like this:
```{r}
df %>%
head %>%
pander
```
Whereas in SPSS we might have the same data structured like this:
```{r}
df.wide %>%
head %>%
pander
```
R always uses long form data when running an Anova, but one downside is that it
therefore has no automatic to know which rows belong to which person (assuming
individual people are the unit of error in your model). This means that for
repeated measures designs you need to make explicit which measures are repeated
when specifying the model (see the section on repeated designs below).
#### Rule 2: Know your variables {-}
See [the section on dataframes](#datasets-dataframes) and
[on the different column types](#factors-and-numerics) and be sure you can
distinguish:
- Numeric variables
- Factors
- Character strings.
In Anova:
- Outcomes will be numeric variables
- Predictors will be factors or (preferably) character strings
If you want to run Ancova models, you can also add numeric predictors.
#### Rule 3: Don't use `aov()` or `anova()` {-}
This is the most important rule of all.
The `aov` and `anova` functions have been around in R a long time. For various
historical reasons the defaults for these functions won't do what you expect if
you are used to SPSS, Stata, SAS, and most other stats packages. These
differences are important and will be confusing and give you misleading results
unless you understand them.
The recommendation here is:
- If you have a factorial experiment define your model using `lm()` and then
use `car::Anova()` to calculate F tests.
- If you have repeated measures, your data are perfectly balanced, and you
have no missing values then [use `afex::car_aov()`](#repeated-measures).
- If you think you want a repeated measures Anova but your data are not
balanced, or you have missing data, use
[linear mixed models](#multilevel-models) instead via the `lme4::` package.
#### Rule 4: Use type 3 sums of squares (and learn why) {- #sums-squares}
You may be aware, but there are at least 3 different ways of calculating the
sums of squares for each factor and interaction in an Anova. In short,
- SPSS and most other packages use type 3 sums of squares.
- `aov` and `anova` use type 1.
- By default, `car::Anova` and `ez::ezANOVA` use type 2, but can use type 3 if
you ask.
This means you must:
- Make sure you use type 3 sums of squares unless you have a reason not to.
- Always pass `type=3` as an argument when running an Anova.
##### {- .explainer}
A longer explanation of _why_ you probably want type 3 sums of squares is given
in this
[online discussion on stats.stackechange.com](https://stats.stackexchange.com/questions/60362/choice-between-type-i-type-ii-or-type-iii-anova)
and practical implications are shown in
[this worked example](http://dwoll.de/rexrepos/posts/anovaSStypes.html).
An even longer answer, including a much deeper exploration of the philosophical
questions involved is given by @venables1998exegeses.
### Recommendations for doing Anova {- #anova-recommendations}
1. Make sure to [Plot your raw data _first_](#graphics)
1. Where you have interactions,
[be cautious in interpreting the main effects in your model, and always plot the model predictions](#understanding-interactions).
1. If you find yourself aggregating (averaging) data before running your model,
[think about using a mixed or multilevel model](#multilevel-models) instead.
1. If you are using repeated measures Anova,
[check if you should should be using a mixed model](#multilevel-models)
instead. If you have an unbalanced design or any missing data, you probably
should use a mixed model.
## Anova 'Cookbook' {- #anova-cookbook}
This section is intended as a shortcut to running Anova for a variety of common
types of model. If you want to understand more about what you are doing, read
the section on [principles of Anova in R first](#anova), or consult an
introductory text on Anova which covers Anova [e.g. @howell2012statistical].
### Between-subjects Anova {-}
#### Oneway Anova (> 2 groups) {- #oneway-anova}
If your design has more than 2 groups then you should use oneway Anova.
Let's say we asked people to taste 1 of 4 fruit juices, and rate how tasty it
was on a scale from 0 to 10:
```{r, echo=F}
set.seed(12345)
tasty.juice <- expand.grid(person=1:25, juice = c("Mango", "Apple", "Orange", "Durian")) %>%
mutate(tastiness = round(rnorm(n(), 5, 2)) + -3 * as.numeric(juice=="Durian"), person=row_number()) %>%
rowwise() %>% mutate(tastiness = min(max(tastiness, 0), 10))
saveRDS(tasty.juice, file='data/juice.RDS')
tasty.juice %>%
ggplot(aes(juice, tastiness)) +
geom_boxplot()
```
We can run a oneway Anova with [type 3 sums of squares](#sums-squares) using the
`Anova` function from the `car::` package:
```{r}
juice.lm <- lm(tastiness ~ juice, data=tasty.juice)
juice.anova <- car::Anova(juice.lm, type=3)
juice.anova
```
And we could [compute the contasts](#contrasts) for each fruit against the
others (the grand mean):
```{r}
juice.lsm <- emmeans::emmeans(juice.lm, pairwise~juice, adjust="fdr")
juice.contrasts <- emmeans::contrast(juice.lsm, "eff")
juice.contrasts$contrasts
```
[We found a significant main effect of juice, `r
apastats::describe.Anova(juice.anova, 2)`. Followup tests (adjusted for false
discovery rate) indicated that only Durian differed from the other juices, and
was rated a significantly less tasty Mango, Apple, and Orange
juice.]{.apa-example}
#### Factorial Anova {- #howell-factorial-example}
We are using a
[dataset from Howell](http://www.uvm.edu/~dhowell/methods7/DataFiles/Tab13-2.dat)
[@howell2012statistical, chapter 13]: an experiment which recorded `Recall`
among young v.s. older adults (`Age`) for each of 5 conditions.
```{r include=F, eval=F}
eysenck <- read.table('howell-data/Tab13-2.dat', header=T) %>%
mutate(
Condition=factor(Condition,
labels=c("Counting", "Rhyming", "Adjective", "Imagery", "Intention")),
Age = factor(Age, labels=c("Young", "Older")))
saveRDS(eysenck, file="data/eysenck.Rdata")
```
These data would commonly be plotted something like this:
```{r}
eysenck <- readRDS("data/eysenck.Rdata")
eysenck %>%
ggplot(aes(Condition, Recall, group=Age, color=Age)) +
stat_summary(geom="pointrange", fun.data = mean_cl_boot) +
ylab("Recall (95% CI)") +
xlab("")
```
[Visual inspection of the data (see Figure X) suggested that older adults
recalled more words than younger adults, and that this difference was greatest
for the intention, imagery, and adjective conditions. Recall peformance was
worst in the counting and rhyming conditions.]{.apa-example}
Or alternatively if we wanted to provde a better summary of the distribution of
the raw data we could use a boxplot:
```{r, fig.cap="Boxplot for recall in older and young adults, by condition."}
eysenck %>%
ggplot(aes(Age, Recall)) +
geom_boxplot(width=.33) +
facet_grid(~Condition) +
ylab("Recall (95% CI)") +
xlab("")
```
We can run a linear model including the effect of `Age` and `Condition` and the
interaction of these variables, and calculate the Anova:
```{r}
eysenck.model <- lm(Recall~Age*Condition, data=eysenck)
car::Anova(eysenck.model, type=3)
```
### Repeated measures or 'split plot' designs {- #repeated-measures}
It might be controversial to say so, but the tools to run traditional repeat
measures Anova in R are a bit of a pain to use. Although there are numerous
packages simplify the process a little, their syntax can be obtuse or confusing.
To make matters worse, various textbooks, online guides and the R help files
themselves show many ways to achieve the same ends, and it can be difficult to
follow the differences between the underlying models that are run.
At this point, given the
[many other advantages of linear mixed models over traditional repeated measures Anova](http://jamanetwork.com/journals/jamapsychiatry/article-abstract/481967),
and given that many researchers abuse traditional Anova in practice (e.g. using
it for unbalanced data, or where some data are missing), the recommendation here
is to simply give up and learn how to run linear mixed models. These can (very
closely) replicate traditional Anova approaches, but also:
- Handle missing data or unbalanced designs gracefully and efficiently.
- Be expanded to include multiple levels of nesting. For example, allowing
pupils to be nested within classes, within schools. Alternatively multiple
measurements of individual patients might be clustered by hospital or
therapist.
- Allow time to be treated as a continuous variable. For example, time can be
modelled as a slope or some kind of curve, rather than a fixed set of
observation-points. This can be more parsimonious, and more flexible when
dealing with real-world data (e.g. from clinical trials).
It would be best at this point to
[jump straight to the main section multilevel or mixed-effects models](#multilevel-models),
but to give one brief example of mixed models in use:
#### {- #sleepstudy-rmanova}
The `sleepstudy` dataset in the `lme4` package provides reaction time data
recorded from participants over a period of 10 days, during which time they were
deprived of sleep.
```{r}
lme4::sleepstudy %>%
head(12) %>%
pander
```
We can plot these data to show the increase in RT as sleep deprivation
continues:
```{r}
lme4::sleepstudy %>%
ggplot(aes(factor(Days), Reaction)) +
geom_boxplot() +
xlab("Days") + ylab("RT (ms)") +
geom_label(aes(y=400, x=2, label="you start to\nfeel bad here"), color="red") +
geom_label(aes(y=450, x=9, label="imagine how bad\nyou feel by this point"), color="red")
```
If we want to test whether there are significant differences in RTs between
`Days`, we could fit something very similar to a traditional repeat measures
Anova using the `lme4::lmer()` function, and obtain an Anova table for the model
using the special `anova()` function which is added by the lmerTest package:
```{r}
sleep.model <- lmer(Reaction ~ factor(Days) + (1 | Subject), data=lme4::sleepstudy)
anova(sleep.model)
```
### Traditional repeated measures Anova {- #trad-rm-anova}
If you really need to fit the traditional repeated measures Anova (e.g. your
supervisor/reviewer has asked you to) then you should use either the `afex::` or
`ez::` packages.
Let's say we have an experiment where we record reaction 25 times (`Trial`)
before and after (`Time` = {1, 2}) one of 4 experimental manipulations
(`Condition` = {1,2,3,4}). You have 12 participants in each condition and no
missing data:
```{r, include=F}
set.seed(1234)
goodrts <- data_frame(RT=sn::rsn(1e5, xi=200, omega = 75, alpha = 3)) %>% filter(RT>40)
expt.data <- expand.grid(Condition=factor(1:4),
trial = 1:25,
time=factor(1:2), person=1:12) %>%
mutate(person = factor(as.numeric(factor(paste(Condition, person))))) %>%
mutate(RT = sample(goodrts$RT, n())) %>%
mutate(
RT = RT + -20*(Condition==2&time==2),
RT = RT + 20*(Condition==3&time==2),
RT = RT + 30*(Condition==3&time==2),
RT = RT + rnorm(n(), 0, 50) * (time==2),
RT = RT - (10*sqrt(trial) + rnorm(n(), 0, 5)) * (Condition == 2|Condition==4)
)
saveRDS(expt.data, file="data/expt.data.RDS")
```
```{r}
expt.data %>%
ggplot(aes(Condition, RT)) +
geom_boxplot() +
facet_wrap(~paste("Time", time))
```
We want to use our repeated measurements before and after the experimental
interventions to increase the precision of our estimate of the between-condition
differences.
Our first step is to aggregate RTs for the multiple trials, taking the mean
across all trials at a particular `time`:
```{r}
expt.data.agg <- expt.data %>%
group_by(Condition, person, time) %>%
summarise(RT=mean(RT))
head(expt.data.agg)
```
Because our data are still in long form (we have two rows per person), we have
to explicitly tell R that `time` is a within subject factor. Using the `afex::`
package we would write:
```{r}
expt.afex <- afex::aov_car(RT ~ Condition * time + Error(person/time),
data=expt.data.agg)
expt.afex$anova_table %>%
pander(caption="`afex::aov_car` output.")
```
Using the `ez::` package we would write:
```{r}
expt.ez <- ez::ezANOVA(data=expt.data.agg,
dv = RT,
wid = person,
within = time,
between = Condition)
expt.ez$ANOVA %>%
pander(caption="`ez::ezANOVA` output.")
```
[These are the same models: any differences in the output are simply due to
rounding. You should use whichever of `ez::` and `afex::` you find easiest to
understand]{.admonition}
The `ges` column is the generalised eta squared effect-size measure, which is
preferable to the partial eta-squared reported by SPSS
[@bakeman2005recommended].
#### But what about [insert favourite R package for Anova]? {- .explainer}
Lots of people like `ez::ezANOVA` and other similar packages. My problem with
`ezANOVA` is that it doesn't use formulae to define the model and for this
reason encourages students to think of Anova as something magical and separate
from linear models and regression in general.
This guide is called 'just enough R', so I've mostly chosen to show only
`car::Anova` because I find this the most coherent method to explain. Using
formulae to specify the model reinforces a technique which is useful in many
other contexts. I've make an exception for repeated because many people find
specifying the error structure explicitly confusing and hard to get right, and
so `ez::` may be the best option in these cases.
### Comparison with a multilevel model {-}
For reference, a broadly equivalent (although not identical) multilevel model
would be:
```{r}
expt.mlm <- lmer(RT ~ Condition * time + (1|person),
data=expt.data.agg)
anova(expt.mlm) %>%
pander()
```
Although with a linear mixed model it would also be posible to analyse the
trial-by-trial data. Let's hypothesise, for example, that subjects in Conditions
2 and 4 experienced a 'practice effect', such that their RTs reduced over
multiple trials. If we plot the data, we can see this suspicion may be supported
(how conveninent!):
```{r}
ggplot(expt.data,
aes(trial, RT)) +
geom_smooth() +
facet_wrap(~paste("Condition", Condition))
```
If we wanted to replicate the aggregated RM Anova models shown above we could
write:
```{r}
options(contrasts = c("contr.sum", "contr.poly"))
expt.mlm2 <- lmer(RT ~ Condition * time + (time|person), data=expt.data)
anova(expt.mlm2)
```
But we can now add a continuous predictor for `trial`:
```{r}
expt.mlm.bytrial <- lmer(RT ~ Condition * time * trial +
(time|person),
data=expt.data)
anova(expt.mlm.bytrial)
```
The significant `Condition:trial` term indicates that there was a difference in
the practice effects between the experimental conditions.
#### {- .apa-example}
We found a significant interaction between condition and the linear term for
trial number, _F_(3, 2340.18) = 10.83, _p_ < .001. We explored this effect by
plotting model-estimated reaction times for each group for trials 1 through 25
(see Figure X): participants in condition 2 and 4 exprienced a greater reduction
in RTs across trial, suggesting a larger practice effect for these conditions.
```{r, echo=F, warning=F}
nude = expand.grid(Condition = factor(1:4),
trial = 1:25,
time=factor(1:2),
person=Inf) %>%
as.data.frame()
preds <- bind_cols(nude,
merTools::predictInterval(expt.mlm.bytrial, newdata=nude, include.resid.var = F))
```
```{r, echo=F, warning=F}
preds %>%
ggplot(aes(trial, fit, color=Condition)) +
geom_smooth(se=F) +
geom_smooth(se=F, aes(y=upr), size=.5, linetype="dashed") +
geom_smooth(se=F, aes(y=lwr), size=.5, linetype="dashed") +
facet_wrap(~paste("Time", time)) +
ylab('Predicted RT') + xlab('Trial')
```
#### {-}
See the [multilevel models section](#multilevel-models) for more details,
including analyses which allow the effects of interventions to vary between
participants (i.e., relaxing the assumption that an intervention will be equally
effective for all participants).
#### RM Anova v.s. multilevel models {- .explainer}
- The RM Anova is perhaps more familiar, and may be conventional in your field
which can make peer review easier (although in other fields mixed models are
now expected where the design warrants it).
- RM Anova requires complete data: any participant with any missing data will
be dropped from the analysis. This is problematic where data are expensive
to collect, and where data re unlikely to be missing at random, for example
in a clinical trial. In these cases RM Anova may be less efficient and more
biased than an equivalent multilevel model.
- There is no simple way of calculating effect size measures like eta^2^ from
the `lmer` model. This may or may not be a bad thing.
@baguley2009standardized, for example, recommends reporting simple (rather
than standardised) effect size measures, and is
[easily done by making predictions from the model](#predictions-and-margins).
## Checking assumptions {-}
[The text below continues on from
[this example of factorial Anova](#howell-factorial-example).]{.tip}
If we want to check that the assumptions of our Anova models are met, these
tables and plots would be a reasonable place to start. First running Levene's
test:
```{r}
car::leveneTest(eysenck.model) %>%
pander()
```
Then a QQ-plot of the model residuals to assess normality:
```{r, fig.cap="QQ plot to assess normality of model residuals"}
car::qqPlot(eysenck.model)
```
And finally a residual-vs-fitted plot:
```{r, fig.cap="Residual vs fitted (spread vs. level) plot to check homogeneity of variance."}
data_frame(
fitted = predict(eysenck.model),
residual = residuals(eysenck.model)) %>%
# and then plot points and a smoothed line
ggplot(aes(fitted, residual)) +
geom_point() +
geom_smooth(se=F)
```
For more on assumptions checks after linear models or Anova see:
<http://www.statmethods.net/stats/anovaAssumptions.html>
## Followup tests {-}
[The text below continues on from
[this example of factorial Anova](#howell-factorial-example).]{.tip}
If we want to look at post-hoc pairwise tests we can use the the `emmeans()`
function from the `emmeans::` package. By default Tukey correction is applied
for multiple comparisons, which is a reasonable default:
```{r}
em <- emmeans::emmeans(eysenck.model, pairwise~Age:Condition)
em$contrasts
```
Both cell means and pairwise contrasts are shown here. There is much more detail
on computing pairwise comparisons and other types of contrasts in the section on
[multiple comparisons](#multiple-comparisons), including ways to
[extract](#extract-contrasts) and present your comparisons in APA format.