anova.Rmd

---
title: 'Anova'
---

# Anova

```{r, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, collapse=TRUE, cache=TRUE, message=F)
library(tidyverse)
library(pander)
library(ez)
library(lmerTest)
theme_set(theme_minimal())
```

[Be sure to read the [section on linear models in R](#linear-models-simple)
_before_ you read this section, and specifically the parts on
[specifying models with formulae](#formulae).]{.admonition}

This section attempts to cover in a high level way how to specify anova models
in R and some of the issues in interpreting the model output. If you need to
revise the basic idea of an Anova, the Howell textbook [@howell2016fundamental].
For a very quick reminder,
[this interactive/animated explanation of Anova is helpful](http://web.utah.edu/stat/introstats/anovaflash.html).

If you just want the 'answers' — i.e. the syntax to specify common Anova models
-- you could skip to the next section: [Anova cookbook](#anova-cookbook)

There are 4 rules for doing Anova in R and not wanting to cry:

1. Keep your data in 'long' format.
2. Know the
   [differences between character, factor and numeric](#factors-and-numerics)
   variables
3. Do not use the `aov()` or `anova()` functions to get an Anova table unless
   you know what you are doing.
4. Learn about the types of sums of squares and always remember to specify
   `type=3`, unless you know better.

### Rules for using Anova in R {-}

#### Rule 1: Use long format data {-}

In R, data are almost always most useful a long format where:

-   each row of the dataframe corresponds to a single measurement occasion
-   each column corresponds to a variable which is measured

```{r include=F}
df <- expand.grid(time=1:3, person=1:10) %>%
  as_data_frame() %>%
  mutate(
    outcome = rpois(n(), 10)
  ) %>%
  group_by(person) %>%
  mutate(predictor = rpois(1, 3)) %>%
  arrange(person, time) %>%
  select(person, time, predictor, outcome)


df.wide <- df %>%
  mutate(time = paste("Time", time)) %>%
  group_by(person, predictor) %>%
  spread(time, outcome)
df.wide
```

For example, in R we will have data like this:

```{r}
df %>%
  head %>%
  pander
```

Whereas in SPSS we might have the same data structured like this:

```{r}
df.wide %>%
    head %>%
    pander
```

R always uses long form data when running an Anova, but one downside is that it
therefore has no automatic to know which rows belong to which person (assuming
individual people are the unit of error in your model). This means that for
repeated measures designs you need to make explicit which measures are repeated
when specifying the model (see the section on repeated designs below).

#### Rule 2: Know your variables {-}

See [the section on dataframes](#datasets-dataframes) and
[on the different column types](#factors-and-numerics) and be sure you can
distinguish:

-   Numeric variables
-   Factors
-   Character strings.

In Anova:

-   Outcomes will be numeric variables
-   Predictors will be factors or (preferably) character strings

If you want to run Ancova models, you can also add numeric predictors.

#### Rule 3: Don't use `aov()` or `anova()` {-}

This is the most important rule of all.

The `aov` and `anova` functions have been around in R a long time. For various
historical reasons the defaults for these functions won't do what you expect if
you are used to SPSS, Stata, SAS, and most other stats packages. These
differences are important and will be confusing and give you misleading results
unless you understand them.

The recommendation here is:

-   If you have a factorial experiment define your model using `lm()` and then
    use `car::Anova()` to calculate F tests.

-   If you have repeated measures, your data are perfectly balanced, and you
    have no missing values then [use `afex::car_aov()`](#repeated-measures).

-   If you think you want a repeated measures Anova but your data are not
    balanced, or you have missing data, use
    [linear mixed models](#multilevel-models) instead via the `lme4::` package.

#### Rule 4: Use type 3 sums of squares (and learn why) {- #sums-squares}

You may be aware, but there are at least 3 different ways of calculating the
sums of squares for each factor and interaction in an Anova. In short,

-   SPSS and most other packages use type 3 sums of squares.
-   `aov` and `anova` use type 1.
-   By default, `car::Anova` and `ez::ezANOVA` use type 2, but can use type 3 if
    you ask.

This means you must:

-   Make sure you use type 3 sums of squares unless you have a reason not to.
-   Always pass `type=3` as an argument when running an Anova.

##### {- .explainer}

A longer explanation of _why_ you probably want type 3 sums of squares is given
in this
[online discussion on stats.stackechange.com](https://stats.stackexchange.com/questions/60362/choice-between-type-i-type-ii-or-type-iii-anova)
and practical implications are shown in
[this worked example](http://dwoll.de/rexrepos/posts/anovaSStypes.html).

An even longer answer, including a much deeper exploration of the philosophical
questions involved is given by @venables1998exegeses.

### Recommendations for doing Anova {- #anova-recommendations}

1. Make sure to [Plot your raw data _first_](#graphics)

1. Where you have interactions,
   [be cautious in interpreting the main effects in your model, and always plot the model predictions](#understanding-interactions).

1. If you find yourself aggregating (averaging) data before running your model,
   [think about using a mixed or multilevel model](#multilevel-models) instead.

1. If you are using repeated measures Anova,
   [check if you should should be using a mixed model](#multilevel-models)
   instead. If you have an unbalanced design or any missing data, you probably
   should use a mixed model.

## Anova 'Cookbook' {- #anova-cookbook}

This section is intended as a shortcut to running Anova for a variety of common
types of model. If you want to understand more about what you are doing, read
the section on [principles of Anova in R first](#anova), or consult an
introductory text on Anova which covers Anova [e.g. @howell2012statistical].

### Between-subjects Anova {-}

#### Oneway Anova (> 2 groups) {- #oneway-anova}

If your design has more than 2 groups then you should use oneway Anova.

Let's say we asked people to taste 1 of 4 fruit juices, and rate how tasty it
was on a scale from 0 to 10:

```{r, echo=F}
set.seed(12345)
tasty.juice <- expand.grid(person=1:25, juice = c("Mango", "Apple", "Orange", "Durian")) %>%
  mutate(tastiness = round(rnorm(n(), 5, 2)) + -3 * as.numeric(juice=="Durian"), person=row_number()) %>%
  rowwise() %>% mutate(tastiness = min(max(tastiness, 0), 10))
saveRDS(tasty.juice, file='data/juice.RDS')
tasty.juice %>%
  ggplot(aes(juice, tastiness)) +
  geom_boxplot()
```

We can run a oneway Anova with [type 3 sums of squares](#sums-squares) using the
`Anova` function from the `car::` package:

```{r}
juice.lm <- lm(tastiness ~ juice, data=tasty.juice)
juice.anova <- car::Anova(juice.lm, type=3)
juice.anova
```

And we could [compute the contasts](#contrasts) for each fruit against the
others (the grand mean):

```{r}
juice.lsm <- emmeans::emmeans(juice.lm, pairwise~juice, adjust="fdr")
juice.contrasts <- emmeans::contrast(juice.lsm, "eff")
juice.contrasts$contrasts
```

[We found a significant main effect of juice, `r
apastats::describe.Anova(juice.anova, 2)`. Followup tests (adjusted for false
discovery rate) indicated that only Durian differed from the other juices, and
was rated a significantly less tasty Mango, Apple, and Orange
juice.]{.apa-example}

#### Factorial Anova {- #howell-factorial-example}

We are using a
[dataset from Howell](http://www.uvm.edu/~dhowell/methods7/DataFiles/Tab13-2.dat)
[@howell2012statistical, chapter 13]: an experiment which recorded `Recall`
among young v.s. older adults (`Age`) for each of 5 conditions.

```{r include=F, eval=F}
eysenck <- read.table('howell-data/Tab13-2.dat', header=T) %>%
  mutate(
    Condition=factor(Condition,
                     labels=c("Counting", "Rhyming", "Adjective", "Imagery", "Intention")),
    Age = factor(Age, labels=c("Young", "Older")))
saveRDS(eysenck, file="data/eysenck.Rdata")
```

These data would commonly be plotted something like this:

```{r}
eysenck <- readRDS("data/eysenck.Rdata")
eysenck %>%
  ggplot(aes(Condition, Recall, group=Age, color=Age)) +
  stat_summary(geom="pointrange", fun.data = mean_cl_boot) +
  ylab("Recall (95% CI)") +
  xlab("")
```

[Visual inspection of the data (see Figure X) suggested that older adults
recalled more words than younger adults, and that this difference was greatest
for the intention, imagery, and adjective conditions. Recall peformance was
worst in the counting and rhyming conditions.]{.apa-example}

Or alternatively if we wanted to provde a better summary of the distribution of
the raw data we could use a boxplot:

```{r, fig.cap="Boxplot for recall in older and young adults, by condition."}
eysenck %>%
  ggplot(aes(Age, Recall)) +
  geom_boxplot(width=.33) +
  facet_grid(~Condition) +
  ylab("Recall (95% CI)") +
  xlab("")
```

We can run a linear model including the effect of `Age` and `Condition` and the
interaction of these variables, and calculate the Anova:

```{r}
eysenck.model <- lm(Recall~Age*Condition, data=eysenck)
car::Anova(eysenck.model, type=3)
```

### Repeated measures or 'split plot' designs {- #repeated-measures}

It might be controversial to say so, but the tools to run traditional repeat
measures Anova in R are a bit of a pain to use. Although there are numerous
packages simplify the process a little, their syntax can be obtuse or confusing.
To make matters worse, various textbooks, online guides and the R help files
themselves show many ways to achieve the same ends, and it can be difficult to
follow the differences between the underlying models that are run.

At this point, given the
[many other advantages of linear mixed models over traditional repeated measures Anova](http://jamanetwork.com/journals/jamapsychiatry/article-abstract/481967),
and given that many researchers abuse traditional Anova in practice (e.g. using
it for unbalanced data, or where some data are missing), the recommendation here
is to simply give up and learn how to run linear mixed models. These can (very
closely) replicate traditional Anova approaches, but also:

-   Handle missing data or unbalanced designs gracefully and efficiently.

-   Be expanded to include multiple levels of nesting. For example, allowing
    pupils to be nested within classes, within schools. Alternatively multiple
    measurements of individual patients might be clustered by hospital or
    therapist.

-   Allow time to be treated as a continuous variable. For example, time can be
    modelled as a slope or some kind of curve, rather than a fixed set of
    observation-points. This can be more parsimonious, and more flexible when
    dealing with real-world data (e.g. from clinical trials).

It would be best at this point to
[jump straight to the main section multilevel or mixed-effects models](#multilevel-models),
but to give one brief example of mixed models in use:

#### {- #sleepstudy-rmanova}

The `sleepstudy` dataset in the `lme4` package provides reaction time data
recorded from participants over a period of 10 days, during which time they were
deprived of sleep.

```{r}
lme4::sleepstudy %>%
  head(12) %>%
  pander
```

We can plot these data to show the increase in RT as sleep deprivation
continues:

```{r}
lme4::sleepstudy %>%
  ggplot(aes(factor(Days), Reaction)) +
  geom_boxplot() +
  xlab("Days") + ylab("RT (ms)") +
  geom_label(aes(y=400, x=2, label="you start to\nfeel bad here"), color="red") +
  geom_label(aes(y=450, x=9, label="imagine how bad\nyou feel by this point"), color="red")
```

If we want to test whether there are significant differences in RTs between
`Days`, we could fit something very similar to a traditional repeat measures
Anova using the `lme4::lmer()` function, and obtain an Anova table for the model
using the special `anova()` function which is added by the lmerTest package:

```{r}
sleep.model <- lmer(Reaction ~ factor(Days) + (1 | Subject), data=lme4::sleepstudy)
anova(sleep.model)
```

### Traditional repeated measures Anova {- #trad-rm-anova}

If you really need to fit the traditional repeated measures Anova (e.g. your
supervisor/reviewer has asked you to) then you should use either the `afex::` or
`ez::` packages.

Let's say we have an experiment where we record reaction 25 times (`Trial`)
before and after (`Time` = {1, 2}) one of 4 experimental manipulations
(`Condition` = {1,2,3,4}). You have 12 participants in each condition and no
missing data:

```{r, include=F}
set.seed(1234)
goodrts <- data_frame(RT=sn::rsn(1e5, xi=200, omega = 75, alpha = 3)) %>% filter(RT>40)

expt.data <- expand.grid(Condition=factor(1:4),
                         trial = 1:25,
                         time=factor(1:2), person=1:12) %>%
  mutate(person = factor(as.numeric(factor(paste(Condition, person))))) %>%
  mutate(RT = sample(goodrts$RT, n())) %>%
  mutate(
    RT = RT + -20*(Condition==2&time==2),
    RT = RT + 20*(Condition==3&time==2),
    RT = RT + 30*(Condition==3&time==2),
    RT = RT + rnorm(n(), 0, 50) * (time==2),
    RT = RT - (10*sqrt(trial) + rnorm(n(), 0, 5)) * (Condition == 2|Condition==4)
  )
saveRDS(expt.data, file="data/expt.data.RDS")
```

```{r}
expt.data %>%
  ggplot(aes(Condition, RT)) +
  geom_boxplot() +
  facet_wrap(~paste("Time", time))
```

We want to use our repeated measurements before and after the experimental
interventions to increase the precision of our estimate of the between-condition
differences.

Our first step is to aggregate RTs for the multiple trials, taking the mean
across all trials at a particular `time`:

```{r}
expt.data.agg <- expt.data %>%
  group_by(Condition, person, time) %>%
  summarise(RT=mean(RT))

head(expt.data.agg)
```

Because our data are still in long form (we have two rows per person), we have
to explicitly tell R that `time` is a within subject factor. Using the `afex::`
package we would write:

```{r}
expt.afex <- afex::aov_car(RT ~ Condition * time + Error(person/time),
                           data=expt.data.agg)
expt.afex$anova_table %>%
  pander(caption="`afex::aov_car` output.")
```

Using the `ez::` package we would write:

```{r}
expt.ez <- ez::ezANOVA(data=expt.data.agg,
            dv = RT,
            wid = person,
            within = time,
            between = Condition)

expt.ez$ANOVA %>%
  pander(caption="`ez::ezANOVA` output.")
```

[These are the same models: any differences in the output are simply due to
rounding. You should use whichever of `ez::` and `afex::` you find easiest to
understand]{.admonition}

The `ges` column is the generalised eta squared effect-size measure, which is
preferable to the partial eta-squared reported by SPSS
[@bakeman2005recommended].

#### But what about [insert favourite R package for Anova]? {- .explainer}

Lots of people like `ez::ezANOVA` and other similar packages. My problem with
`ezANOVA` is that it doesn't use formulae to define the model and for this
reason encourages students to think of Anova as something magical and separate
from linear models and regression in general.

This guide is called 'just enough R', so I've mostly chosen to show only
`car::Anova` because I find this the most coherent method to explain. Using
formulae to specify the model reinforces a technique which is useful in many
other contexts. I've make an exception for repeated because many people find
specifying the error structure explicitly confusing and hard to get right, and
so `ez::` may be the best option in these cases.

### Comparison with a multilevel model {-}

For reference, a broadly equivalent (although not identical) multilevel model
would be:

```{r}
expt.mlm  <- lmer(RT ~ Condition * time + (1|person),
    data=expt.data.agg)

anova(expt.mlm) %>%
    pander()
```

Although with a linear mixed model it would also be posible to analyse the
trial-by-trial data. Let's hypothesise, for example, that subjects in Conditions
2 and 4 experienced a 'practice effect', such that their RTs reduced over
multiple trials. If we plot the data, we can see this suspicion may be supported
(how conveninent!):

```{r}
ggplot(expt.data,
  aes(trial, RT)) +
  geom_smooth() +
  facet_wrap(~paste("Condition", Condition))
```

If we wanted to replicate the aggregated RM Anova models shown above we could
write:

```{r}
options(contrasts = c("contr.sum", "contr.poly"))
expt.mlm2 <- lmer(RT ~ Condition * time + (time|person), data=expt.data)
anova(expt.mlm2)
```

But we can now add a continuous predictor for `trial`:

```{r}
expt.mlm.bytrial <- lmer(RT ~ Condition * time * trial +
    (time|person),
    data=expt.data)

anova(expt.mlm.bytrial)
```

The significant `Condition:trial` term indicates that there was a difference in
the practice effects between the experimental conditions.

#### {- .apa-example}

We found a significant interaction between condition and the linear term for
trial number, _F_(3, 2340.18) = 10.83, _p_ < .001. We explored this effect by
plotting model-estimated reaction times for each group for trials 1 through 25
(see Figure X): participants in condition 2 and 4 exprienced a greater reduction
in RTs across trial, suggesting a larger practice effect for these conditions.

```{r, echo=F, warning=F}
nude = expand.grid(Condition = factor(1:4),
                   trial = 1:25,
                   time=factor(1:2),
                   person=Inf) %>%
  as.data.frame()

preds <- bind_cols(nude,
          merTools::predictInterval(expt.mlm.bytrial, newdata=nude, include.resid.var = F))
```

```{r, echo=F, warning=F}
preds %>%
  ggplot(aes(trial, fit, color=Condition)) +
  geom_smooth(se=F) +
  geom_smooth(se=F, aes(y=upr), size=.5, linetype="dashed") +
  geom_smooth(se=F, aes(y=lwr), size=.5, linetype="dashed") +
  facet_wrap(~paste("Time", time)) +
  ylab('Predicted RT') + xlab('Trial')
```

#### {-}

See the [multilevel models section](#multilevel-models) for more details,
including analyses which allow the effects of interventions to vary between
participants (i.e., relaxing the assumption that an intervention will be equally
effective for all participants).

#### RM Anova v.s. multilevel models {- .explainer}

-   The RM Anova is perhaps more familiar, and may be conventional in your field
    which can make peer review easier (although in other fields mixed models are
    now expected where the design warrants it).

-   RM Anova requires complete data: any participant with any missing data will
    be dropped from the analysis. This is problematic where data are expensive
    to collect, and where data re unlikely to be missing at random, for example
    in a clinical trial. In these cases RM Anova may be less efficient and more
    biased than an equivalent multilevel model.

-   There is no simple way of calculating effect size measures like eta^2^ from
    the `lmer` model. This may or may not be a bad thing.
    @baguley2009standardized, for example, recommends reporting simple (rather
    than standardised) effect size measures, and is
    [easily done by making predictions from the model](#predictions-and-margins).

## Checking assumptions {-}

[The text below continues on from
[this example of factorial Anova](#howell-factorial-example).]{.tip}

If we want to check that the assumptions of our Anova models are met, these
tables and plots would be a reasonable place to start. First running Levene's
test:

```{r}
car::leveneTest(eysenck.model) %>%
  pander()
```

Then a QQ-plot of the model residuals to assess normality:

```{r, fig.cap="QQ plot to assess normality of model residuals"}
car::qqPlot(eysenck.model)
```

And finally a residual-vs-fitted plot:

```{r, fig.cap="Residual vs fitted (spread vs. level) plot to check homogeneity of variance."}

data_frame(
  fitted = predict(eysenck.model),
  residual = residuals(eysenck.model)) %>%
  # and then plot points and a smoothed line
  ggplot(aes(fitted, residual)) +
    geom_point() +
    geom_smooth(se=F)

```

For more on assumptions checks after linear models or Anova see:
<http://www.statmethods.net/stats/anovaAssumptions.html>

## Followup tests {-}

[The text below continues on from
[this example of factorial Anova](#howell-factorial-example).]{.tip}

If we want to look at post-hoc pairwise tests we can use the the `emmeans()`
function from the `emmeans::` package. By default Tukey correction is applied
for multiple comparisons, which is a reasonable default:

```{r}
em <- emmeans::emmeans(eysenck.model, pairwise~Age:Condition)
em$contrasts
```

Both cell means and pairwise contrasts are shown here. There is much more detail
on computing pairwise comparisons and other types of contrasts in the section on
[multiple comparisons](#multiple-comparisons), including ways to
[extract](#extract-contrasts) and present your comparisons in APA format.