power-analysis.Rmd

---
title: 'Power analysis'
---

```{r, include=FALSE}
library(tidyverse)
library(pander)
library(simr)
```

# Power analysis {#power-analysis}

### For most inferential statistics {-}

If you want to do power analysis for a standard statistical test, e.g. t-tests,
chi^2^ or Anova, the `pwr::` package is what you need.
[This guide has a good walkthrough](http://www.statmethods.net/stats/power.html).

### For multilevel or generalised linear models {-}

If you'd like to run power analyses for linear mixed models
([multilevel models](#multilevel-models)) then you need
[the `simr::` package](http://onlinelibrary.wiley.com/doi/10.1111/2041-210X.12504/full).
It has some neat features for calculating power by simulating data and results
from a model you specify.

To take a simple example, let's fabricate some data where outcomes `y` are
nested within groups `g`, and where there is a covariate of interest `x`. That
is, we are interested in estimating our power to detect an effect of `x`.

Note though that in this simulated data, we create outcomes (`y`) with no
relationship to `x`, and no clustering by `g` at this stage (i.e. `y`, `x` and
`g` are uncorrelated).

```{r}
set.seed(1234)
simulated.df <- data_frame(
  x = rep(0:1, 50),
  g = rep(1:10, 10),
  y = rnorm(100)
)
GGally::ggpairs(simulated.df)
```

To use `simr::` we first run our 'model of interest'. In this case it's a
random-intercepts [multilevel model](#multilevel-models). We can verify `x` is
unrelated to outcome with `lmerTest::anova`:

```{r}
library(lmerTest)
model.of.interest <- lmer(y ~ x + (1|g), data=simulated.df)
anova(model.of.interest)
```

The next step is to modify our saved model of interest. We tweak the fitted
parameters to represent our predicted values for effect sizes, variances, and
covariances.

First, let's change the 'true' effect of `x` to be 0.2:

```{r}
fixef(model.of.interest)['x'] <- .2
```

We now use the `powerSim` function to use our tweaked model to:

-   create a dataset using the parameters of our model (i.e. make random draws
    of `y` which relate to `g` and `x` as specified in the model summary).
-   re-run `lmer` on this simulated data.
-   repeat this hundreds or thousands of times
-   count how many times (i.e., for what proportion) we get a significant _p_
    value

```{r include=F, echo=T}
power.sim <- powerSim(model.of.interest, nsim=100)
```

```{r}
power.sim
```

Our observed power (proportion of times we get a significant _p_ value) is very
low here, so we might want increase our hypothesised effect of `x`, for example
to see what power we have to detect an effect of x = .8:

```{r}
fixef(model.of.interest)['x'] <- .8
```

```{r include=F, echo=T}
power.sim <- powerSim(model.of.interest, nsim=100)
```

```{r}
power.sim
```

We might also want to set one of the variance parameters of our model to
represent clustering within-`g`. First we can use `VarCorr()` to check the
variance parameters of the model we just ran:

```{r}
VarCorr(model.of.interest)
```

And we could simulate increasing the variance parameter for `g` to 0.5:

```{r}
VarCorr(model.of.interest)['g'] <- .5
```

```{r include=F, echo=T}
power.sim <- powerSim(model.of.interest, nsim=100)
```

```{r}
power.sim
```

Because the amount of clustering in our data has increased our statistical power
has gone down. This is because, when clustering is present, each new observation
(row) in the dataset provides less new _information_ to estimate our treatment
effect. Note that in this example we increased the variance associated with `g`
by quite a lot: setting the variance of `g` to 0.5 equates to an ICC for `g` of
.33 (because .5 / (.5 + 1) = .33; see the section on [calculating ICCs and
VPCs]{#icc-and-vpc})

For more details of `simr` see:
http://onlinelibrary.wiley.com/doi/10.1111/2041-210X.12504/full

[Note that for real applications you would want to set `nsim` to something
reasonably large. Certainly at least 1000 simulations, and perhaps up to
~5000.]{.admonition}