-
Notifications
You must be signed in to change notification settings - Fork 9
/
Copy pathinteractions.Rmd
276 lines (216 loc) · 9.94 KB
/
interactions.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
---
title: 'Understanding interactions'
bibliography: bibliography.bib
---
```{r, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, cache=T)
library(tidyverse)
library(tufte)
library(ggthemes)
theme_set(theme_tufte(base_size = 18))
# +theme(plot.background = element_rect(fill='#fffff8', color='#fffff8'))
```
# Unpicking interactions {#understanding-interactions}
Objectives of this section:
- Clarify/recap what an interaction is
- Appreciate the importance of visualising interactions
- Compare different methods of plotting interactions in raw data
- Visualise interactions based on statistical model predictions
- Deal with cases where predictors are both categorical and continuous (or a
mix)
### What is an interaction? {-}
For an interaction to occur we must measure:
- An _outcome_: severity of injury in a car crash, for example.
- At least 2 _predictors_ of that outcome: e.g. age and gender.
Let's think of a scenario where we've measured severity of injury after road
accidents, along with the age and gender of the drivers involved. Let's
assume^[This example is loosely based on figures reported by
@kockelman2002driver]:
- Women are likely to be more seriously injured than men in a crash (a +10
point increase in severity)
- Drivers over 60 are more likely to injured than younger drivers (+10 point
severity vs <60 years)
For an interaction to occur we have to show that, for example:
- If you ware old and also female then you will be more severely injured
- This increase in severity of injury is more than we would expect simply by
adding the effects for being female (+10 points) and for being over 60 (+10
points). That is, if an interaction occurs the risk of being older and
female is > a 20 point increase in severity.
[Think of some other example of interactions from your own work.]{.exercise}
[Interactions capture the idea that the *effect* of one predictor changes
depending on the value of another predictor.]{.admonition}
### Visualising interactions from raw data {-}
In the previous section we established that interactions capture the idea that
the _effect_ of one predictor changes depending on the value of another
predictor.
We can see this illustrated in the traditional bar plot below. In the left panel
we see a dummy dataset in which there is no interaction; in the right panel are
data which do show evidence of an interaction:
```{r, echo=F}
inter.df <- expand.grid(female=0:1, older=0:1, interaction=0:1) %>%
as_data_frame() %>%
mutate(severity.of.injury = 50 + 10 * female + 10* older + 20 * female*older*interaction) %>%
mutate(female=factor(female, labels=c("Male", "Female"))) %>%
mutate(older=factor(older, labels=c("Young", "Old"))) %>%
mutate(interaction=factor(interaction, labels=c("No Interaction", "Interaction")))
save(inter.df, file='data/injury.Rdata')
```
```{r, fig.cap="Bar plot of injury severity by age and gender.", echo=F}
inter.df %>%
ggplot(aes(older, severity.of.injury, fill=female)) +
geom_bar(stat="identity", position="dodge") +
facet_wrap(~interaction) +
scale_fill_discrete("") +
ylab("Injury severity") + xlab("")
```
However this bar plot might be better if it were re-drawn as a point and line
plot:
```{r, fig.cap="Point and line plot of injury severity by age and gender.", echo=F}
inter.df %>%
ggplot(aes(older, severity.of.injury, group=female, color=female)) +
geom_point() +
geom_line() +
facet_wrap(~interaction) +
scale_color_discrete(name="") +
ylab("Injury severity") + xlab("")
```
The reason the point and line plot improves on the bars for a number of reasons:
- Readers tend to misinterpret bar plots by assuming that values 'above' the
bar are less likely than values contained 'within' the bar, when this is not
the case [@newman2012bar].
- The main effects are easy to distinguish in the line plot: just ask yourself
if the lines are horizontal or not, and whether they are separated
vertically. In contrast, reading the interaction from the bar graph requires
that we average pairs of bars (sometimes not adjacent to one another) and
compare them - a much more difficult mental operation.
- The interaction is easy to spot: Ask yourself if the lines are parallel. If
they _are_ parallel then the _difference_ between men and women is constant
for individuals of different ages.
### A painful example {- #pain-music-data}
Before setting out to _test_ for an interaction using some kind of statistical
model, it's always a good idea to first visualise the relationships between
outcomes and predictors.
A student dissertation project investigated the analgesic quality of music
during an experimental pain stimulus. Music was selected to be either _liked_
(or disliked) by participants and was either _familiar_ or unfamiliar to them.
Pain was rated without music (`no.music`) and with music (`with.music`) using a
10cm visual analog scale anchored with the labels "no pain" and "worst pain
ever".
```{r, include=F, eval=F}
painmusic <- readxl::read_excel("~/Dropbox/Projects/music-pain/data/CompleteDataForAnalysis2.xlsx") %>%
mutate(
liked=factor(liked, labels=c("Disliked", "Liked")),
familiar=factor(familiar, labels=c("Unfamiliar", "Familiar"))
) %>% select(matches(".")) %>%
select(liked, familiar, contains("intensity.vas")) %>%
rename(no.music=intensity.vas.baseline, with.music=intensity.vas.experimental)
saveRDS(painmusic, file='data/painmusic.RDS')
```
```{r}
painmusic <- readRDS('data/painmusic.RDS')
painmusic %>% glimpse
```
Before running inferential tests, it would be helpful to see if the data are
congruent with the study prediction that _liked_ and _familiar_ music would be
more effective at reducing pain than disliked or unfamiliar music
We can do this in many different ways. The most common (but not the best) choice
would be a simple bar plot, which we can create using the `stat_summary()`
function from `ggplot2`.
```{r, message=F}
painmusic %>%
mutate(change.in.pain = with.music - no.music) %>%
ggplot(aes(x = familiar, y=change.in.pain)) +
facet_wrap(~liked) +
stat_summary(geom="bar") + xlab("")
```
This gives a pretty clear indication that something is going on, but we have no
idea about the _distribution_ of the underlying data, and so how much confidence
we should place in the finding. We are also hiding distributional information
that could be useful to check that assumptions of models we run later are also
met (for example of equal variances between groups).
If we want to preserve more information about the underlying distribution we can
use density plots, boxplots, or pointrange plots, among others.
Here we use a grouped density plot. The `interaction()` function is used to
automatically create a variable with the 4 possible groupings we can make when
combining the`liked` and `familiar` variables:
```{r, message=F, fig.fullwidth=T}
painmusic %>%
mutate(change.in.pain = with.music - no.music) %>%
ggplot(aes(x = change.in.pain,
color = interaction(familiar:liked))) +
geom_density() +
scale_color_discrete(name="")
```
And here we use a boxplot to achieve similar ends:
```{r, message=F}
painmusic %>%
mutate(change.in.pain = with.music - no.music) %>%
ggplot(aes(x = interaction(familiar:liked), y = change.in.pain)) +
geom_boxplot() +
geom_hline(yintercept = 0, linetype="dotted") +
xlab("")
```
The advantage of these last two plots is that they preserve quite a bit of
infrmation about the variable of interest. However, they don't make it easy to
read the main effects and interaction as we saw for the point-line plot above.
We can combine some benefits of both plots by adding an error bar to the
point-line plot:
```{r, message=F}
painmusic %>%
ggplot(aes(liked, with.music - no.music,
group=familiar, color=familiar)) +
stat_summary(geom="pointrange", fun.data=mean_se) +
stat_summary(geom="line", fun.data=mean_se) +
ylab("Pain (VAS) with.music - no.music") +
scale_color_discrete(name="") +
xlab("")
```
This plot doesn't include all of the information about the distribution of
effects that the density or boxplots do (for example, we can't see any asymmetry
in the distributions any more), but we still get some information about the
variability of the effect of the experimental conditions on pain by plotting the
SE of the mean over the top of each point^[We could equally well plot the 95%
confidence interval for the mean, or the interquartile range)]
At this point, especially if your current data include only categorical
predictors, you might want to move on to the section on
[making predictions from models](predictions-and-margins.html) and visualising
these.
### Continuous predictors {-}
The `modelr` package contains useful functions which enable you to make
predictions from models, and visualise them easily.
In this example we run two models, with and without a polynomial effect for
`hp`. The predictions from both models are then plotted against one another.
```{r}
library(modelr)
m1 <- lm(mpg~hp, data = mtcars)
m2 <- lm(mpg ~ poly(hp, 2), data = mtcars)
mtcars %>% gather_predictions(m1, m2) %>%
ggplot(aes(hp, pred, color=model)) +
geom_point() +
geom_smooth()
```
We could also plot this over the top of the original data to give an example of
how the models fit the data.
```{r}
mtcars %>% gather_predictions(m1, m2) %>%
ggplot(aes(hp, pred, color=model)) +
geom_smooth() +
geom_point(aes(y=mpg), color="grey")
```
The `gather_predictions` function can also be used to plot interactions.
```{r}
m3 <- lm(mpg~wt*hp, data=mtcars)
summary(m3)
```
By making a new grid of data, using `expand.grid()`, at values of interest to
us, we can plot the interaction and see that the effect of `wt` is diminished as
`hp` increases.
```{r}
grid <- expand.grid(wt = quantile(mtcars$wt, probs=c(.25,.5,.75)),
hp = quantile(mtcars$hp, probs=c(.1, .25,.5,.75, .9)))
grid %>%
gather_predictions(m3) %>%
ggplot(aes(hp, pred, color=factor(wt))) +
geom_smooth(method="lm") +
ylab("Predicted mpg")
```