---
title: 't-tests'
---
```{r, include=F}
library(tidyverse)
library(pander)
panderOptions('digits', 2)
panderOptions('round', 3)
panderOptions('keep.trailing.zeros', TRUE)
```
## t-tests {- #t-tests}
### Visualising your data first {-}
Before you run any tests it's worth plotting your data.
Assuming you have a continuous outcome and a categorical (binary) predictor (here
we use a subset of the built-in `chickwts` data), a boxplot can work well:
```{r boxplot, fig.cap="The box in a boxplot indicates the IQR and the line inside it the median; each whisker extends to the most extreme data point within 1.5 times the IQR of the box. Points beyond that range are shown individually as outliers."}
chicks.eating.beans <- chickwts %>%
filter(feed %in% c("horsebean", "soybean"))
chicks.eating.beans %>%
ggplot(aes(feed, weight)) +
geom_boxplot()
```
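To see where the box and whiskers come from, the chunk below computes them by hand for the horsebean group. It is a minimal sketch that assumes ggplot2's defaults (R's default `quantile()` method and a whisker coefficient of 1.5); base R's `boxplot()` uses Tukey's hinges from `fivenum()`, so its boxes can differ slightly.

```{r}
# weights of the chicks fed horsebeans (chickwts ships with base R)
horsebean <- chickwts$weight[chickwts$feed == "horsebean"]

# the box spans the 25th to 75th percentiles; the line marks the median
box <- quantile(horsebean, c(.25, .5, .75))
iqr <- unname(box[3] - box[1])

# points more than 1.5 * IQR beyond either end of the box count as outliers
lower.fence <- unname(box[1] - 1.5 * iqr)
upper.fence <- unname(box[3] + 1.5 * iqr)
outliers <- horsebean[horsebean < lower.fence | horsebean > upper.fence]

# each whisker stops at the most extreme observation inside the fences
whiskers <- range(horsebean[horsebean >= lower.fence & horsebean <= upper.fence])

box
whiskers
outliers
```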
Or a violin or bottle plot, which shows the distribution within each group and
makes it relatively easy to check some of the main assumptions of the test, such as whether each group looks roughly normal:
```{r}
chicks.eating.beans %>%
ggplot(aes(feed, weight)) +
geom_violin()
```
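If you want numbers to sit alongside the plot, a per-group summary of size, mean and standard deviation is a reasonable companion. The sketch below uses base R (`beans` is a local stand-in for `chicks.eating.beans`), and the variance ratio at the end is only an informal look at how different the spreads are, not a formal test.

```{r}
# the same two feed groups as above, selected with base R
beans <- subset(chickwts, feed %in% c("horsebean", "soybean"))
beans$feed <- droplevels(beans$feed)

# sample size, mean and SD for each group
group.summary <- aggregate(weight ~ feed, data = beans,
  FUN = function(w) c(n = length(w), mean = mean(w), sd = sd(w)))
group.summary

# informal check: ratio of the larger to the smaller group variance
group.var <- tapply(beans$weight, beans$feed, var)
max(group.var) / min(group.var)
```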
Layering boxes and bottles can work well too because it combines information
about the distribution with key statistics like the median and IQR, and also
because it scales reasonably well to multiple categories:
```{r}
chickwts %>%
ggplot(aes(feed, weight)) +
geom_violin() +
geom_boxplot(width=.1)
```
<!-- Bottle plots are just density plots, turned 90 degrees. Density plots might be more familiar to some, but it's hard to show more than 2 or 3 categories:
```{r}
chicks.eating.beans %>%
ggplot(aes(weight, fill=feed)) +
geom_density(alpha=.5)
```
And density plots are just smoothed histograms (which you might prefer if you're a fan of 1980s computer games):
```{r}
chicks.eating.beans %>%
ggplot(aes(weight)) +
geom_histogram(bins=7) +
facet_grid(feed ~ .)
```
-->
### Running a t-test {-}
Assuming you really do still want to run a null hypothesis test on one or two
means, the `t.test()` function performs most common variants, illustrated below.
##### 2 independent groups {-}
Assuming your data are in long format:
```{r}
t.test(weight ~ feed, data=chicks.eating.beans)
```
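`t.test()` returns an object of class `htest`, which is a list, so individual results can be pulled out for reporting rather than copied from the printed output. A short sketch, again using a base-R `beans` data frame as a stand-in for `chicks.eating.beans`:

```{r}
beans <- subset(chickwts, feed %in% c("horsebean", "soybean"))
result <- t.test(weight ~ feed, data = beans)

result$statistic  # t value
result$parameter  # degrees of freedom (not a whole number for Welch's test)
result$p.value
result$conf.int   # 95% CI for the difference in means (horsebean - soybean)
result$estimate   # the two group means
```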
Or equivalently, if your [data are untidy](#tidying-data) and each group has
its own column (e.g. chicks eating soybeans in one column and those eating
horsebeans in another):
```{r, include=F}
untidy.chicks <- chicks.eating.beans %>%
mutate(chick = row_number()) %>%
reshape2::dcast(chick~feed, value.var = 'weight')
```
```{r}
with(untidy.chicks, t.test(horsebean, soybean))
```
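Because both calls test the same hypothesis on the same numbers, they should agree exactly. A quick check, assuming `horsebean` and `soybean` vectors built with base R as stand-ins for the columns of `untidy.chicks`:

```{r}
beans <- subset(chickwts, feed %in% c("horsebean", "soybean"))
horsebean <- beans$weight[beans$feed == "horsebean"]
soybean <- beans$weight[beans$feed == "soybean"]

long.result <- t.test(weight ~ feed, data = beans)
wide.result <- t.test(horsebean, soybean)

# same t statistic and p value whichever layout the data are in
all.equal(unname(long.result$statistic), unname(wide.result$statistic))
all.equal(long.result$p.value, wide.result$p.value)
```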
##### Equal or unequal variances? {- #equal-variances .admonition}
By default R does not assume your groups have equal variances: it runs Welch's
t-test, which estimates each group's variance separately and adjusts the degrees
of freedom accordingly (you will notice the output labelled 'Welch Two Sample
t-test').
You can turn this correction off with `var.equal=TRUE` (for example, if you're
trying to replicate an analysis done using the default settings in SPSS), but
you probably do want to assume unequal variances [see @ruxton2006unequal].
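To see what the correction does, the sketch below computes Welch's t statistic and the Welch-Satterthwaite degrees of freedom by hand, compares them with `t.test()`, and then runs the pooled-variance (Student's) version with `var.equal = TRUE`. The names `x` and `y` are just local shorthand for the two feed groups.

```{r}
beans <- subset(chickwts, feed %in% c("horsebean", "soybean"))
x <- beans$weight[beans$feed == "horsebean"]
y <- beans$weight[beans$feed == "soybean"]

# squared standard error of each group mean, using each group's own variance
v.x <- var(x) / length(x)
v.y <- var(y) / length(y)

t.welch <- (mean(x) - mean(y)) / sqrt(v.x + v.y)
df.welch <- (v.x + v.y)^2 /
  (v.x^2 / (length(x) - 1) + v.y^2 / (length(y) - 1))

welch <- t.test(x, y)
c(by.hand = t.welch, t.test = unname(welch$statistic))
c(by.hand = df.welch, t.test = unname(welch$parameter))

# Student's t-test pools the variances and uses n1 + n2 - 2 degrees of freedom
student <- t.test(x, y, var.equal = TRUE)
student$parameter
```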
##### Paired samples {-}
If you have repeated measures on a sample you need a paired samples test.
```{r}
# simulate paired samples in pre-post design
set.seed(1234)
baseline <- rnorm(50, 2.5, 1)
followup <- baseline + rnorm(50, .5, 1)
# run paired samples test
t.test(baseline, followup, paired=TRUE)
```
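A paired t-test is equivalent to a one-sample t-test on the within-pair differences, which is a handy way to think about what it assumes (roughly normal *differences*, not roughly normal scores). The sketch below regenerates the same simulated data so that it runs on its own, then checks that the two approaches agree:

```{r}
# same simulated pre-post data as above
set.seed(1234)
baseline <- rnorm(50, 2.5, 1)
followup <- baseline + rnorm(50, .5, 1)

paired <- t.test(baseline, followup, paired = TRUE)
on.differences <- t.test(baseline - followup, mu = 0)

c(paired = unname(paired$statistic),
  differences = unname(on.differences$statistic))
```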
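Older versions of R also let you ['melt' the data into long format](#wide-to-long)
and pass `paired=TRUE` to the formula interface, but recent versions of R reject
that combination (in long format the pairing silently depends on the row order).
Instead, keep the data in wide format and wrap the two measurements in `Pair()`
on the left-hand side of a formula:
```{r}
wide.form.data <- tibble(baseline = baseline, followup = followup)
# Pair(x, y) ~ 1 requests the same test as t.test(x, y, paired = TRUE)
t.test(Pair(baseline, followup) ~ 1, data = wide.form.data)
```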
##### One-sample test {-}
Sometimes you might want to compare a sample mean with a specific value:
```{r}
# test whether the mean of `test.scores` differs from 2
set.seed(1234)
test.scores <- rnorm(50, 2.5, 1)
t.test(test.scores, mu=2)
```
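The one-sample statistic is the distance between the sample mean and `mu`, measured in standard errors: $t = (\bar{x} - \mu) / (s / \sqrt{n})$ on $n - 1$ degrees of freedom. A sketch of the by-hand calculation, regenerating the simulated scores so the chunk runs on its own:

```{r}
set.seed(1234)
test.scores <- rnorm(50, 2.5, 1)
mu <- 2

n <- length(test.scores)
t.stat <- (mean(test.scores) - mu) / (sd(test.scores) / sqrt(n))
p.value <- 2 * pt(-abs(t.stat), df = n - 1)  # two-sided p value

one.sample <- t.test(test.scores, mu = mu)
c(by.hand = t.stat, t.test = unname(one.sample$statistic))
c(by.hand = p.value, t.test = one.sample$p.value)
```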