---
title: 'Making "Table 1"'
---
```{r, include=FALSE, message=F}
library(tidyverse)
library(reshape2)
library(broom)
library(pander)
```
## "Table 1" {- #table1}
In reports of clinical trials and many psychological studies, Table 1 describes
the characteristics of the sample. Typically, you will want to present information
collected at baseline, split by experimental group, including:

- Means, standard deviations or other descriptive statistics for continuous
variables
- Frequencies of particular responses for categorical variables
- An inferential test of the null hypothesis of no difference between the groups;
this could be a t-test, an F-test where there are more than 2 groups, or a
chi-squared test for categorical variables.
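
The following is a minimal base-R sketch that makes the three kinds of test concrete. It uses a small simulated data frame; `demo_groups` and its columns are invented for illustration and are not part of the study data used below. Each test returns an `htest` object. Its `statistic`, `parameter` (degrees of freedom) and `p.value` elements are the numbers that end up in Table 1.

```{r}
# Hypothetical example data: 3 groups, one continuous and one categorical variable
set.seed(42)
demo_groups <- data.frame(
  group = rep(c("A", "B", "C"), each = 30),
  score = rnorm(90, mean = 100, sd = 15),
  smoker = sample(c("Yes", "No"), 90, replace = TRUE)
)

# Two groups: two-sample t-test (t.test() uses Welch's version by default)
t_res <- t.test(score ~ group, data = subset(demo_groups, group %in% c("A", "B")))

# More than two groups: classic one-way ANOVA F-test
f_res <- oneway.test(score ~ group, data = demo_groups, var.equal = TRUE)

# Categorical variable: Pearson chi-squared test on the response-by-group table
chi_res <- chisq.test(table(demo_groups$smoker, demo_groups$group))

# Collect statistic, first degrees of freedom (numerator df for F) and p-value
test_summary <- sapply(list(t = t_res, F = f_res, chi2 = chi_res),
                       function(res) c(statistic = unname(res$statistic),
                                       df = unname(res$parameter[1]),
                                       p = res$p.value))
test_summary
```

With `var.equal = TRUE`, `oneway.test()` gives the standard one-way ANOVA F-test. With its default `var.equal = FALSE`, it applies Welch's correction instead.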

<!-- Make reference to this? https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3379950/ -->

Producing this table is a pain because it requires collating many statistics
calculated by different functions. Many researchers resort to running each
analysis needed for each part of the table separately, and then copying and
pasting the results into Word.

It can be automated, though! This example combines and extends many of the
split-apply-combine techniques we have already learned.
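
As a reminder of the pattern, here is a minimal split-apply-combine sketch in base R. It uses `split()`, `lapply()` and `do.call(rbind, ...)` in place of the dplyr verbs used below. The `long_demo` data frame is made up for illustration. The data are split by variable, the same t-test is applied to each piece, and the one-row results are combined into a single table.

```{r}
# Hypothetical long-format data: one row per person per measured variable
set.seed(1)
long_demo <- data.frame(
  variable = rep(c("yob", "WM"), each = 40),
  condition = rep(rep(c("Control", "Intervention"), each = 20), times = 2),
  value = c(rnorm(40, 1979, 5), rnorm(40, 100, 10))
)

# Split: one data frame per variable
pieces_demo <- split(long_demo, long_demo$variable)

# Apply: run a t-test on each piece and keep the key numbers as a one-row data frame
results_demo <- lapply(pieces_demo, function(d) {
  res <- t.test(value ~ condition, data = d)
  data.frame(variable = d$variable[1],
             statistic = unname(res$statistic),
             df = unname(res$parameter),
             p.value = res$p.value)
})

# Combine: stack the one-row results into a single table
tests_by_variable <- do.call(rbind, results_demo)
tests_by_variable
```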

To begin, let's simulate some data from a fairly standard 2-arm clinical trial
or psychological experiment:
```{r, include=F}
# make up some example data; set.seed() makes the simulated values reproducible
set.seed(1234)
boring.study <- expand.grid(person=1:70, time=1:2, condition=c("Control", "Intervention")) %>%
  as_tibble() %>%
  mutate(person=row_number(),
         yob=round(1979+rnorm(n(), 0, 5)),
         WM=round(100+rnorm(n(), 0, 10)),
         education = sample(c("Primary", "Secondary", "Graduate", "Postgraduate", NA), n(), replace=TRUE),
         ethnicity = sample(c("White British", "Mixed / multiple ethnic groups", "Asian / Asian British", "Black / African / Caribbean / Black British"), n(), replace=TRUE),
         Attitude = round(5+2 * (condition=="Control") + rnorm(n(), 0, 3)))
```
Check our data:
```{r}
boring.study %>% glimpse
```
Start by making a long-form table for the categorical variables:
```{r}
boring.study.categorical.melted <- boring.study %>%
  select(condition, education, ethnicity) %>%
  # one row per person per categorical variable: condition, variable, value
  melt(id.vars = 'condition')
```
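
If the long format is unfamiliar, this base-R sketch does by hand what `melt()` does with the columns above. The id column is repeated once per measured column, and the measured columns are stacked into `variable`/`value` pairs. The small `wide_demo` data frame is invented for illustration.

```{r}
# Hypothetical wide data: one row per person, one column per variable
wide_demo <- data.frame(
  condition = c("Control", "Intervention", "Control"),
  education = c("Primary", "Graduate", NA),
  ethnicity = c("White British", "Asian / Asian British", "White British"),
  stringsAsFactors = FALSE
)
measure_vars_demo <- c("education", "ethnicity")

# Long data: repeat the id column for each measured variable,
# and stack the measured columns into a single 'value' column
long_categorical_demo <- data.frame(
  condition = rep(wide_demo$condition, times = length(measure_vars_demo)),
  variable = rep(measure_vars_demo, each = nrow(wide_demo)),
  value = unlist(wide_demo[measure_vars_demo], use.names = FALSE),
  stringsAsFactors = FALSE
)
long_categorical_demo
```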
Then count the number of participants (N) giving each response to each variable, in each group:
```{r}
(table1.categorical.Ns <-
  boring.study.categorical.melted %>%
    group_by(condition, variable, value) %>%
    summarise(N=n(), .groups="drop") %>%
    # one row per variable/response, one column of counts per condition
    dcast(variable+value~condition, value.var="N"))
```
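
For comparison, base R's `table()` produces the same kind of counts for a single variable. Setting `useNA = "ifany"` keeps missing responses as their own row, as the grouped count above does. The `demo_education` and `demo_condition` vectors are made up for illustration.

```{r}
# Hypothetical responses to one categorical variable, including a missing value
demo_education <- c("Primary", "Graduate", NA, "Graduate", "Primary", "Graduate")
demo_condition <- rep(c("Control", "Intervention"), each = 3)

# Cross-tabulate responses (rows) by group (columns), keeping NA as its own row
demo_counts <- table(education = demo_education, condition = demo_condition,
                     useNA = "ifany")
demo_counts
```

There is one difference. `table()` reports 0 where a response never occurs in a group (here, a missing education response in the Intervention group). The `summarise()`/`dcast()` route leaves that cell as `NA` instead.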
Then make a second table containing chi-squared test statistics for each variable:
```{r}
(table1.categorical.tests <-
  boring.study.categorical.melted %>%
    group_by(variable) %>%
    # inside do(), `.` is the subset of rows for the current variable
    do(., chisq.test(.$value, .$condition) %>% tidy) %>%
    # this is purely to facilitate matching rows up below
    mutate(firstrowforvar=TRUE))
```
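
`tidy()` is what turns each `htest` object into a one-row data frame that `do()` can stack. The base-R sketch below does the same for a single chi-squared test, on a made-up table of counts (`counts_demo` is hypothetical). It keeps the statistic, p-value, degrees of freedom and method, roughly mirroring the columns `tidy()` returns above.

```{r}
# Hypothetical counts: rows are responses, columns are groups
counts_demo <- matrix(c(12, 18,
                        20, 15,
                         8,  7),
                      nrow = 3, byrow = TRUE,
                      dimnames = list(response = c("Primary", "Secondary", "Graduate"),
                                      condition = c("Control", "Intervention")))

chi_example <- chisq.test(counts_demo)

# One row per test, so results for several variables can be stacked with rbind()
chi_example_row <- data.frame(statistic = unname(chi_example$statistic),
                              p.value = chi_example$p.value,
                              parameter = unname(chi_example$parameter),
                              method = chi_example$method)
chi_example_row
```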
Combine these together:
```{r}
(table1.categorical.both <- table1.categorical.Ns %>%
  group_by(variable) %>%
  # we join on firstrowforvar to make sure we don't duplicate the tests
  mutate(firstrowforvar=row_number()==1) %>%
  left_join(., table1.categorical.tests, by=c("variable", "firstrowforvar")) %>%
  # this is gross, but we don't want to repeat the variable names in our table
  ungroup() %>%
  mutate(variable = ifelse(firstrowforvar, as.character(variable), NA)) %>%
  select(variable, value, Control, Intervention, statistic, parameter, p.value))
```
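
The `firstrowforvar` join is one way to show each variable name and its test only once. A simpler alternative, sketched here in base R on an invented table (`demo_tab` is hypothetical), is to blank out repeats with `duplicated()`.

```{r}
# Hypothetical combined table: the test result is repeated on every row of a variable
demo_tab <- data.frame(
  variable = c("education", "education", "ethnicity", "ethnicity"),
  value = c("Primary", "Graduate", "White British", "Asian / Asian British"),
  statistic = c(1.3, 1.3, 0.4, 0.4),
  stringsAsFactors = FALSE
)

# TRUE for every row after the first one for its variable
repeat_row <- duplicated(demo_tab$variable)

# Keep the label and the test statistic only on the first row of each variable
demo_tab$variable[repeat_row] <- NA
demo_tab$statistic[repeat_row] <- NA
demo_tab
```

`duplicated()` only compares values, so this relies on the rows for each variable being adjacent. They are adjacent in this example because `dcast()` sorts by variable.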
Now we deal with the continuous variables. First we make a 'long' version of the
continuous data:
```{r}
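continuous_variables <- c("yob", "WM")
boring.continuous.melted <-
  boring.study %>%
  # all_of() selects the columns named in a character vector
  select(condition, all_of(continuous_variables)) %>%
  # naming the id variable explicitly stops melt() having to guess it
  melt(id.vars = "condition") %>%
  group_by(variable)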
boring.continuous.melted %>% head
```
Then calculate separate tables of t-test results and of means and SDs:
```{r}
(table.continuous_variables.tests <-
  boring.continuous.melted %>%
    # note that we pass the result of t.test() to tidy(), which returns a dataframe;
    # inside do(), `.` is the subset of rows for the current variable
    do(., tidy(t.test(value ~ condition, data = .))) %>%
    select(variable, statistic, parameter, p.value))

(table.continuous_variables.descriptives <-
  boring.continuous.melted %>%
    group_by(variable, condition) %>%
    # this is not needed here because we have no missing values, but if there
    # were, mean() and sd() would return NA below, so it is best to remove
    # rows without a response:
    filter(!is.na(value)) %>%
    # note, we might also want the median/IQR
    summarise(Mean=mean(value), SD=sd(value), .groups="drop") %>%
    group_by(variable, condition) %>%
    # we format the mean and SD into a single column using sprintf().
    # we don't have to do this, but it makes reshaping simpler, and we probably
    # want to round the numbers at some point anyway, so we may as well do it now.
    transmute(MSD = sprintf("%.2f (%.2f)", Mean, SD)) %>%
    dcast(variable~condition, value.var = "MSD"))
```
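
The comment above mentions the median/IQR as an alternative summary. Below is a minimal base-R sketch that formats either summary as a single string per group, with `tapply()` doing the grouping. The `demo_value` and `demo_group` vectors are made up for illustration, and `na.rm = TRUE` drops missing values rather than filtering them out first.

```{r}
# Hypothetical continuous values for two groups, with one missing value
demo_value <- c(98, 104, NA, 110, 95, 101, 99, 120)
demo_group <- rep(c("Control", "Intervention"), each = 4)

# "Mean (SD)" per group, in the same format as the table above
mean_sd_demo <- tapply(demo_value, demo_group, function(x)
  sprintf("%.2f (%.2f)", mean(x, na.rm = TRUE), sd(x, na.rm = TRUE)))

# "Median [IQR]" per group, an alternative for skewed variables
median_iqr_demo <- tapply(demo_value, demo_group, function(x)
  sprintf("%.1f [%.1f]", median(x, na.rm = TRUE), IQR(x, na.rm = TRUE)))

rbind(mean_sd_demo, median_iqr_demo)
```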
And combine them:
```{r}
(table.continuous_variables.both <-
  left_join(table.continuous_variables.descriptives,
            table.continuous_variables.tests,
            by = "variable"))
```
Finally put the whole thing together:
```{r}
(table1 <- table1.categorical.both %>%
  # make the counts character so they are consistent with
  # the Mean (SD) column for continuous variables
  mutate(across(c(Control, Intervention), as.character)) %>%
  # note the '.' as the first argument, which is the input from the pipe
  bind_rows(.,
            table.continuous_variables.both) %>%
  # prettify a few things
  rename(df = parameter,
         p = p.value,
         `Control N/Mean (SD)` = Control,
         `Intervention N/Mean (SD)` = Intervention,
         Variable = variable,
         Response = value,
         `t/χ2` = statistic))
```
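
`bind_rows()` matches columns by name and fills any column missing from one table with `NA`. That is why the continuous rows end up with an empty Response cell. Base `rbind()` on data frames stops with an error when the column names differ, so a base-R equivalent needs a small helper. `bind_fill()` below is a hypothetical name for illustration, not a function from any package, and the two one-row tables are invented.

```{r}
# Hypothetical helper: add any missing columns as NA, then rbind in a common order
bind_fill <- function(a, b) {
  all_cols <- union(names(a), names(b))
  for (col in setdiff(all_cols, names(a))) a[[col]] <- NA
  for (col in setdiff(all_cols, names(b))) b[[col]] <- NA
  rbind(a[all_cols], b[all_cols])
}

categorical_demo <- data.frame(variable = "education", value = "Primary",
                               Control = "12", stringsAsFactors = FALSE)
continuous_demo <- data.frame(variable = "WM", Control = "99.80 (9.91)",
                              stringsAsFactors = FALSE)

bound_demo <- bind_fill(categorical_demo, continuous_demo)
bound_demo
```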
We can then print the table in markdown format for the output document. This is
best done in a separate chunk, so that warnings and messages from the earlier
steps do not appear in the final document.
```{r}
table1 %>%
# split.tables argument needed to avoid the table wrapping
pander(split.tables=Inf,
missing="-",
justify=c("left", "left", rep("center", 5)),
caption='Table presenting baseline differences between conditions. Categorical variables tested with Pearson χ2, continuous variables with two-sample t-test.')
```
Some exercises to work on, or extensions to this code that you might need:

- Add a new continuous variable to the simulated dataset and include it in the
final table
- Create a third experimental group and amend the code to i) include 3 columns
for the N/Mean and ii) report the F-test from a one-way ANOVA as the test
statistic.
- Add the within-group percentage for each response to a categorical variable.