-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathbulk.Rmd
113 lines (83 loc) · 4.68 KB
/
bulk.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
---
title: "Bulk RNA-seq Analysis"
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, cache = TRUE)
```
<br>
## Overview
The __goal__ of this analysis is to identify differentially expressed genes (DEG) between responders and non-responders in the HVP01 study using the bulk RNA-seq data.
From whole blood transcriptome profiling, we have bulk RNA-seq data for 15 subjects over 5 time points (visit day 0,1,3,7,14). The studied Hepatitis B Vaccine (drug: ENGERIX®-B) consists of three doses. After each dose, response to the vaccination is determined by the anti-HBs antibody level above 10 mIU/mL.
In this report, we focus on the __response after dose 2__ (with 13 responders and two non-responders). At a single time point, we perform a naive t-test. To include multiple time points, we also perform the parametric [PBtest](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-2783-8), which takes into account the repeated measurement data structure.
This report is organized in the following sections.
- [Sample table](#ctab): vaccine response and other meta data for each sample.
- [Single time point analysis](#ttest): t-test for response after dose 2 with Day 7 data only.
- [Multiple time points analysis](#PBtest): parametric PBtest for response after dose 2 with all time points.
Load useful packages.
```{r package, message=FALSE, warning=FALSE, catch=FALSE}
library(genefilter)
library(PBtest)
```
In the following analyses, significance level is set at 5%. Benjamini-Hochberg multiple testing correction will be applied.
```{r alpha}
alpha <- 0.05
```
<br>
## Sample table {#ctab}
Earlier, we read in "HVP_sampleTable_response_May18.csv" to `ctab`, which contains the dose response and other clinical information for each sample in the expression matrix.
```{r ctab}
kable(ctab) %>%
kable_styling(bootstrap_options = c("hover")) %>%
scroll_box(width = "100%", height = "300px")
```
<br>
## Single time point analysis {#ttest}
In vaccine studies, it is common to look at the Day 7 post-vacctine data given the time lag between vaccination and immune activation. At the single time point, we can simply use the t-test to compare between the responders and non-responders.
#### T-test for response after dose 2 with Day 7 data
```{r ttest}
## subset Day 7 column data
ctab.D7 <- ctab %>% filter(Visit_day==7)
## t-test
out.t.D7 <- rowttests(dat.filt10.symbol[,ctab.D7$Sample_name], as.factor(ctab.D7$Responder_dose2))
padj.t.D7 <- p.adjust(out.t.D7$p.value, method="BH")
names(padj.t.D7) <- rownames(out.t.D7)
(sig.gene.D7 <- names(which(padj.t.D7<alpha))) #1 DEG
head(sort(padj.t.D7)) #top genes
```
There is only `r length(sig.gene.D7)` DEG identified using sinlge time point at Day 7. The following plot shows that this gene is under-regulated in the responders.
```{r ttest-plot, out.width=c('50%', '50%')}
## plot
plot(dat.filt10.symbol[sig.gene.D7,ctab.D7$Sample_name],
col=as.factor(ctab.D7$Responder_dose2), pch=19,
xlab="Subject", ylab="Day 7 expression level",
main=paste0(sig.gene.D7, ", p-value = ", round(padj.t.D7[sig.gene.D7],3)))
legend("topleft", c("Non-responder","Responder"), col=1:2, pch=19)
```
<br>
## Multiple time points analysis {#PBtest}
Immune response is a dynamic procedure. Using just one time point limits the scope of our analysis. With PBtest, a highly efficient hypothesis testing method for regression-type tests with correlated observations and heterogeneous variance structure, we are able to include all time points available in the study. It reveals more DEGs between the responders and non-responders.
#### PBtest for response after does 2 with all time points
```{r PBtest}
## PBtest
out.PBtest <- PBtest(YY=t(dat.filt10.symbol),
xx=as.factor(ctab$Responder_dose2),
id=ctab$Subject_id)
padj.PBtest <- p.adjust(out.PBtest$p.value, method="BH")
names(padj.PBtest) <- names(out.PBtest$p.value)
padj.PBtest.sorted <- sort(padj.PBtest)
(sig.gene.PBtest <- names(padj.PBtest.sorted)[padj.PBtest.sorted<alpha]) #14 DEGs
padj.PBtest.sorted[1:30] #top genes
```
Using all time points, `r length(sig.gene.PBtest)` DEGs are identified using PBtest. The following plots show the expression levels of those DEGs across time in responders and non-responders.
```{r PBtest-plot, out.width=c('50%', '50%')}
## plot
for(gene in sig.gene.PBtest){
x <- dat.filt10.symbol[gene,]
df.x <- ctab %>% mutate(Expression=x)
g <- ggplot(data=df.x, aes(x=Visit_day, y=Expression, group=Subject_id, colour=Responder_dose2)) +
geom_line(aes(linetype=Responder_dose2)) + geom_point() +
scale_linetype_manual(values=c("dashed", "solid")) +
ggtitle(paste0(gene, ", p-value = ", round(padj.PBtest[gene],3)))
plot(g)
}
```