-
Notifications
You must be signed in to change notification settings - Fork 41
/
Copy pathURD-03-ClusteringAndDeterminingTips.Rmd
563 lines (410 loc) · 23.8 KB
/
URD-03-ClusteringAndDeterminingTips.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
---
title: "URD 3: Determining Tips"
linestretch: 0.5
output:
pdf_document:
latex_engine: xelatex
html_notebook: default
---
\fontsize{8}{18}
```{r knit_prep, echo=F, results='hide', message=F, warning=F}
library("knitr")
opts_chunk$set(tidy.opts=list(width.cutoff=80),tidy=TRUE, dev="png", dpi=150)
```
```{r, message=F, warning=F}
library(URD)
library(scran) # Batch correction using MNN
library(gridExtra) # grid.arrange
```
```{r, include=F}
setwd("~/Dropbox/Jeff-Yiqun/URD-walkthrough/")
```
# Load previous saved object
```{r load-object}
object <- readRDS("obj/object_3_withDMandPT.rds")
```
# Trim to cells from final stage
URD requires users to define the 'tips' of the developmental tree (i.e. the terminal populations). To do this, we clustered the data from the final stage of our timecourse, and decided which clusters to use as tips.
The first step is to trim the data to the cells from the final stage.
```{r subset-object}
# Subset the object
cells.6s <- grep("ZF6S", colnames([email protected]), value=T)
object.6s <- urdSubset(object, cells.keep=cells.6s)
```
# Perform PCA / tSNE on final stage cells
Then, we loaded the variable genes specific to this stage from our earlier calculations (see Part 1), and performed PCA, graph-based clustering, and tSNE (to easily visualize the clustering).
```{r pca-tsne-clustering}
# Load the variable genes specific to this stage
var.genes.6s <- scan("var_genes/var_ZF6S.txt", what="character")
[email protected] <- var.genes.6s
# Calculate PCA
object.6s <- calcPCA(object.6s)
# Calculate tSNE
set.seed(18)
object.6s <- calcTsne(object.6s, dim.use="pca", perplexity = 30, theta=0.5)
```
We noticed that, in this context, there was a noticeable batch effect between our two samples, and it sometimes drove cluster boundaries in the data, which was not desired.
```{r}
# Look at batch information
plotDim(object.6s, "BATCH", plot.title = "BATCH (Uncorrected)")
```
# Batch correct data
Therefore, we batch-corrected the data. We used MNN (https://doi.org/10.1101/165118), a single-cell aware batch correction algorithm that finds mutually nearest neighbors between batches, and uses them to calculate correction vectors.
```{r}
# Make a copy of the object
object.6s.mnn <- object.6s
# Generate expression matrices from each batch
cells.1 <- rownames(object.6s.mnn@meta)[which(object.6s.mnn@meta$BATCH=="DS5")]
cells.2 <- rownames(object.6s.mnn@meta)[which(object.6s.mnn@meta$BATCH=="DS5b")]
exp.1 <- as.matrix([email protected][,cells.1])
exp.2 <- as.matrix([email protected][,cells.2])
# Batch correct using MNN, correcting all genes, but using the
# variable genes to determine mutual nearest neighbors.
logupx.6s.mnn <- mnnCorrect(exp.1, exp.2, [email protected], k=20, sigma=1, svd.dim=0, cos.norm.in=T, cos.norm.out=F)
# Combine the resultant corrected matrices and return to
# original order of cells
logupx.6s.mnn <- do.call("cbind", logupx.6s.mnn[[1]])
logupx.6s.mnn <- logupx.6s.mnn[,colnames([email protected])]
# Re-sparsify the matrix, by turning anything less than 0 or near 0 back
# to 0.
logupx.6s.mnn[logupx.6s.mnn < 0.05] <- 0
[email protected] <- as(logupx.6s.mnn, "dgCMatrix")
# Re-calculate PCA
object.6s.mnn <- calcPCA(object.6s.mnn)
# Re-calculate tSNE
set.seed(18)
object.6s.mnn <- calcTsne(object.6s.mnn, dim.use="pca", perplexity = 30, theta=0.5)
```
We found that this ameliorated the batch effect in the data.
```{r}
# Look at batch information
plotDim(object.6s.mnn, "BATCH", plot.title = "BATCH (MNN Corrected)")
```
# Cluster cells from final stage
### Do graph-based clustering
```{r}
# Do graph clustering with Louvain-Jaccard
object.6s.mnn <- graphClustering(object.6s.mnn, num.nn = c(5,8,10,15), method="Louvain", do.jaccard = T)
# Do graph clustering with Infomap-Jaccard
object.6s.mnn <- graphClustering(object.6s.mnn, num.nn = c(10,15,20,30,40), method="Infomap", do.jaccard = T)
```
### Plot individual clusterings
Many of these parameter choices seems fairly valid. Infomap-20 and Infomap-30 clusterings have a resolution that seems reasonable, given our expectations for the number of cell populations present in this stage, and seem to draw boundaries that agree with the most dramatic boundaries in the tSNE plot.
```{r, fig.height=5,fig.width=8.5}
clusterings <- c(paste0("Louvain-", c(5,8,10,15)), paste0("Infomap-", c(10,15,20,30,40)))
for (c in clusterings) {
plot(plotDim(object.6s.mnn, c, legend=F))
}
```
# Markers of each cluster
We calculated the top markers of each cluster in the data, and assigned cluster identities based on the known expression patterns of some of these top markers.
```{r}
clusters <- unique([email protected]$`Infomap-30`)
```
```{r, eval=F}
# Precision-recall markers to find the best 'markers' of each cluster.
pr.markers <- lapply(clusters, function(c) markersAUCPR(object.6s.mnn, clust.1 = c, clustering="Infomap-30", [email protected]))
names(pr.markers) <- clusters
```
# Assign clusters
A totally independent clustering could be attempted here, but since there is so much prior knowledge in zebrafish, we use it to evaluate and annotate our clustering.
First, we created data.frames to keep track of our cluster assignments.
```{r}
# Make a set of data.frames to keep track during cluster assignment.
I30.n <- length(unique([email protected]$`Infomap-30`))
I30.cluster.assignments <- data.frame(
cluster=1:I30.n,
name=rep(NA, I30.n),
tip=rep(NA, I30.n),
row.names=1:I30.n
)
I20.n <- length(unique([email protected]$`Infomap-20`))
I20.cluster.assignments <- data.frame(
cluster=1:I20.n,
name=rep(NA, I20.n),
tip=rep(NA, I20.n),
row.names=1:I20.n
)
```
### Endoderm
Markers of cluster 21 and 33 include the endoderm markers PRDX5, FOXA1, and FOXA2. 21 expresses pharyngeal endoderm markers NKX2.7 and IRX7, while 33 expresses the posterior marker CDX4, marking it as the pancreatic and interstinal endoderm.
```{r, fig.width=12, fig.height=4}
plotDot(object.6s.mnn, genes = c("PRDX5", "FOXA1", "FOXA2", "NKX2.7", "IRX7", "CDX4"), clustering="Infomap-30")
```
```{r}
I30.cluster.assignments["21","name"] <- "Endoderm Pharyngeal"
I30.cluster.assignments["33","name"] <- "Endoderm Pancreatic/Intestinal"
```
### Axial mesoderm
Cluster 38 is the prechordal plate, based on its expression of hatching enzymes (HE1A, HE1B, and CTSLB/HGG1). Clusters 37 and 41 are the notochord, based on their expression of the collagens (COL2A1A, COL8A1A), collagen synthesis related enzymes (P4HA1B), and classic notochord genes (TA/NTL, NOTO/FLH, and NTD5).
```{r, fig.width=12, fig.height=4}
plotDot(object.6s.mnn, genes=c("CTSLB", "HE1A", "HE1B", "P4HA1B", "COL2A1A", "COL8A1A", "TA", "NOTO", "NTD5"), clustering="Infomap-30")
```
```{r}
# 38 is Prechordal plate due to expression of hatching enzymes HE1B, HE1A, CTSLB (hgg1)
I30.cluster.assignments["38","name"] <- "Prechordal Plate"
# 38 and 41 are the notochord (NOTO, NTD5)
# 41 seems like the real tip, because its strongest markers are the differentiation genes
# (P4HA1B, COL8A1A, COL9A3, PLOD1A, COL2A1A) -- all involved in collagen synthesis
I30.cluster.assignments["37","name"] <- "Notochord Posterior" # Don't use as tip
I30.cluster.assignments["41","name"] <- "Notochord Anterior" # Use as a tip
```
### Intermediate/lateral mesoderm
Cluster 22 is the cephalic mesoderm, based on its expression of the markers FOXF2A and FSTA. Clusters 27 and 43 are both hematopoietic lineages, based on their expression of TAL1 and LMO2. Cluster 27 is likely the Intermediate Cell Mass (erythroid lineage) based on its expression of GATA1A, whereas cluster 43 is likely the Rostral Blood Island (myeloid lineage) based on its expression of SPI1B, an early macrophage marker. Cluster 40 is the pronephric progenitors, based on its expresion of FOXJ1A and PAX2A. Finally, cluster 7 is the heart primordium, based on its expression of the classic marker HAND2.
```{r, fig.width=12, fig.height=4}
plotDot(object.6s.mnn, genes=c("FSTA", "FOXF2A", "TAL1", "LMO2", "GATA1A", "MORC3B", "SPI1B", "PAX2A", "FOXJ1A", "HAND2"), clustering="Infomap-30")
```
However, the boundary of cluster 27 doesn't agree with the boundary of expression of the classic markers of this cell type (GATA1A and MORC3B, for instance). Cluster 56 from the finer resolution Infomap-20 clustering agrees with gene expression more, so we will use that for Hematopoietic (ICM).
```{r, fig.width=7, fig.height=7}
grid.arrange(grobs=list(
plotDimHighlight(object.6s.mnn, "Infomap-30", "27", legend=F),
plotDim(object.6s.mnn, "GATA1A"),
plotDim(object.6s.mnn, "MORC3B"),
plotDimHighlight(object.6s.mnn, "Infomap-20", "56", legend=F)
))
```
```{r}
# 22 is cephalic mesoderm?? FSTA, FOXF2A
I30.cluster.assignments["22","name"] <- "Cephalic Mesoderm"
# 43 is Hematopoietic (TAL1, LMO2), Rostral Blood Island (SPI1B)
I30.cluster.assignments["43","name"] <- "Hematopoeitic (RBI)"
# Infomap-20 Cluster 56 seems to be a better GATA1A+ cluster.
I20.cluster.assignments["56", "name"] <- "Hematopoeitic (ICM)"
# 40 seems to be pronephros (PAX2A + FOXJ1A)
I30.cluster.assignments["40","name"] <- "Pronephros"
# 7 is clearly the heart primordium (expression of HAND2, GATA6, and GATA5)
I30.cluster.assignments["7","name"] <- "Heart Primordium"
```
### Paraxial mesoderm
Clusters 32 and 45 are the adaxial cells, based on their expression of MYL10, MYOD1, and MYOG. (Based on MYOG, cluster 45 is the more differentiated group of adaxial cells.) Clusters 10 and 15 are the somites, based on expression of MEOX1, RIPPLY1 and ALDH1A2. Cluster 10 is likely the forming simutes, based on its continued expression of MESPBA, RIPPLY2 (which is expressed in S-II and S-I), and continued expression of TBX6. Clusters 25, 18, 5, and 29 are the pre-somitic mesoderm, based on their expression of TBX6L. Finally, clusters 4 and 16 are tailbud, based on their overlapping expression of SOX2, TA, NOTO, FGF8A, and WNT8A.
```{r, fig.width=12, fig.height=5}
plotDot(object.6s.mnn, genes=c("MYL10", "MYOD1", "MYOG", "MYF5", "MEOX1", "RIPPLY1", "ALDH1A2", "RIPPLY2", "MESPBA", "TBX6", "TBX6L", "WNT8A", "FGF8A", "NOTO", "TA", "SOX2"), clustering = "Infomap-30")
```
```{r}
# 45 and 32 are the adaxial cells; 45 seems like the real tip.
# (ACTA1A, MYL10, ACTC1B, ACTC1A, MEF2D, MYOG)
I30.cluster.assignments["32","name"] <- "Adaxial Cells" # Don't use as tip
I30.cluster.assignments["45","name"] <- "Adaxial Cells" # Use as the tip
# 10 and 15 is the formed somites (MEOX1, RIPPLY1, ALDH1A2)
markers.10v15 <- markersAUCPR(object.6s.mnn, clust.1="10", clust.2="15", clustering="Infomap-30")
# Ripply1 mostly in formed somites, Ripply2 mostly in S-I and S-II
# MESPBA in future anterior half of S-II and S-I
I30.cluster.assignments["10","name"] <- "Somites Forming" # Don't use as tip
I30.cluster.assignments["15","name"] <- "Somites Formed" # Use as a tip
# 25/18/5/29 are the PSM.
I30.cluster.assignments["25","name"] <- "PSM Maturation Zone" # Not a tip
I30.cluster.assignments["18","name"] <- "PSM Posterior" # Not a tip
I30.cluster.assignments["5","name"] <- "PSM Intermediate" # Not a tip
I30.cluster.assignments["29","name"] <- "PSM Intermediate" # Not a tip
# Clusters 4 and 16 seems to be the tailbud, based on its expression
# of WNT8A, FGF8A, NOTO, TA, SOX2. They represent, to some degree the
# more neural inclined and more mesoderm inclined tissues (i.e. I
# think some of the early differentiation is also in there.)
I30.cluster.assignments["4","name"] <- "Tailbud" # Use as a tip
I30.cluster.assignments["16","name"] <- "Tailbud" # Use as a tip
```
### Neural
Clusters 26, 41, and 31 are the neural crest, based on their expression of FOXD3, SOX9B, and SOX10. Cluster 39 seems to be the floor plate, based on its combined expression of SHHA, SHHB, FOXJ1A, and FOXA2.
```{r, fig.width=12, fig.height=4}
plotDot(object.6s.mnn, genes=c("SOX10", "SOX9B", "FOXD3", "FOXJ1A", "SHHA", "SHHB", "FOXA2"), clustering="Infomap-30")
```
```{r}
# 26, 46, 31 are neural crest lineages given their high expression of SOX10 and FOXD3.
I30.cluster.assignments["26","name"] <- "Neural Crest" # Tip
I30.cluster.assignments["46","name"] <- "Neural Crest Forming" # Not tip
I30.cluster.assignments["31","name"] <- "Neural Crest Forming" # Not tip
# Cluster 39 -- floor plate???
# (FOXJ1A, SHHA, SHHB, FOXA2)
I30.cluster.assignments["39","name"] <- "Floor Plate" # Try as a tip?
```
#### Spinal Cord
Clusters 48, 11, and 14 are spinal cord. 48 and 11 are the more differenciated, given their high expression of ELAVL3, NEUROD1, NEUROD4, and NEUROG4.
```{r, fig.width=12, fig.height=4}
plotDot(object.6s.mnn, genes=c("ELAVL3", "ISL2B", "NEUROD1", "NEUROD4", "NEUROG1", "DLA", "OLIG4", "PRDM8", "NKX1.2LA"), clustering="Infomap-30")
```
```{r}
# Cluster 48 -- Spinal Cord
# ELAVL3, ISL2B, NEUROD1
I30.cluster.assignments["48","name"] <- "Spinal Cord Differentiated" # Tip?
# Cluster 11 -- Also spinal cord
# NEUROD4, NEUROG1, ELAVL3, DLA
# Difference vs. 48 = higher SOX3, SOX19A (just more progenitor like)
I30.cluster.assignments["11","name"] <- "Spinal Cord" # Tip?
# Cluster 14 -- Spinal Cord Progenitors
# CHD, HER3, OLIG4, PRDM8, NKX1.2LA
I30.cluster.assignments["14","name"] <- "Spinal Cord Progenitors" # Not tip
```
#### Fore/mid-brain
Clusters 28 and 8 are the midbrain, given their expression of ENG2A, ENG2B, HER5, HER11, and PAX2A. Cluster 20 is the telencephalon, given its expression of FOXG1A and EMX3. Clusters 12 and 2 are the optic cup, given their expression of RX3, RX2, and PAX6B. Cluster 23 is the ventral diencephalon, given its expression of DBX1A, DBX1B, and NKX2.2B. Finally, clusters 17 and 34 are the dorsal diencephalon, given their expression of LHX5, PAX6A, FOXD1, and OLIG3.
```{r, fig.width=12, fig.height=5}
plotDot(object.6s.mnn, genes=c("ENG2B", "ENG2A", "HER5", "HER11", "PAX2A", "FOXG1A", "EMX3", "DBX1A", "DBX1B", "NKX2.2B", "LHX5", "PAX6A", "FOXD1", "OLIG3", "RX3", "RX2", "PAX6B"), clustering="Infomap-30")
```
```{r}
# Cluster 28 & 8 Midbrain
# ENG2B, ENG2A, HER5, HER11, PAX2A
I30.cluster.assignments["28","name"] <- "Midbrain" # Use as tip
I30.cluster.assignments["8","name"] <- "Midbrain" # Don't use as tip
# 20 - Telencephalon
# FOXG1A / EMX3
I30.cluster.assignments["20","name"] <- "Telencephalon" # Use as a tip
# 23 - Ventral Diencephalon
# NKX2.4B / NKX2.4A / NKX2.1 / DBX1A / DBX1B / SHHA / NKX2.2B
I30.cluster.assignments["23","name"] <- "Diencephalon Ventral" # Don't use as a tip
# 34 / 17 - Dorsal Diencephalon
# ARX / LHX5 / PAX6A / FOXD1 / OLIG3 / OLIG2
# Major differences between the clusters seem to be cell cycle,
# translation related, so think these clusters should be combined.
I30.cluster.assignments["34","name"] <- "Dorsal Diencephalon" # Use as a tip
I30.cluster.assignments["17","name"] <- "Dorsal Diencephalon" # Use as a tip
# 12 / 2 - Optic Vesicle
# RX3 / RX2 / HMX4 / PAX6B / MAB21L2
# Differential markers are cell cycle related; think these clusters
# should be combined.
I30.cluster.assignments["12","name"] <- "Optic Cup" # Use as a tip, but combined
I30.cluster.assignments["2","name"] <- "Optic Cup" # Use as a tip
```
#### Hindbrain
It seems that for the hindbrain, Infomap-30 doesn't have enough resolution, based on the cluster boundaries and the boundaries of expression of the best cluster markers. Going to use Infomap-20 for clustering in this region.
It seems that cluster 21 is rhombomeres 5 & 6 (given its expression of MAFBA/VAL). Thus cluster 52 must be rhombomere 3 (given its expression of EGR2B/KROX20 and that cluster 21 contains rhombomere 5). Cluster 41 is rhombomere 4, since it expresses FGF8A. Clusters 6 and 22 are rhombomere 7, given their expression of HOXD4A and HOXA3A.
```{r, fig.width=12, fig.height=4}
i20.hb.clusters <- c("21", "52", "41", "2", "24", "6", "22")
# Try Infomap-20 clusters for hindbrain?
plotDot(object.6s.mnn, genes = c("HOXB1A", "HOXA2B", "HOXB2A", "HOXA3A", "HOXB3A", "HOXB4A", "HOXD4A", "EGR2B", "MAFBA", "FGF8A", "FGFR3", "EFNB2A"), clustering = "Infomap-20", mean.expressing.only = T, clusters.use = i20.hb.clusters)
```
```{r}
# 21 is rhombomere 5/6
I20.cluster.assignments["21","name"] <- "Hindbrain R5+6"
# 52 is rhombomere 3
I20.cluster.assignments["52","name"] <- "Hindbrain R3"
# 41 is rhombomere 4
I20.cluster.assignments["41","name"] <- "Hindbrain R4"
I20.cluster.assignments["2","name"] <- "Hindbrain" # Don't use as tip
# (What exactly is this? Some kind of as yet unpatterned hindbrain progenitors?)
# 6 and 22 are rhombomere 7?
I20.cluster.assignments["6","name"] <- "Hindbrain R7"
I20.cluster.assignments["22","name"] <- "Hindbrain R7"
```
### Non-neural ectoderm
Cluster 47 is the integument, given its expression of FOXI3A, FOXI3B, MYB, and GCM2. Cluster 13 is the neural plate border, given its expression of CRABP2A and DLX3B. Cluster 1 is the epidermis given its expression of GATA2A, TBX2B, and CYP2K16.
```{r, fig.width=12, fig.height=4}
plotDot(object.6s.mnn, genes=c("FOXI3A", "FOXI3B", "MYB", "GCM2", "DLX3B", "CRABP2A", "GATA2A", "TBX2B", "CYP2K16"), clustering="Infomap-30")
```
```{r}
# 47 is integument?
# FOXI3A / FOXI3B / MYB / GCM2
I30.cluster.assignments["47","name"] <- "Integument" # Use as a tip
# Non-neural ectoderm
# Cluster 13 is the neural plate border
I30.cluster.assignments["13","name"] <- "Neural Plate Border" # Use as a tip
# Cluster 1 is the epidermis
I30.cluster.assignments["1","name"] <- "Epidermis" # Use as a tip
```
#### Pre-placodal ectoderm
Down in the pre-placodal ectoderm / placode territory, it seems like Infomap-30 doesn't really have enough resolution, based on the cluster boundaries vs. the expression domains of the top markers of those clusters. Thus, going to use Infomap-20 clusters for this region.
Cluster 49 is the lens placode (PITX3+). Cluster 74 is olfactory placode (FOXN4+, EBF2/COE+, SIX3B+, DLX4B+). Cluster 46 is the adenohypophyseal placode (stronger SIX3B, DLX4B, HESX1). Cluster 28 is the epibranchial placode (FOXI1+, PAX8+). Cluster 54 is the otic placode (ATOH1B, TBX2B, PAX2A). Cluster 33 is the trigeminal placode (KLF17, IRX1A, P2RX3A).
```{r, fig.width=12, fig.height=5}
plotDot(object.6s.mnn, genes=c("PITX3", "PITX1", "FOXE3", "FOXN4", "EBF2", "SIX3B", "DLX4B", "HESX1", "PAX8", "FOXI1", "TBX2B", "ATOH1B", "MCF2LB", "PAX2A", "P2RX3A", "IRX1A", "KLF17"), clustering="Infomap-20", clusters.use=c("49", "74", "46", "28", "54", "33"))
```
```{r}
# Looks like 49 really is the lens.
# (PITX3, PITX1, SIX7, FOXE3)
I20.cluster.assignments["49","name"] <- "Placode Lens"
# 74 is olfactory
# (FOXN4, EBF2, PRDM8, GATAD2B)
I20.cluster.assignments["74","name"] <- "Placode Olfactory"
# 46 is adenohypophyseal
# SIX3B, DLX4B, DLX3B, HESX1,
I20.cluster.assignments["46","name"] <- "Placode Adenohypophyseal"
# 28 is/includes epibranchial -- seems right from PAX8 FOXI1,
# PAX8, FOXI1, PRDM12B, NKX2.3, GBX2
I20.cluster.assignments["28","name"] <- "Placode Epibranchial"
# 54 is otic
# ATOH1B? MCF2LB? STC2A? GBX2? ROBO4? DLX3B? SOX9A?
# (otic) (otic)
# TBX2B ATOH1B PAX2A SOX9A
I20.cluster.assignments["54","name"] <- "Placode Otic"
# 33 is trigeminal placode
# P2RX3A IRX1A KLF17
I20.cluster.assignments["33","name"] <- "Placode Trigeminal"
```
### Non-blastoderm
Cluster 53 is the EVL / Periderm (all of the keratins!). Cluster 55 is the PGCs (DDX4, H1M, NANOS3). Cluster 54 seems to be mostly YSL-related markers, and should not be used in tree-building.
```{r, fig.width=12, fig.height=4}
plotDot(object.6s.mnn, genes=c("KRT17", "KRT5", "KRT4", "KRT92", "KRT18", "LYE", "DDX4", "DND1", "H1M", "NANOS3"), clustering="Infomap-30")
```
The EVL population seems to need to be cleaned up however -- some cells are really good expressers of the EVL markers, whereas others are not.
```{r}
evl.score <- apply([email protected][c("LYE", "KRT18", "KRT92", "KRT4", "KRT5", "KRT17"), cellsInCluster(object.6s.mnn, "Infomap-30", "53")], 2, sum.of.logs)
new.evl <- names(which(evl.score > 9))
remove.evl <- names(which(evl.score <= 9))
```
```{r}
# 53 - EVL/Periderm
# KRT17, KRT5, KRT4, KRT92, KRT18, LYE
# FOXI3A / FOXI3B / MYB / GCM2
I30.cluster.assignments["53","name"] <- "EVL/Periderm" # Use as a tip
# 55 - PGCs
# DDX4 / H1M / NANOS3
I30.cluster.assignments["55","name"] <- "Primordial Germ Cells"
# 54 - YSL/yolk
# APOA1B / PVALB9 / SEPP1A / CTSLL
# Top markers are not particularly great markers and are either
# ubiquitous or yolk-associated. This is contamination.
I30.cluster.assignments["54","name"] <- "YSL" # Don't use as a tip
```
### Generate final clusterings
```{r}
# Combine clustering assignments from two clusterings
I30.cluster.assignments$clustering <- "Infomap-30"
I20.cluster.assignments$clustering <- "Infomap-20"
cluster.assignments <- rbind(I30.cluster.assignments, I20.cluster.assignments)
# Remove any clusters that weren't assigned an identity
cluster.assignments <- cluster.assignments[!is.na(cluster.assignments$name),]
# Renumber clusters
cluster.assignments$cluster.new <- 1:nrow(cluster.assignments)
# Create blank clusterings in the 6-somite object
[email protected]$clusters.6s.name <- NA
[email protected]$clusters.6s.num <- NA
# Copy cell identities over for each cluster
for (i in 1:nrow(cluster.assignments)) {
cells <- cellsInCluster(object.6s.mnn, clustering = cluster.assignments[i,"clustering"], cluster = cluster.assignments[i,"cluster"])
[email protected][cells,"clusters.6s.name"] <- cluster.assignments[i,"name"]
[email protected][cells,"clusters.6s.num"] <- as.character(cluster.assignments[i,"cluster.new"])
}
# Remove the bad cells from cluster 53 that aren't EVL.
[email protected][remove.evl, "clusters.6s.name"] <- NA
[email protected][remove.evl, "clusters.6s.num"] <- NA
```
# Transfer clusterings to main object
Need to transfer cluster identities from the 6-somite only object to the full object.
```{r}
[email protected]$`ZF6S-Infomap-30` <- NA
[email protected]$`ZF6S-Infomap-20` <- NA
[email protected]$`ZF6S-Cluster` <- NA
[email protected][rownames([email protected]), "ZF6S-Cluster"] <- [email protected]$`clusters.6s.name`
[email protected]$`ZF6S-Cluster-Num` <- NA
[email protected][rownames([email protected]), "ZF6S-Cluster-Num"] <- [email protected]$`clusters.6s.num`
```
# Save objects
We save here the 6-somite object, the full object with our 6-somite clustering added to it, and also a data.frame of the tips that can be used during further inspection to annotate which tips should be used in the tree building.
```{r, eval=F}
saveRDS(object, file="obj/object_4_withTips.rds")
saveRDS(object.6s.mnn, file="obj/object_6s.rds")
write.csv(cluster.assignments, file="dm-plots/tips-use.csv")
```
# Plot tips in diffusion map
Not all clusters from the final stage should really comprise tips of the developmental tree -- progenitor populations that remain should be excluded. For instance, embryos at 6-somite stage contain both somites (a terminal population) and pre-somitic mesoderm that will give rise to additional somites later; the pre-somitic mesoderm should not be a separate tip in the tree.
Here, we show a couple of good plots. For 'bad' clusters that shouldn't be used as tips, all combinations of diffusion components will not separate the cells significantly.
```{r, out.height='70%',out.width='50%'}
[email protected]$pop <- NA
[email protected][cellsInCluster(object, "ZF6S-Cluster", "Epidermis"), "pop"] <- "1"
plotDim(object, label = "pop", plot.title="Epidermis, DCs 17 vs. 18", reduction.use = "dm", dim.x = 17, dim.y=18, legend=F, alpha=0.35)
[email protected]$pop <- NA
[email protected][cellsInCluster(object, "ZF6S-Cluster", "Cephalic Mesoderm"), "pop"] <- "1"
plotDim(object, label = "pop", plot.title="Cephalic Mesoderm, DCs 15 vs. 16", reduction.use = "dm", dim.x = 15, dim.y=16, legend=F, alpha=0.35)
[email protected]$pop <- NA
[email protected][cellsInCluster(object, "ZF6S-Cluster", "Heart Primordium"), "pop"] <- "1"
plotDim(object, label = "pop", plot.title="Heart Primordium, DCs 25 vs. 26", reduction.use = "dm", dim.x = 25, dim.y=26, legend=F, alpha=0.35)
```