Tutorial CONICSmat: Inference of clones in GBM single cell data from Darmanis et al

In this tutorial we will analyze the single cell RNA-seq data of 4 primary glioblastoma (GBM) patients presented in this paper. We will go beyond the analysis of the authors in the paper and show that it is actually specific genetic clones that are found in the migration front of certain tumors.

Visualization of chromosomal alterations in cells (rows) along the genome (columns) for patient BT_S2. Colors indicate copy number gains (red) and losses (blue). Expression in each cell was smoothed in a sliding window along genes ordered by their chromosomal location. The average expression in non-malignant cells (bottom below dashed line) was used to normalize the expression data. The region where each cell was obtained from (location of the biopsy) is indicated in the second column on the right. Green means migration front, red tumor core.

The count data is available from here. Once the data is unzipped it contains multiple files, of which we will used the normalized counts file, the tsne coordinates and the metadata file.

Please make sure you set your R working directory to the file pathe where the unzipped tables are located. Next, we can read the data and transform it to log2(CPM/10+1).

library(CONICSmat)
library(Matrix)

a=Matrix(as.matrix(read.table("GBM_normalized_gene_counts.csv",sep=" ",header=T,row.names=1,check.names=F)),sparse=T)
a=((2^a)-1)
a=log2(a/10+1)
b=read.table("GBM_metadata.csv",header=T, row.names=1,sep=" ",check.names=F)
b=b[colnames(a),]
tsne=read.table("GBM_TSNE.csv",header=T,row.names = 1,check.names = F)

Now that we've read the data we can assign which cells are neoplastic and non-neoplastic based on the identification of the authors. The labels are not directly given in the metadata file, but can be inferred by the tsne coordinates (compare to the manuscript or the gbmseq.org website).

tumor1=which(tsne[,1]>(-15) & tsne[,2]<(-15) & tsne[,1]<(38))
tumor2=which(tsne[,1]>(40) & tsne[,2]>(-40) & tsne[,2]<(-0))
tumor=c(tumor1,tumor2)
normal=setdiff(1:ncol(a),tumor)

Let's visualize which cells are putative tumor (red) and which are normal cells (black):

colo=rep("black",ncol(a));colo[tumor]="red"
plot(tsne[,1],tsne[,2],col=colo,pch=16)

Next, we will uses CONICSmat to predict the presence/absence of CNVs affecting whole chromosomes (find chromosomal position file here):

regions=read.table("chromosome_full_positions_grch38.txt",sep="\t",row.names = 1,header = T)
gene_pos=getGenePositions(rownames(a))
a=filterMatrix(a,gene_pos[,"hgnc_symbol"],minCells=5)
normFactor=calcNormFactors(a)
l=plotAll(a,normFactor,regions,gene_pos,paste("Darmanis","_CNVs",sep=""),repetitions = 10,postProb = 0.75,normal = normal, tumor = tumor)

r=plotChrEnichment(a,7,normFactor,gene_pos,groups1 = normal, groups2 = tumor,repetitions = 200,postProb = 0.75)

par(mfrow=c(1,1))
colo=ifelse(l[,"7"]>0.8 | l[,"10"]<0.2,"red","black")
plot(tsne[,1],tsne[,2],col=colo,pch=16)

Based on the CNV prediction, almost all putative tumor cells have gain of chr7 and/or loss of chr10, hallmark and founder events of GBM. Let's only focus on cells predicted by the authors as tumor cells and having one of the alterations:

length(tumor)
tumor=intersect(tumor,which(l[,"7"]>0.8 | l[,"10"]<0.2))
length(tumor)

And the same for normal cells (exclude those which might have alterations):

length(normal)
normal=intersect(normal,which(l[,"7"]<0.2 & l[,"10"]>0.8))
length(normal)

Now that we've cleaned the dataset, let's generate a good "normal" brain reference by sampling the same amount of cells from all non-malignant brain cells (otherwise the majority of reference cells would be microglia):

mg=intersect(normal,which(tsne[,2]>(-7) & tsne[,1]<(46) & tsne[,1]>(-35)))
opc=intersect(normal,which(tsne[,2]>(38) & tsne[,1]<(-38)))
oligo=intersect(normal,which(tsne[,1]<(-64)))
astro=intersect(normal,which(tsne[,2]<(-35) & tsne[,1]<(-21)))
endo=intersect(normal,which(tsne[,2]<(-40) & tsne[,1]>(40)))
normal=c(sample(astro,30),sample(endo,30),sample(oligo,30),sample(opc,30),sample(mg,30))

In addition, let's get the patient IDs and the location where the sample was obtained from from the metadata table:

patients=b[,"Sample.name"]
loc=b[,"Location"]

Now, we can have a first peak at how the data looks like by plotting the tumor cell expression averaged over 151 genes after normalizing to the average expression in the normal reference cells:

pcol=string.to.colors(patients)[c(normal,tumor)]
png(paste("Darmanis_regions",".png",sep=""),width = 1000,height = 600)
plotChromosomeHeatmap(a[,c(normal,tumor)],normal = 1:length(normal),plotcells = 1:(length(normal)+length(tumor)),chr=T, gene_pos = gene_pos,windowsize = 151,thresh = 1.2,expThresh = 0.2,colo = pcol)
abline(h=length(normal),lty=16)
dev.off()

We see that cells mostly cluster by patient and that there seems to be multiple clones present most tumors (see tumor ID visualized by color in the 1st vertical bar on the right, the location [core=red, periphery=green in the second column]). Many chromosomes are lost/gained only in a subset of cells.

To figure out if there is an enrichment for specific subclones in the migrating front we focus first on the tumor of patient BT_S2.

patient="BT_S2"
bts2=intersect(tumor,which(patients==patient))

Now, for a good visualization of clones, let's use the posterior probabilities of the mixture model in regions altered by copy number in this patient to order the tumor cells by clones:

candRegions=c(4,7,10,12,13,17,22)
bin_mat=binarizeMatrix(l[,candRegions],normal,tumor,0.5)
gtm=apply(bin_mat[bts2,], 1, paste, collapse=",")

The resulting matrix holds a genotype for each cell, indicating the presence/absence of each of the CNVs. We can now summarize the matrix and use that summary to order the cells by the number of mutations they hold:

geno=sort(table(apply(bin_mat[bts2,], 1, paste, collapse=",")),decreasing = T)
geno
geno_sort=geno[sort(names(geno[which(geno>15)]))]
length(geno)

We see that we have 8 genotypes present that are represented by more than 15 individual cells. Let's order the tumor cells by the number of mutations in these cells.

tumor3=bts2[c(which(gtm==names(geno_sort)[8]),which(gtm==names(geno_sort)[7]),which(gtm==names(geno_sort)[6]),which(gtm==names(geno_sort)[5]),which(gtm==names(geno_sort)[4]),which(gtm==names(geno_sort)[3]),which(gtm==names(geno_sort)[2]),which(gtm==names(geno_sort)[1]))]

Last, we can plot these and see the individual clones present in this tumor:

pcol=string.to.colors(patients)[c(normal,tumor3)]
rcol=string.to.colors(loc)[c(normal,tumor3)]
png(paste(patient,"genotypes.png",sep=""),width = 1000,height = 600)
plotChromosomeHeatmap(a[,c(normal,tumor3)],normal = 1:length(normal),plotcells = 1:(length(normal)+length(tumor3)),chr=F, gene_pos = gene_pos,windowsize = 151,thresh = 1,expThresh = 0.2,colo = pcol,colo2 = rcol)
abline(h=length(normal),lty=3)
for(i in 0:8){abline(h=(length(normal)+sum(geno_sort[(8-i):8])),col="black")}
dev.off()

Interestingly, we see a clear enrichment of cells from the migrating front in clone 5 (counted from top) characterized by alterations: chr7+ chr10- chr13- chr22- but not any other alterations. This suggests that GBM cells in this tumor started to migrate relatively "early" (when they had not acquired the 4 other CNV events found in "older" clones) and that acquiring the chr13 loss might be an important event in the initiation of tumor cell migration.

Conversely, looking into the second tumor for which the authors were able to isolate tumor cells from the migration front, BT_S4, the clones found in the migration front have a CNV (chr6 loss) not found in cells from the primary tumor (this might be due to undersampling).

tumor1=which(tsne[,1]>(-15) & tsne[,2]<(-15) & tsne[,1]<(38))
tumor2=which(tsne[,1]>(40) & tsne[,2]>(-40) & tsne[,2]<(-0))
tumor=c(tumor1,tumor2)
normal=setdiff(1:ncol(a),tumor)

length(tumor)
tumor=intersect(tumor,which(l[,"7"]>0.8 | l[,"10"]<0.2))
length(tumor)

length(normal)
normal=intersect(normal,which(l[,"7"]<0.2 & l[,"10"]>0.8))
length(normal)
mg=intersect(normal,which(tsne[,2]>(-7) & tsne[,1]<(46) & tsne[,1]>(-35)))
opc=intersect(normal,which(tsne[,2]>(38) & tsne[,1]<(-38)))
oligo=intersect(normal,which(tsne[,1]<(-64)))
astro=intersect(normal,which(tsne[,2]<(-35) & tsne[,1]<(-21)))
endo=intersect(normal,which(tsne[,2]<(-40) & tsne[,1]>(40)))
normal=c(sample(astro,30),sample(endo,30),sample(oligo,30),sample(opc,30),sample(mg,30))

patient="BT_S4"
bts2=intersect(tumor,which(patients==patient))

rcol=string.to.colors(loc)[c(normal,bts2)]
png(paste(patient,"clusterplot.png",sep=""),width = 1000,height = 600)
plotChromosomeHeatmap(a[,c(normal,bts2)],normal = 1:length(normal),plotcells = 1:(length(normal)+length(bts2)),chr=T, gene_pos = gene_pos,windowsize = 151,thresh = 1,expThresh = 0.2,colo = rcol)
abline(h=length(normal),lty=3)
dev.off()

The fact that in both cases clones are enriched in the migration front suggests that these may have a selective advantage through their altered genomes that allows them to obtain increased migratory potential.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Tutorial CONICSmat: Inference of clones in GBM single cell data from Darmanis et al

Clone this wiki locally