-
Notifications
You must be signed in to change notification settings - Fork 104
Description
Hello, I am attempting to integrate a dataset comprised of 22 samples (~1M cells total) and I specifically would like to regress out two covariates (GCResponse and Subject). I have attempted this two ways with separate issues arising from each. I followed the standard Seurat pipeline for QC on the merged object and found that the two covariates in question showed signficant batch effects:
`
Run QC on the Merged Seurat
Combined[["percent.mt"]] <- PercentageFeatureSet(Combined, pattern = "^MT-")
VlnPlot(Combined, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3, pt.size = 0)
Combined <- subset(Combined, subset = nFeature_RNA >= 250 & nCount_RNA >= 500 & percent.mt < 15) #Need to adjust these parameters based on violin plot
Combined <- NormalizeData(Combined)
Identify the most variable genes in the merged object
Combined <- FindVariableFeatures(Combined,
selection.method = "vst",
nfeatures = 2000,
verbose = FALSE)
Scale the counts
Combined <- ScaleData(Combined)
Perform PCA
Combined <- RunPCA(Combined, npcs = 50)
Visualize the unintegrated dataset
Combined <- FindNeighbors(Combined, dims = 1:20, reduction = "pca")
Combined <- FindClusters(Combined, resolution = 0.3, cluster.name = "unintegrated_clusters")
Combined <- RunUMAP(Combined, dims = 1:20, reduction = "pca", reduction.name = "umap.unintegrated")
DimPlot(Combined, reduction = "umap.unintegrated", group.by = 'Subject')`
The unintegrated UMAP shows signficant batch effects arising from the 'Subject' covariate (and in turn 'GCResponse'):
I then attempted to use harmony integration via IntegrateLayers:
Integrate the dataset
`library(parallel)
detectCores() #24
num_cores = 30
combined.hy <- IntegrateLayers(
object = Combined, method = HarmonyIntegration,
orig.reduction = "pca", new.reduction = "harmony",
theta = 4, lambda = 0.5,
max.iter.harmony = 20L, max.iter.cluster = 50L,
verbose = TRUE, num.threads = num_cores)
combined.hy <- FindNeighbors(combined.hy, dims = 1:30)
combined.hy <- FindClusters(combined.hy, resolution = 0.3)
combined.hy <- RunUMAP(combined.hy, reduction = "harmony", dims = 1:30, reduction.name = "umap.harmony")
DimPlot(combined.hy, reduction = "umap.harmony", split.by = 'GCResponse', label.size = 2)`
This somewhat mitigated the effects of the 'Subject' covariate, but still resulted in fairly substantial 'GCResponse' effects (especially between the Healthy and High Responder groups, which is unexpected biologically):
I then attempted to instead run harmony integration directly via RunHarmony:
`
Integrate the dataset
combined.Hy <- RunHarmony(Combined,
group.by.vars = c("Subject", "GCResponse"),
reduction = "pca", assay.use = "SCT", reduction.save = "harmony",
dims = 1:30, theta = c(4,4), lambda = 0.5, max_iter = 50)
combined.Hy <- FindNeighbors(combined.Hy, dims = 1:30)
combined.Hy <- FindClusters(combined.Hy, resolution = 0.3)
combined.Hy <- RunUMAP(combined.Hy, reduction = "harmony", dims = 1:30, reduction.name = "umap.harmony")
DimPlot(combined.Hy, reduction = "umap.harmony", split.by = 'GCResponse', label.size = 2) `
After some tuning of the parameters, I achieved a more uniform UMAP with minimal effects from Subject or GCResponse. My issue here is that the clustering now seems to be heavily influenced by GCResponse instead of the UMAP space.
I am at a loss for how to correct for this, as adjusting the dims and resolution parameters in FindNeighbords and FindClusters, respectively, have not helped. Any advice would be highly appreciated, thank you!






