
Batch Correction in scRNA-seq: How to Preserve Biological Differences #270

@sunsunb

I would greatly appreciate your opinions, comments, and advice on the following:

We have scRNA-seq data from two experimental conditions: CTRL and STIM.

Under CTRL, we have two samples (each sample is its own batch).

Under STIM, we have three samples (again, one sample = one batch).
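The design above can be written out as a small sample table. The sample IDs below are hypothetical placeholders, since the post does not list the real `orig.ident` values. Cross-tabulating batch against condition shows that each batch lies entirely within one condition (batch is nested in condition). In this layout, removing sample-to-sample differences and preserving condition differences pull against each other to some degree.

```r
# Hypothetical sample IDs; the real orig.ident values are not given in the post.
design <- data.frame(
  orig.ident = c("CTRL_1", "CTRL_2", "STIM_1", "STIM_2", "STIM_3"),
  condition  = c("CTRL", "CTRL", "STIM", "STIM", "STIM")
)

# One row per batch: each sample is its own batch.
stopifnot(!anyDuplicated(design$orig.ident))

# Cross-tabulate batch (rows) against condition (columns).
tab <- table(design$orig.ident, design$condition)
print(tab)

# TRUE when every batch falls in exactly one condition (batch nested in condition).
nested <- all(rowSums(tab > 0) == 1)
print(nested)
```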

I merged all samples into a single Seurat object. However, after batch correction with Harmony, the UMAP plots suggest that the biological differences between the CTRL and STIM groups have been removed, which points to possible overcorrection.
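The impression that the CTRL/STIM differences "disappear" on the UMAP can also be checked numerically. The sketch below is a simple neighbourhood-purity check written for illustration. The function `same_label_knn_fraction` is hypothetical and is not part of Seurat or harmony. For each cell, it computes the fraction of its `k` nearest neighbours in an embedding that share its label, then averages over cells. It runs here on simulated 2-D data only, not on the data from this post.

```r
# Average fraction of each cell's k nearest neighbours (in an embedding)
# that share its label. Near 1: labels stay separated. Near the label
# proportions: labels are fully mixed.
same_label_knn_fraction <- function(emb, labels, k = 10) {
  d <- as.matrix(dist(emb))          # Euclidean distances between cells
  diag(d) <- Inf                     # exclude each cell from its own neighbours
  frac <- vapply(seq_len(nrow(d)), function(i) {
    nn <- order(d[i, ])[seq_len(k)]  # indices of the k nearest cells
    mean(labels[nn] == labels[i])
  }, numeric(1))
  mean(frac)
}

# Toy example with simulated embeddings (not real data).
set.seed(1)
labels <- rep(c("CTRL", "STIM"), each = 50)
separated <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
                   matrix(rnorm(100, mean = 10), ncol = 2))
mixed <- matrix(rnorm(200), ncol = 2)

same_label_knn_fraction(separated, labels)
same_label_knn_fraction(mixed, labels)
```

With the objects from the post, one could pass something like `Embeddings(df.harmony, reduction = "harmony")[, 1:30]` with `df.harmony$condition`, and the PCA embedding of `df` for comparison. The value is only meaningful relative to the uncorrected embedding. `dist()` is quadratic in the number of cells, so subsampling cells first is advisable.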

Below is the code I used.

```r
library(Seurat)
library(harmony)
library(patchwork)  # provides `|` for placing plots side by side

# Merge the per-sample objects (one sample = one batch)
merged_seurat <- merge(singlet_list[[1]], y = singlet_list[2:length(singlet_list)])
# Assumes a "percent_mt" column already exists in the metadata
merged_seurat <- SCTransform(merged_seurat, vars.to.regress = "percent_mt", verbose = TRUE)

DefaultAssay(merged_seurat) <- "SCT"
# SCTransform already selects variable features, so this step is usually unnecessary
df <- FindVariableFeatures(merged_seurat, selection.method = "vst", nfeatures = 2000)
df <- RunPCA(df, verbose = FALSE)
df <- RunUMAP(df, dims = 1:30, verbose = FALSE)
df <- FindNeighbors(df, dims = 1:30, verbose = FALSE)
df <- FindClusters(df, resolution = 0.5, verbose = FALSE)

# Harmony corrects the PCA embedding for every covariate in group.by.vars;
# listing "condition" asks it to remove CTRL vs STIM differences as well
df.harmony <- RunHarmony(df, group.by.vars = c("orig.ident", "condition"),
                         plot_convergence = TRUE, assay.use = "SCT")
df.harmony <- RunUMAP(df.harmony, dims = 1:30, reduction = "harmony", verbose = FALSE)

# Visualization: UMAP colored by sample, before vs after Harmony
before <- DimPlot(df, reduction = "umap", group.by = "orig.ident")
after <- DimPlot(df.harmony, reduction = "umap", group.by = "orig.ident")
by_sample <- DimPlot(df.harmony, reduction = "umap", split.by = "orig.ident")
before | after
```

[Figure: UMAP colored by orig.ident, before (left) and after (right) Harmony]

```r
# Same comparison, colored by condition (CTRL vs STIM)
before <- DimPlot(df, reduction = "umap", group.by = "condition")
after <- DimPlot(df.harmony, reduction = "umap", group.by = "condition")
by_condition <- DimPlot(df.harmony, reduction = "umap", split.by = "condition")
before | after
```

[Figure: UMAP colored by condition, before (left) and after (right) Harmony]

Could anyone suggest how to correct for batch effects properly without eliminating meaningful differences between the groups?
Any suggestions or best practices would be greatly appreciated.
