The error of harmony after subsetting data #256

Open
Carrey14 opened this issue Aug 21, 2024 · 2 comments
Labels: question (Further information is requested)

Comments

Carrey14 commented Aug 21, 2024

Hello,
Thanks for developing an excellent tool for batch correction.
When I used Harmony to correct batch effects between two datasets, it corrected them very well across the overall cell population.
[Image: UMAP of the overall cell population]

However, when I extracted a small subset, such as the T cell population, and re-ran all the steps from scaling to Harmony on it, the samples within each dataset integrated well, but the T cells from the two datasets showed a clear batch effect and formed two distinct clusters corresponding to the original datasets. Why is this happening, and how can I solve it?
[Image: UMAP of the T cell subset]

Samples labeled "ALL" belong to one dataset, and samples labeled "II" belong to the other.
Thanks!

pati-ni (Collaborator) commented Oct 23, 2024

Can you provide the steps of your analysis? Also, can you show where the subsetted cells from the second UMAP are located in the first one?

pati-ni added the question label Oct 23, 2024
Carrey14 (Author) commented

> Can you provide the steps of your analysis? Also, can you show where the subsetted cells from the second UMAP are located in the first one?

I'm sorry for the late reply. Here is the code I used for the analysis. The cluster circled in red in the figure is the T-cell cluster I extracted.
```r
library(Seurat)
library(harmony)
library(dplyr)  # provides %>%

# Full dataset: normalize, select variable features, scale, run PCA
HCC_harmony <- NormalizeData(HCC_all) %>%
  FindVariableFeatures() %>%
  ScaleData() %>%
  RunPCA(npcs = 100, verbose = FALSE)

# Batch-correct across samples, then inspect the elbow plot
system.time({HCC_harmony2 <- RunHarmony(HCC_harmony, group.by.vars = "orig.ident")})
pdf("el.pdf", width = 10, height = 7)
ElbowPlot(HCC_harmony2, ndims = 100)
dev.off()

# Cluster and embed using the Harmony-corrected dimensions
pc.num <- 1:39
HCC_harmony3 <- FindNeighbors(HCC_harmony2, reduction = "harmony", dims = pc.num) %>%
  FindClusters(resolution = 0.4)
HCC_harmony4 <- RunUMAP(HCC_harmony3, reduction = "harmony", dims = pc.num)
HCC_harmony5 <- RunTSNE(HCC_harmony4, reduction = "harmony", dims = pc.num)

# I have completed the cell annotation and added the groups "ALL" and "II"
# according to the source dataset.

# Extract the T cells and rebuild a fresh object from the raw counts
T_cell <- subset(HCC_harmony5, idents = "T cells")
sce <- CreateSeuratObject(counts = T_cell@assays$RNA@counts,
                          meta.data = T_cell@meta.data)
names(sce@reductions)
# NULL

# Re-run the whole pipeline on the T cell subset
T_cell2 <- NormalizeData(sce, normalization.method = "LogNormalize",
                         scale.factor = 1e4)

GetAssay(T_cell2, assay = "RNA")

T_cell2 <- FindVariableFeatures(T_cell2,
                                selection.method = "vst", nfeatures = 2000)
T_cell2 <- ScaleData(T_cell2)
T_cell2 <- RunPCA(object = T_cell2, npcs = 50, verbose = FALSE)
system.time({T_cell_harmony <- RunHarmony(T_cell2, group.by.vars = "orig.ident",
                                          project.dim = FALSE)})

dims <- 1:15
T_cell_harmony2 <- FindNeighbors(T_cell_harmony, reduction = "harmony",
                                 dims = dims)
T_cell_harmony2 <- FindClusters(T_cell_harmony2, resolution = 0.8)
table(T_cell_harmony2@meta.data$seurat_clusters)

T_cell_harmony3 <- RunUMAP(T_cell_harmony2, dims = dims,
                           reduction = "harmony")
T_cell_harmony3 <- RunTSNE(T_cell_harmony3, dims = dims,
                           reduction = "harmony")
```
[Image: plot of all cells with the extracted T-cell cluster circled in red]
