Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Batch effect seen on Heatmap #216

Open
Lucas-Maciel opened this issue Oct 27, 2023 · 5 comments
Open

Batch effect seen on Heatmap #216

Lucas-Maciel opened this issue Oct 27, 2023 · 5 comments

Comments

@Lucas-Maciel
Copy link

Lucas-Maciel commented Oct 27, 2023

Hi,

First of all, thank you for the very nice method. I have used harmony in two different datasets in combination with SCT and, in both datasets, I have seen that when I plot the heatmap, you can distinguish the different samples in each cluster.

I'm using the following code

merged_seu <- merge(seurat_list$samp1,y = c(seurat_list$samp2,seurat_list$samp3))
seu_harmony <- SCTransform(merged_seu, verbose = TRUE, vars.to.regress = c("percent.mt", 'S.Score', 'G2M.Score'))
seu_harmony <- RunPCA(seu_harmony , verbose = TRUE)
seu_harmony <- RunHarmony(seu_harmony, group.by.vars = "orig.ident", assay.use="SCT")

Am I doing something wrong here when combining both methods? Is there anything I can do to have a more homogeneous expression? It's important to say that these samples come from different time points, but it gets confusing if it's biological or technical

Screenshot 2023-10-27 at 11 17 53

Thank you for the attention

@pati-ni
Copy link
Collaborator

pati-ni commented Nov 2, 2023

Hi @Lucas-Maciel,

Can you share some code on how you generate the heatmap?

Also can you share some UMAP which show before and after integration?

That will help us understand better the problem

@Lucas-Maciel
Copy link
Author

Hi @pati-ni

Here is a bit more of the code after running harmony

dimensions <- 1:5
seu_harmony  <- FindNeighbors(seu_harmony , dims = dimensions, reduction = "harmony")
seu_harmony  <- RunUMAP(seu_harmony , dims=dimensions, reduction = "harmony")
seu_harmony  <- FindClusters(seu_harmony , resolution = 0.05)
Markers <- FindAllMarkers(seu_harmony,only.pos = T,min.pct = 0.3,logfc.threshold = 0.3) 

Markers %>%
  group_by(cluster) %>%
  top_n(n = 25, wt = avg_log2FC) -> top25

DoHeatmap(seu_harmony,features = top25$gene)+ theme(axis.text.y=element_text(size=6))

Before harmony
Screenshot 2023-11-09 at 11 07 11
After harmony
Screenshot 2023-11-09 at 11 06 52

Samples represented by the colors blue and green are from the same time point and batch of sequencing.

@pati-ni
Copy link
Collaborator

pati-ni commented Nov 30, 2023

I would make sure this seu_harmony <- FindClusters(seu_harmony , resolution = 0.05) uses indeed the harmony embeddings.

Also, it seems that you are using only 5 dims. Have you experimented with using more PCs in your analysis?

@Lucas-Maciel
Copy link
Author

From what I understand FindClusters performs graph-based clustering on the neighbor graph that is constructed with the FindNeighbors function (which I used the harmony reduction). There is no reduction argument available in the FindClusters.

I tried using 10 dimensions and the main clusters mostly remain the same, just change the shape of the UMAP.

@pati-ni
Copy link
Collaborator

pati-ni commented Nov 30, 2023

I see. Thanks for clarifying. Looking back at the thread I think I understand better.

From my understanding, the issue isn't that the UMAPs are batchy, it is just that you observe the batch effects remain in the gene expression space and this is picked up by the FindAllMarkers(). Unfortunately, that's expected because harmony does not touch your gene expression values but only transforms the PCs you provide.

If you want to mitigate this effect you could try using GLMs for gene expression data and include the batch as a covariate. FindAllMarkers is too simplistic for this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants