Batch effect seen on Heatmap #216
Comments
Hi @Lucas-Maciel, can you share some code showing how you generate the heatmap? Could you also share UMAPs from before and after integration? That will help us understand the problem better.
Hi @pati-ni, here is a bit more of the code after running harmony
The samples shown in blue and green are from the same time point and sequencing batch.
I would make sure this Also, it seems that you are using only 5 dims. Have you experimented with using more PCs in your analysis?
From what I understand, FindClusters performs graph-based clustering on the neighbor graph constructed by the FindNeighbors function (for which I used the harmony reduction). FindClusters itself has no reduction argument. I tried using 10 dimensions and the main clusters mostly remain the same; only the shape of the UMAP changes.
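To illustrate the point above, here is a minimal sketch in plain Python (not Seurat code) of why the choice of reduction and dimensions belongs to the graph-building step rather than the clustering step. The helper names `knn_graph` and `connected_components` and the toy embedding are hypothetical, and connected components are used only as a simple stand-in for Seurat's modularity-based clustering.

```python
import math


def knn_graph(embedding, k, dims):
    """Build an undirected kNN graph from the first `dims` columns only."""
    points = [row[:dims] for row in embedding]
    edges = set()
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        nearest = sorted(others, key=lambda j: math.dist(p, points[j]))[:k]
        for j in nearest:
            edges.add((min(i, j), max(i, j)))
    return edges


def connected_components(n, edges):
    """Stand-in for graph clustering: it sees only the graph, never the embedding."""
    adjacency = {i: [] for i in range(n)}
    for a, b in edges:
        adjacency[a].append(b)
        adjacency[b].append(a)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            node = stack.pop()
            for neighbor in adjacency[node]:
                if labels[neighbor] == -1:
                    labels[neighbor] = current
                    stack.append(neighbor)
        current += 1
    return labels


# Hypothetical 3-dimensional embedding: two well-separated groups of cells.
embedding = [
    [0.0, 0.0, 0.3],
    [0.1, 0.2, -0.1],
    [0.2, 0.1, 0.2],
    [5.0, 5.0, 0.1],
    [5.1, 5.2, -0.2],
    [5.2, 5.1, 0.0],
]

# The number of dimensions is chosen here, when the graph is built.
labels_2_dims = connected_components(len(embedding), knn_graph(embedding, k=2, dims=2))
labels_3_dims = connected_components(len(embedding), knn_graph(embedding, k=2, dims=3))
print(labels_2_dims, labels_3_dims)
```

In this toy case the labels are the same for both choices of `dims`, which mirrors the observation that the main clusters stayed mostly the same with 10 dimensions.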
I see. Thanks for clarifying. Looking back at the thread, I think I understand better now. As I understand it, the issue isn't that the UMAPs are batchy; it is that batch effects remain in the gene expression space and this is picked up by the If you want to mitigate this effect, you could try using GLMs for gene expression data and include the batch as a covariate.
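As a rough illustration of the batch-as-covariate idea, the sketch below fits an ordinary least-squares model (the Gaussian, identity-link special case of a GLM) to one gene, with time point and batch as covariates, and then subtracts only the estimated batch term. It is plain Python rather than Seurat or harmony code. The data values and the helper names `solve_linear_system` and `fit_ols` are hypothetical, and real count data would usually call for a Poisson or negative binomial model instead. The approach also assumes batch is not fully confounded with time point; if it is, the two effects cannot be separated.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_ols(design, y):
    """Least-squares coefficients from the normal equations (X^T X) beta = X^T y."""
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical toy data for one gene: (time point, batch, expression).
cells = [
    (0, 0, 1.0), (0, 0, 1.2),
    (0, 1, 2.0), (0, 1, 2.2),
    (1, 0, 3.0), (1, 0, 3.2),
    (1, 1, 4.0), (1, 1, 4.2),
]

# Design matrix columns: intercept, time point, batch.
design = [[1.0, float(t), float(b)] for t, b, _ in cells]
expression = [y for _, _, y in cells]

intercept, time_effect, batch_effect = fit_ols(design, expression)

# Remove only the batch term, leaving the time point effect in place.
adjusted = [y - batch_effect * b for _, b, y in cells]

print("time effect:", round(time_effect, 3), "batch effect:", round(batch_effect, 3))
```

The adjusted values could then be used for plotting, so that the heatmap reflects the time point differences without the estimated batch shift.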
Hi,
First of all, thank you for the very nice method. I have used harmony in combination with SCT on two different datasets and, in both, when I plot the heatmap I can distinguish the individual samples within each cluster.
I'm using the following code
Am I doing something wrong when combining the two methods? Is there anything I can do to get more homogeneous expression within clusters? I should note that these samples come from different time points, so it is hard to tell whether the differences are biological or technical.
Thank you for your attention.