Optimized SCC Implementation and Removed TensorFlow Dependencies
#250
Those updates are awesome; thanks for your contribution! I noticed you've specified matplotlib<=3.6.2 and pandas<=1.5.3 in your changes. Is there a specific reason for keeping these versions? We're currently updating these dependencies, and I also feel they may be causing some dependency issues in the CI. |
|
Thank you for your feedback! I’m glad you found the updates helpful. Regarding the specific versions of
However, I understand the importance of keeping our dependencies up-to-date. If the current versions are causing CI issues, I’m more than willing to help investigate and test with the latest versions to ensure everything works smoothly. |
|
@Sichao25 The latest version of anndata seems to have fixed the bug affecting support for pandas versions greater than 2, but this needs more testing to confirm compatibility. |
|
Thanks for sharing your concerns. We are updating important dependencies like About scc, it will be nice to have STAGATE introduced as a new option. How about setting it as an optional dependency that users can install themselves? Specifically, we won't add new dependency directly to |
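One common way to keep a heavy package optional is to import it lazily and fail with an actionable message. The sketch below is only an illustration of that pattern: the helper name `require_optional` and the install hint are hypothetical and are not part of spateo.

```python
import importlib


def require_optional(module_name, install_hint):
    """Import an optional dependency, or raise an error that says how to get it."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            f"'{module_name}' is required for this feature but is not installed. "
            f"Install it with: {install_hint}"
        ) from err


# Hypothetical call site: the import happens only when the optional method is
# actually requested, so the base install does not need the extra package.
# torch_geometric = require_optional("torch_geometric", "pip install torch_geometric")
```

With this shape the new dependency never has to appear in the base requirements; users who want the optional method install it themselves.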
Xiaojieqiu
left a comment
Hi @Starlitnightly, thanks for this pull request. Your idea of switching from TensorFlow to PyTorch is a great one. I also like your optimize_cluster function, which can definitely be used to spatially smooth the cluster layer.
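For readers unfamiliar with spatial smoothing of cluster labels, here is a minimal majority-vote sketch. It is not the `optimize_cluster` implementation from this PR; the function name `smooth_labels`, the neighbor count `k`, and the strict-majority rule are all assumptions made for illustration.

```python
from collections import Counter
import math


def smooth_labels(coords, labels, k=4):
    """Relabel each spot with the label held by a strict majority of its k nearest spots."""
    smoothed = []
    for i in range(len(coords)):
        # Brute-force neighbor search; fine for a sketch, too slow for real data.
        nearest = sorted(
            (j for j in range(len(coords)) if j != i),
            key=lambda j: math.dist(coords[i], coords[j]),
        )[:k]
        votes = Counter(labels[j] for j in nearest)
        if not votes:
            smoothed.append(labels[i])  # no neighbors: keep the original label
            continue
        top_label, top_count = votes.most_common(1)[0]
        # Only overwrite when more than half of the neighbors agree.
        smoothed.append(top_label if top_count > k / 2 else labels[i])
    return smoothed


# 3x3 grid where only the center spot carries a different label.
coords = [(x, y) for x in range(3) for y in range(3)]
labels = ["A"] * 9
labels[4] = "B"
print(smooth_labels(coords, labels))  # the isolated "B" is replaced by "A"
```

Votes are always read from the original labels, so the result does not depend on the order in which spots are visited.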
# The intermediate model gets the output of the bottleneck layer,
# which acts as the projection layer.
self.intermediate_layer_model = Model(inputs=model.input, outputs=model.get_layer(bname).output)
@Starlitnightly it looks like your updated code doesn't set the intermediate_layer_model?
so it is always None?
I implemented intermediate_layer_model inside the AutoEncoder later on, so the attribute on self should really be removed. I need to test it a bit further.
I have tested this function on simulated data, and it runs well. However, I haven't found specific use cases for this function. Could someone provide specific scenarios where this function is used so that I can further optimize it?
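To make the question in this thread concrete, the sketch below shows one way the bottleneck projection can stay reachable without a separately built `intermediate_layer_model`: expose it as a method on the autoencoder itself. It is written in plain Python with fixed weights so that it runs anywhere; `TinyAutoEncoder` and its methods are hypothetical names, not the classes in this PR. The same shape should carry over to a PyTorch module with separate encoder and decoder parts.

```python
class TinyAutoEncoder:
    """Linear autoencoder with fixed weights; a stand-in for a trained network."""

    def __init__(self, enc_weights, dec_weights):
        self.enc_weights = enc_weights  # n_latent rows, n_features columns
        self.dec_weights = dec_weights  # n_features rows, n_latent columns

    @staticmethod
    def _matvec(matrix, vector):
        return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

    def encode(self, x):
        """Bottleneck output, i.e. the low-dimensional projection."""
        return self._matvec(self.enc_weights, x)

    def forward(self, x):
        """Full reconstruction: decode the bottleneck output."""
        return self._matvec(self.dec_weights, self.encode(x))


model = TinyAutoEncoder(
    enc_weights=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    dec_weights=[[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]],
)
projection = model.encode([3.0, 4.0, 5.0])       # [3.0, 4.0]
reconstruction = model.forward([3.0, 4.0, 5.0])  # [3.0, 4.0, 0.0]
```

Because the projection comes from a method rather than from an attribute assigned elsewhere, there is no state that can be left as None.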
|
@Starlitnightly regarding your comments on scc: we found that scc works well on several datasets given its simple formulation. (1) Using STAGATE to replace PCA seems like a good idea. (2) Are you saying spateo's Leiden results are worse than scanpy's? |
I compared the implementations of Leiden in spateo and scanpy. They are in fact the same, so the results are the same; it is the |
I'm going to try to implement STAGATE in spateo in a future PR. Since the author doesn't give a usable package directly in |
|
Maybe more SOTA clustering methods need to be added in this commit. |
|
The latest tutorial can be viewed in the pull request of spateo-tutorial: https://github.com/aristoteleo/spateo-tutorials/blob/bf32d5739f380948e76cbe07c99e3e8d6d1e3627/5_cluster_digitization/1_bin_scc.ipynb It should be noted that CAST and STAGATE do not perform well on the current dataset, but they outperform SCC on other datasets. Additionally, in the new tutorial, I have modified the method of spatial domain annotation to use dictionary-based annotation instead of sequential annotation. |
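The difference between dictionary-based and sequential annotation can be sketched as follows. The helper `annotate_domains` and the domain names are made up for illustration; the tutorial's own code may look different.

```python
def annotate_domains(cluster_ids, cluster_to_domain, default="unassigned"):
    """Map each cluster id to a domain name by key, not by position."""
    return [cluster_to_domain.get(cid, default) for cid in cluster_ids]


# Hypothetical mapping from cluster id to spatial domain name.
cluster_to_domain = {"0": "layer_a", "1": "layer_b", "2": "layer_c"}

# Cluster ids as they might appear per bin after clustering.
cluster_ids = ["0", "2", "1", "5"]
print(annotate_domains(cluster_ids, cluster_to_domain))
# ['layer_a', 'layer_c', 'layer_b', 'unassigned']
```

A positional list of names silently shifts when the number or order of clusters changes between runs. A dictionary ties each name to an explicit cluster id and makes unmapped clusters visible.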
There are a few packages providing commands. Try e.g. `pip install scanpy-scripts`!
positional arguments:
{settings}
options:
-h, --help show this help message and exit in pySTAGATE
|
Excellent work. I am going to merge the pull request now. |

When installing `spateo` on macOS, I encountered version incompatibility errors. For instance, the `vtk` package requires a minimum Python version of 3.9. Additionally, installing both `torch` and `tensorflow` simultaneously can lead to package conflicts. After thoroughly reviewing the implementation of the relevant TensorFlow code, I rewrote all TensorFlow-related code using PyTorch and removed all TensorFlow-related dependencies from `requirements.txt`.

Updates:
- `NLPCA` using PyTorch instead of TensorFlow.
- `weighted_binary_crossentropy` using PyTorch instead of TensorFlow.
- `calculate_leiden_partition` and added `logger.info`.
- `optimize_cluster`.

Notes:
After carefully comparing the clustering effects of Leiden in `scanpy` and `spateo`, I found that the enhanced effect is due to `dynamo.tl.neighbors` compared to `scanpy.pp.neighbors`. The exact reason for this enhancement still needs further investigation.

Additionally, it is important to note that `scc` does not achieve state-of-the-art (SOTA) performance and does not yield better results on the gold standard dataset of human cortical neurons.

Since `scc` is an adjacency matrix that directly combines spatial neighborhoods and PCA neighborhoods, mclust is not applicable and was not included in this PR. Perhaps I could introduce STAGATE into the `spateo` framework and use it in place of PCA. However, this would introduce a new dependency, `pyg`, which might complicate updating the existing `requirements.txt` and incur additional installation costs for users.
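To illustrate the point about the adjacency matrix, here is a small dependency-free sketch that builds one k-nearest-neighbor graph from spatial coordinates and one from an expression embedding, then merges them. The element-wise sum used to combine them is an assumption for illustration, and `knn_adjacency` and `combine_adjacency` are hypothetical helpers, not spateo functions.

```python
import math


def knn_adjacency(points, k):
    """Symmetric 0/1 adjacency matrix of the k-nearest-neighbor graph."""
    n = len(points)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        nearest = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )[:k]
        for j in nearest:
            adj[i][j] = adj[j][i] = 1
    return adj


def combine_adjacency(spatial_adj, feature_adj):
    """Element-wise sum: edges present in both graphs get weight 2."""
    return [
        [s + f for s, f in zip(s_row, f_row)]
        for s_row, f_row in zip(spatial_adj, feature_adj)
    ]


spatial = [(0, 0), (1, 0), (2, 0), (3, 0)]    # spot positions along a line
embedding = [(0.0,), (0.1,), (5.0,), (5.1,)]  # e.g. a one-dimensional PCA embedding
combined = combine_adjacency(knn_adjacency(spatial, 1), knn_adjacency(embedding, 1))
for row in combined:
    print(row)
# [0, 2, 0, 0]
# [2, 0, 1, 0]
# [0, 1, 0, 2]
# [0, 0, 2, 0]
```

The result is a weighted graph, which suits graph-based methods such as Leiden. mclust fits Gaussian mixtures to per-spot feature vectors, so it would need an embedding (for example from STAGATE) rather than this matrix.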