Spot2vector is a novel computational framework that leverages a ZINB-based graph autoencoder for spatial clustering and data denoising. This method integrates both spatial and expression information to provide a comprehensive analysis of spatial transcriptomics (ST) data.
- Anaconda or Miniconda: Ensure you have either Anaconda or Miniconda installed.
- CUDA version >= 11.8: Required for GPU acceleration.
- NVIDIA GPU available: Ensure you have a compatible NVIDIA GPU.
For detailed installation instructions, please refer to INSTALLATION.md.
The input data for Spot2vector should be an AnnData object, which can be loaded using scanpy.read_h5ad. The AnnData object must contain:
- Preprocessed expression data: The expression data should be preprocessed using standard single-cell RNA-seq preprocessing steps:

      import scanpy as sc

      # Normalize total counts
      sc.pp.normalize_total(adata, target_sum=1e4)
      # Log-transform the data
      sc.pp.log1p(adata)
      # Select highly variable genes
      sc.pp.highly_variable_genes(adata, n_top_genes=8000, flavor='seurat_v3')

- Spatial coordinates: The spatial coordinates should be stored in adata.obsm["spatial"] as a 2D array of shape (n_spots, 2).
- Optional PCA: For improved efficiency in constructing the expression similarity graph, you can perform PCA to obtain a low-dimensional representation:

      sc.pp.pca(adata, n_comps=10)
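
The spatial-coordinate requirement above can be checked before any graph is built. The following is a minimal sketch, not part of Spot2vector: the helper name check_spatial_coords is hypothetical, and it assumes only NumPy.

```python
import numpy as np


def check_spatial_coords(coords, n_spots):
    """Return coords as a float array after checking it has shape (n_spots, 2).

    Hypothetical helper for illustration; not part of the Spot2vector API.
    """
    coords = np.asarray(coords, dtype=float)
    if coords.shape != (n_spots, 2):
        raise ValueError(f"expected shape ({n_spots}, 2), got {coords.shape}")
    if not np.isfinite(coords).all():
        raise ValueError("spatial coordinates contain NaN or infinite values")
    return coords


# Toy example: three spots with (x, y) positions.
toy = check_spatial_coords([[0.0, 0.0], [100.0, 0.0], [0.0, 150.0]], n_spots=3)
```

With a real dataset, the same check would be called as check_spatial_coords(adata.obsm["spatial"], adata.n_obs).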
Construct spatial and expression graphs using the following commands:

    import spot2vector

    # Spatial graph based on spatial coordinates
    spot2vector.Build_Graph(adata, radius_cutoff=150, cutoff_type='radius', graph_type='spatial')
    # Expression graph based on expression similarity
    spot2vector.Build_Graph(adata, neighbors_cutoff=4, cutoff_type='neighbors', graph_type='expression')

Train the model using the following command:
    device = 'cuda:0'  # Specify the GPU device
    spot2vector.Fit(adata, device=device)

Perform spatial clustering using both the expression embeddings and the spatial embeddings. The n_cluster parameter specifies the number of spatial domains. In the commands below it is set from a user-defined variable, n_clusters, which you need to choose based on your dataset and biological knowledge.
    # Expression embeddings
    spot2vector.Clustering(adata, obsm_data='exp_embeddings', method='mclust', n_cluster=n_clusters, verbose=False)
    # Spatial embeddings
    spot2vector.Clustering(adata, obsm_data='spa_embeddings', method='mclust', n_cluster=n_clusters, verbose=False)

Perform model inference to obtain the final embeddings:
    # lamda = 1 for expression, lamda = 0 for spatial
    spot2vector.Infer(adata, lamda=0.2, device=device)

Perform the final spatial clustering using the combined embeddings:
    spot2vector.Clustering(adata, obsm_data='embeddings', method='mclust', n_cluster=n_clusters, verbose=False)

This project is licensed under the MIT License. See the LICENSE file for details.