Single-Cell and Tissue Phenotype prediction
The SCTP R package implements the proposed SCTP (Single Cell Tissue Phenotype prediction) method. SCTP analyzes cellular malignancy within the tumor microenvironment from an integrative perspective by combining information from the bulk sample phenotype, single-cell composition, and cellular spatial distribution, which would be overlooked in traditional tissue pathology slices. As an automated tissue phenotype prediction model, SCTP facilitates a deeper understanding of tumor microenvironments, enables quantitative characterization of cancer hallmarks, and helps elucidate the underlying molecular and cellular interplay.
In this tutorial, we provide examples to help you apply SCTP to real-world data. It covers how to estimate the likelihood of colorectal cancer using a pre-trained model. Downstream analysis and instructions for constructing a new SCTP model from your own datasets can be found in the Tutorial.
This package is maintained long term by Dr. Tao Zeng ([email protected]), Dr. Wencan Zhu ([email protected]), and Dr. Hui Tang ([email protected]).
An alternative webpage for this package can be accessed at https://github.com/valerychu/SCTP.
- python 3.9 and R 4.3.0
Please set up a virtual environment named "env_SCTP" that includes the following required packages:
- numpy
- pytorch
- pytorch_geometric
- scikit-learn
- scipy
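Once the environment exists, it can be useful to confirm from R that the Python modules are importable. The following is a minimal sketch, not part of SCTP itself. It assumes "env_SCTP" was created as a virtualenv (for a conda environment, `reticulate::use_condaenv()` would be the analogous call) and that the listed packages are imported under their usual module names.

```r
# Python import names for the packages listed above
# (some import names differ from the package names).
required_modules <- c(
  numpy = "numpy",
  pytorch = "torch",
  pytorch_geometric = "torch_geometric",
  "scikit-learn" = "sklearn",
  scipy = "scipy"
)

# Assumption: the environment is a virtualenv rather than a conda environment.
env_name <- "env_SCTP"

if (requireNamespace("reticulate", quietly = TRUE) &&
    reticulate::virtualenv_exists(env_name)) {
  # Bind this R session to the environment and test each module
  reticulate::use_virtualenv(env_name, required = TRUE)
  module_ok <- vapply(required_modules, reticulate::py_module_available, logical(1))
  print(module_ok)
} else {
  message("Environment '", env_name, "' not found; create it before loading SCTP.")
}
```

Any module reported as `FALSE` would need to be installed into the environment before running the prediction functions.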
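# devtools and BiocManager are needed by the install commands below;
# monocle3 is not on CRAN and is installed from GitHub below
list.of.packages <- c("ggplot2", "Seurat", "reticulate", "remotes", "devtools", "BiocManager")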
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages)
BiocManager::install("preprocessCore")
remotes::install_github('satijalab/seurat-wrappers')
devtools::install_github("jinworks/CellChat")
devtools::install_github('cole-trapnell-lab/monocle3')
devtools::install("SCTP")
library(SCTP)
In this section, we outline the procedure for utilizing the SCTP-CRC pretrained model (function SCTP_CRC) to evaluate cell or spot malignancy in your own datasets.
The input data must be formatted as a Seurat object. For spatial transcriptomics data in particular, including the image component is highly recommended so that the output can be visualized.
For single-cell data, if only the counts matrix is available, you can first use the function Seurat_preprocess to convert it into a Seurat object. This function provides a simplified preprocessing procedure, and its output is a Seurat object.
counts <- read.csv(
file="/Users/w435u/Documents/ST_SC/Method_Compare/data/IR/GSE115978_counts.csv",
header=TRUE,
row.names = 1
)
# This data is big and can be downloaded from https://drive.google.com/drive/folders/18Jf56JPhArusPEDMt33vLNWpoQExIvJc
In this scRNA-seq dataset, each row represents a gene and each column represents a cell. The dimensions of this single-cell data are:
dim(counts)
which indicates there are 23,686 genes and 7,186 cells in total. We use the functions provided by the Seurat package to preprocess this data. To simplify the process, we wrapped the Seurat analysis pipeline into the following function:
sc_dataset <- Seurat_preprocess(counts, verbose = FALSE, type = "SC")
The output is a Seurat object that contains the required preprocessed counts matrix, as well as other helpful dimensionality reduction results, such as the PCA, t-SNE, and UMAP.
names(sc_dataset)
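The internals of Seurat_preprocess are not shown in this tutorial. As a rough orientation only, the sketch below runs a standard Seurat workflow on a small synthetic counts matrix (genes in rows, cells in columns) and produces PCA, t-SNE, and UMAP reductions. The function name `sketch_preprocess` and all parameter values are hypothetical choices for illustration; the actual SCTP wrapper may use different steps or settings.

```r
# Synthetic counts matrix for illustration only: genes in rows, cells in columns
set.seed(1)
n_genes <- 200
n_cells <- 60
toy_counts <- matrix(
  rpois(n_genes * n_cells, lambda = 2),
  nrow = n_genes,
  dimnames = list(
    paste0("gene", seq_len(n_genes)),
    paste0("cell", seq_len(n_cells))
  )
)

# Hypothetical stand-in for Seurat_preprocess(); the real function may differ.
sketch_preprocess <- function(counts, n_pcs = 10) {
  obj <- Seurat::CreateSeuratObject(counts = counts)
  obj <- Seurat::NormalizeData(obj, verbose = FALSE)
  obj <- Seurat::FindVariableFeatures(obj, verbose = FALSE)
  obj <- Seurat::ScaleData(obj, verbose = FALSE)
  obj <- Seurat::RunPCA(obj, npcs = n_pcs, verbose = FALSE)
  # A small perplexity is used because the toy data has only 60 cells
  obj <- Seurat::RunTSNE(obj, dims = seq_len(n_pcs), perplexity = 10)
  obj <- Seurat::RunUMAP(obj, dims = seq_len(n_pcs), verbose = FALSE)
  obj
}

# Only run the pipeline when Seurat is installed
if (requireNamespace("Seurat", quietly = TRUE)) {
  toy_obj <- sketch_preprocess(toy_counts)
  print(names(toy_obj))
}
```

For real data, the number of principal components and the t-SNE perplexity would normally be chosen according to the dataset size rather than fixed as above.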
Because spatial transcriptomics data come in many formats, this package does not provide automatic preprocessing for them. You must first process your data to create a Seurat object that includes the SCT-normalized counts matrix and the image data.
Alternatively, you can provide a Seurat object produced by your own pipeline, but at minimum the normalized data (assays$RNA@data) is required. Below we show examples with single-cell RNA-seq data (sc_dataset) and spatial transcriptomics data (st_dataset), respectively.
load("/Users/w435u/Documents/ST_SC/DATA_STSC_CAO/Seurat_L1.RData") #st_dataset
# This data is big and can be downloaded from https://drive.google.com/drive/folders/18Jf56JPhArusPEDMt33vLNWpoQExIvJc
Check that the single-cell data contains the required information.
!is.null(sc_dataset@assays$RNA@data)
Check that the spatial transcriptomics data contains the required information.
!is.null(st_dataset@assays$SCT$data)
We begin by visualizing the cells, categorized by types as annotated in the original study, presenting only non-immune cells. These are classified into Endothelial cells (E), Fibroblasts (F), and Tumor cells (Tu), which are further subdivided into subclusters shown below:
DimPlot(sc_dataset, group.by = "cluster", reduction="tsne")+ggtitle("Cell type")+
theme(legend.position = "bottom", legend.key.size = unit(2, 'mm'))
Using the provided input, we employ SCTP-CRC to estimate the likelihood that each cell is a CRC tumor cell.
sc_dataset <- SCTP_CRC(my_seurat = sc_dataset)
The predicted malignancy of each cell is stored as a new annotation "malignancy" in the metadata of the output Seurat object.
names(sc_dataset@meta.data)
The results can subsequently be visualized using t-SNE or UMAP plots. A value closer to 1 signifies a higher malignancy level in the corresponding cell, whereas a value close to 0 suggests a normal state.
FeaturePlot(sc_dataset, features = "malignancy", reduction = "tsne") +
scale_color_gradientn(colours = col_mal)
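The plotting call above uses a colour vector named col_mal, which is not defined in the steps shown here. If it is not already available in your session, one possible definition is sketched below. The specific colours are an assumption chosen for illustration (blue for scores near 0, red for scores near 1) and are not necessarily those used by the authors.

```r
# Hypothetical palette for the malignancy score:
# low values in blue, mid values in near-white, high values in red.
col_mal <- grDevices::colorRampPalette(c("#2166AC", "#F7F7F7", "#B2182B"))(100)

# Inspect the two ends of the gradient
print(col_mal[c(1, length(col_mal))])
```

Because the score lies between 0 and 1, passing `limits = c(0, 1)` to the gradient scale would keep colours comparable across plots.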
When compared to the original cell type annotations, it is evident that a significant number of tumor cells have been assigned high malignancy scores, while non-tumor cells have been allocated low malignancy scores.
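This visual comparison can also be summarized numerically by aggregating the predicted score per annotated cell type. The sketch below uses a small synthetic data frame standing in for the object's metadata; the values are made up purely to show the mechanics and are not results from the tutorial data. It assumes the metadata holds the columns "cluster" and "malignancy" used elsewhere in this section.

```r
# Synthetic metadata standing in for sc_dataset@meta.data
# (illustrative values only, not results from the tutorial data).
meta <- data.frame(
  cluster = c("Tu", "Tu", "Tu", "E", "E", "F", "F"),
  malignancy = c(0.91, 0.84, 0.77, 0.12, 0.20, 0.08, 0.15)
)

# Median predicted malignancy per annotated cell type
median_by_type <- aggregate(malignancy ~ cluster, data = meta, FUN = median)

# Show cell types ordered from highest to lowest median score
print(median_by_type[order(-median_by_type$malignancy), ])
```

With real data, the same call can be applied to the metadata data frame of the Seurat object in place of `meta`.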
Next, we present an example using spatial transcriptomics data for prediction. Using a preloaded ST Seurat object, we employ the SCTP_CRC function to predict the likelihood of tumor presence in each spot.
st_dataset <- SCTP_CRC(my_seurat = st_dataset)
As with single-cell input, the predicted malignancy of each spot is stored in the annotation "malignancy" in the output Seurat object.
names(st_dataset@meta.data)
You can then visualize the result with SpatialFeaturePlot. A value closer to 1 indicates higher malignancy of the corresponding spot, while a value close to 0 indicates a normal state.
SpatialFeaturePlot(st_dataset, features = "malignancy")+
scale_fill_gradientn(colours = col_mal)
An SCTP model for another disease, hepatocellular carcinoma (HCC), is also available and follows the same usage protocol as the CRC model.
st_dataset <- SCTP_HCC(my_seurat = st_dataset)
