SCTP

Single-Cell and Tissue Phenotype prediction

Introduction

The SCTP R package implements the proposed SCTP (Single Cell Tissue Phenotype prediction) method. SCTP analyzes cellular malignancy within the tumor microenvironment from an integrative perspective: it combines information from the bulk sample phenotype, single-cell composition, and cellular spatial distribution, which would be overlooked in traditional pathological tissue slices. As an automated tissue phenotype prediction model, SCTP facilitates a deeper understanding of tumor microenvironments, enables quantitative characterization of cancer hallmarks, and helps elucidate the underlying molecular and cellular interplay.

In this tutorial, we provide several examples to help you apply SCTP to real-world data. It covers estimating the likelihood of colorectal cancer using a pre-trained model. Downstream analysis and instructions on constructing a new SCTP model from your own datasets can be found in Tutorial.

This package is maintained long term by Dr. Tao Zeng ([email protected]), Dr. Wencan Zhu ([email protected]), and Dr. Hui Tang ([email protected]).

An alternative webpage for this package can be accessed at https://github.com/valerychu/SCTP.

Installation

Prerequisites

  • Python 3.9 and R 4.3.0

Python environment and packages

Please set up a virtual environment named "env_SCTP" and make sure it includes the required packages:

  • numpy
  • pytorch
  • pytorch_geometric
  • scikit-learn
  • scipy

R packages installation

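# Install the CRAN dependencies that are not yet present
# (devtools and BiocManager are needed by the calls below)
list.of.packages <- c("ggplot2", "Seurat", "reticulate", "remotes", "devtools", "BiocManager")
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages)
# Bioconductor and GitHub dependencies
BiocManager::install("preprocessCore")
remotes::install_github('satijalab/seurat-wrappers')
devtools::install_github("jinworks/CellChat")
devtools::install_github('cole-trapnell-lab/monocle3')
# Install SCTP from a local copy of the package source in the directory "SCTP"
devtools::install("SCTP")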

Testing the installation

library(SCTP)
* If you encounter difficulties installing the software packages, please refer to our installation instruction file "SCTP install R list 20250610.docx".
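Loading the R package may not by itself confirm that the Python environment is complete. The following is a minimal sketch of a check through reticulate, under two assumptions that the text above does not state: that "env_SCTP" is a virtualenv (rather than a conda environment), and that the listed packages are imported under their usual module names. The helper check_sctp_python is hypothetical and not part of SCTP.

```r
# Import names of the Python modules behind the packages listed above
# (PyTorch imports as "torch", scikit-learn as "sklearn").
py_modules <- c("numpy", "torch", "torch_geometric", "sklearn", "scipy")

# Hypothetical helper, not part of SCTP.
check_sctp_python <- function(envname = "env_SCTP", modules = py_modules) {
  if (!requireNamespace("reticulate", quietly = TRUE)) {
    stop("Install the 'reticulate' R package first.")
  }
  # Assumption: env_SCTP is a virtualenv; use reticulate::use_condaenv() for conda.
  reticulate::use_virtualenv(envname, required = TRUE)
  # TRUE/FALSE per module, named by module
  available <- vapply(modules, reticulate::py_module_available, logical(1))
  if (!all(available)) {
    warning("Missing Python modules: ",
            paste(modules[!available], collapse = ", "))
  }
  available
}

# check_sctp_python()   # run once the environment has been created
```

The call is left commented out because it only makes sense after the environment exists; it returns a named logical vector so that a missing module is easy to spot.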

Spot and cell malignancy prediction using the SCTP-CRC model

In this section, we outline how to use the SCTP-CRC pretrained model (function SCTP_CRC) to evaluate cell or spot malignancy in your own datasets.

Loading your dataset

The input data must be formatted as a Seurat object. For spatial transcriptomics data, checking that the object includes the image component is highly recommended, because the image is needed to visualize the output.

Input as gene expression matrix

For single-cell data where only the counts matrix is available, you can first use the function Seurat_preprocess to convert it into a Seurat object. This function provides a simplified preprocessing procedure and returns a Seurat object.

counts <- read.csv(
  file="/Users/w435u/Documents/ST_SC/Method_Compare/data/IR/GSE115978_counts.csv",
  header=TRUE,
  row.names = 1
)
# This dataset is large and can be downloaded from https://drive.google.com/drive/folders/18Jf56JPhArusPEDMt33vLNWpoQExIvJc

In this scRNA-seq dataset, each row represents a gene and each column represents a cell. The dimensions of this single-cell data are:

dim(counts)

The result indicates that there are 23,686 genes and 7,186 cells in total. We use functions provided by the Seurat package to preprocess this data. To simplify the process, we wrapped the Seurat analysis pipeline into the following function:

sc_dataset <- Seurat_preprocess(counts, verbose = FALSE, type="SC")

The output is a Seurat object that contains the required preprocessed counts matrix, as well as other helpful dimensionality reduction results, such as the PCA, t-SNE, and UMAP.

names(sc_dataset)
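Before passing a counts matrix to Seurat_preprocess, it can help to confirm that it has the orientation described above (genes in rows, cells in columns) and contains plausible count values. The following is a minimal sketch in base R on an invented toy table; the helper check_counts is hypothetical and not part of SCTP or Seurat.

```r
# Toy stand-in for `counts`: 4 genes (rows) x 3 cells (columns); values are invented.
toy_counts <- data.frame(
  cell1 = c(0, 3, 1, 0),
  cell2 = c(2, 0, 0, 5),
  cell3 = c(1, 1, 0, 0),
  row.names = c("GeneA", "GeneB", "GeneC", "GeneD")
)

# Hypothetical helper, not part of SCTP or Seurat.
check_counts <- function(x) {
  m <- as.matrix(x)
  stopifnot(
    is.numeric(m),                 # expression values, not text
    !anyNA(m),                     # no missing entries
    all(m >= 0),                   # counts cannot be negative
    !anyDuplicated(rownames(m)),   # gene names must be unique
    !anyDuplicated(colnames(m))    # cell identifiers must be unique
  )
  c(genes = nrow(m), cells = ncol(m))
}

check_counts(toy_counts)
```

Applied to the dataset used in this tutorial, the same call should report the gene and cell numbers given by dim(counts).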

Because spatial transcriptomic data come in diverse formats, this package does not provide automatic preprocessing for them. You must first process your data to create a Seurat object that includes the SCT-normalized counts matrix and the image data.

Input as a Seurat object

Alternatively, you can provide a Seurat object produced by your own pipeline, but it must contain at least the normalized data (assays$RNA@data). Below we show examples with single-cell RNA-seq data (sc_dataset) and spatial transcriptomic data (st_dataset), respectively.

load("/Users/w435u/Documents/ST_SC/DATA_STSC_CAO/Seurat_L1.RData") #st_dataset
# This dataset is large and can be downloaded from https://drive.google.com/drive/folders/18Jf56JPhArusPEDMt33vLNWpoQExIvJc

Check that the single-cell data contain the required information.

!is.null(sc_dataset@assays$RNA@data)

Check that the spatial transcriptomic data contain the required information.

!is.null(st_dataset@assays$SCT$data)

Malignancy prediction with SCTP-CRC model

Prediction for single-cell RNA-seq data

We begin by visualizing the cells, categorized by types as annotated in the original study, presenting only non-immune cells. These are classified into Endothelial cells (E), Fibroblasts (F), and Tumor cells (Tu), which are further subdivided into subclusters shown below:

DimPlot(sc_dataset, group.by  = "cluster", reduction="tsne")+ggtitle("Cell type")+
  theme(legend.position = "bottom", legend.key.size = unit(2, 'mm'))

Using the provided input, we employ SCTP-CRC to estimate the likelihood that each cell is a CRC tumor cell.

sc_dataset <- SCTP_CRC(my_seurat = sc_dataset)

The predicted malignancy of each cell is stored as a new annotation "malignancy" in the metadata of the output Seurat object.

The results can subsequently be visualized using t-SNE or UMAP plots. A value closer to 1 signifies a higher malignancy level in the corresponding cell, whereas a value close to 0 suggests a normal state.
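The plotting calls in this tutorial pass a colour vector named col_mal, which is not defined in this section. If it is not already available in your session, a minimal sketch is to build one yourself. The blue-to-red gradient below is an assumption chosen for illustration, not necessarily the palette used by the authors.

```r
# Hypothetical palette: low malignancy (near 0) in blue, high malignancy (near 1) in red.
col_mal <- grDevices::colorRampPalette(c("#2166AC", "#F7F7F7", "#B2182B"))(100)

length(col_mal)   # number of colour steps in the gradient
```

Any character vector of colours accepted by ggplot2 can be used in its place.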

# col_mal must be a vector of colours defined beforehand
FeaturePlot(sc_dataset, features = "malignancy", reduction="tsne")+
  scale_color_gradientn(colours = col_mal)

When compared to the original cell type annotations, it is evident that a significant number of tumor cells have been assigned high malignancy scores, while non-tumor cells have been allocated low malignancy scores.
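This comparison can also be made numerically by averaging the malignancy score within each annotated cluster. The following is a minimal sketch on an invented toy data frame standing in for sc_dataset@meta.data; the column names "cluster" and "malignancy" follow the tutorial, while the values are made up for illustration.

```r
# Toy stand-in for sc_dataset@meta.data; values are invented.
meta <- data.frame(
  cluster    = c("Tu", "Tu", "Tu", "F", "F", "E"),
  malignancy = c(0.9, 0.8, 0.7, 0.2, 0.1, 0.3)
)

# Mean malignancy score per annotated cluster
mean_by_cluster <- tapply(meta$malignancy, meta$cluster, mean)

# Clusters ordered from most to least malignant
sort(mean_by_cluster, decreasing = TRUE)
```

On real output, replacing meta with sc_dataset@meta.data should give the same kind of per-cluster summary, provided both columns are present.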

Prediction for spatial transcriptomic data

Next, we present an example using spatial transcriptomic data for prediction. Utilizing a preloaded ST Seurat object, we employ the SCTP_CRC function to predict the likelihood of tumor presence in each spot.

st_dataset <- SCTP_CRC(my_seurat = st_dataset)

As with single-cell input, the predicted malignancy of each spot is stored in the annotation "malignancy" in the output Seurat object.

You can then visualize the result with SpatialFeaturePlot. A value closer to 1 indicates higher malignancy of the corresponding spot, while a value close to 0 indicates a normal state.

SpatialFeaturePlot(st_dataset, features = "malignancy")+
  scale_fill_gradientn(colours = col_mal)
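For a quick summary alongside the plot, the continuous scores can be binned into labels. The following is a minimal sketch on invented scores; the 0.5 cutoff is an arbitrary assumption for illustration and is not a threshold recommended by SCTP.

```r
# Invented scores standing in for st_dataset$malignancy
scores <- c(0.05, 0.40, 0.55, 0.95, 0.50)

# Assumption: 0.5 as an illustrative cutoff; scores in [0, 0.5] are "low", (0.5, 1] are "high"
label <- cut(scores, breaks = c(0, 0.5, 1),
             labels = c("low", "high"), include.lowest = TRUE)

# Number of spots per label and fraction labelled "high"
table(label)
mean(label == "high")
```

A cutoff suited to a given dataset would need to be chosen by the analyst, for example by comparison with pathologist annotations.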

An SCTP model for another disease, hepatocellular carcinoma (HCC), is also available and follows the same usage protocol as the CRC model.

st_dataset <- SCTP_HCC(my_seurat = st_dataset)

Class tutorial of SCTP usage

Class 1: Getting started for new users

English version: image

Chinese version: image
