(Single Cell proteomics readout of expression)
SCeptre is a Python package that extends the functionality of Scanpy to analyze multiplexed single-cell proteomics data.
Tested on Ubuntu 20.04.1 LTS.
It is recommended to work in a dedicated conda environment, for example:
conda create -n sceptre python=3.7
conda activate sceptre
Clone the repository and `cd` into its root directory. Then:
pip install .
Usage is exemplified in the notebooks for the analysis from the manuscript "Quantitative Single-Cell Proteomics as a Tool to Characterize Cellular Hierarchies".
The analysis can be replicated using the provided conda environment:
conda env create -f Schoof_et_al/code/environment.yml
conda activate sceptre
pip install Schoof_et_al/code/sceptre-0.1-py3-none-any.whl
The required data can be downloaded from http://proteomecentral.proteomexchange.org using the dataset identifier PXD020586.
Find the notebooks in the subdirectory `Schoof_et_al/code`, place the required data in `Schoof_et_al/data`, and create the folder `Schoof_et_al/results/tmp/`.
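The folder setup described above can also be scripted. The following is a minimal sketch using only the Python standard library; the helper name `prepare_layout` is hypothetical and not part of SCeptre. It creates the `results/tmp/` folder and reports whether the data folder contains anything, without touching the data itself.

```python
from pathlib import Path


def prepare_layout(root):
    """Create <root>/results/tmp and report whether <root>/data holds any entries.

    Hypothetical helper for illustration; it is not part of the SCeptre package.
    """
    root = Path(root)
    data_dir = root / "data"
    res_dir = root / "results" / "tmp"

    # parents=True also creates results/; exist_ok=True makes reruns harmless.
    res_dir.mkdir(parents=True, exist_ok=True)

    # The data directory is only inspected, never created or modified.
    data_present = data_dir.is_dir() and any(data_dir.iterdir())

    return {
        "data_dir": data_dir.as_posix() + "/",
        # Trailing slash so that string concatenation such as
        # res_dir + "meta.txt" yields a valid path.
        "res_dir": res_dir.as_posix() + "/",
        "data_present": data_present,
    }
```

Called from the repository root as `prepare_layout("Schoof_et_al")`, this returns `Schoof_et_al/results/tmp/` as `res_dir`. Called from inside `Schoof_et_al/code` as `prepare_layout("..")`, it returns `../results/tmp/`, which matches the relative paths used in the usage snippets. Whether the notebooks define `res_dir` exactly this way is an assumption.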
The following notebooks process the different datasets:
Notebook | Description |
---|---|
300ms.ipynb | SCeptre analysis of the 'medium' dataset. |
500ms.ipynb | SCeptre analysis of the 'high' dataset. |
bulk.ipynb | SCeptre analysis of the 'bulk' dataset. |
integrated.ipynb | SCeptre analysis of the 'integrated' dataset. |
Each function has a docstring that explains it in depth. A typical workflow uses the following steps:
To create the meta data for each cell, as done in Schoof et al., from a collection of tables describing the experimental design and layouts of the 384-well plates, the following function is used. For details on the required tables, have a look at `Schoof_et_al/data/500ms`.
import sceptre as spt
spt.create_meta_data(input_dir="../data/500ms/", output_dir=res_dir)
Alternatively, the meta data table can be created by the user. It requires the columns `File ID` and `Channel` to map the meta data to each cell.
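As an illustration of a user-created meta data table, the sketch below writes one row per cell with the two required columns plus any extra annotation columns. Several things here are assumptions rather than documented behaviour: the tab-separated layout, the helper name `write_meta_table`, and the example column `Sorted Population` are all hypothetical. Check the docstring of `spt.load_dataset` for the format it actually expects.

```python
import csv

# The two columns the text above names as required.
REQUIRED_COLUMNS = ("File ID", "Channel")


def write_meta_table(rows, path, extra_columns=()):
    """Write one row per cell to a tab-separated table (assumed format)."""
    rows = list(rows)
    fieldnames = [*REQUIRED_COLUMNS, *extra_columns]

    # Validate up front so a bad row never leaves a half-written file behind.
    for index, row in enumerate(rows):
        missing = [column for column in REQUIRED_COLUMNS if not row.get(column)]
        unknown = sorted(set(row) - set(fieldnames))
        if missing or unknown:
            raise ValueError(
                f"row {index}: missing columns {missing}, unknown columns {unknown}"
            )

    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames, delimiter="\t")
        writer.writeheader()
        writer.writerows(rows)
    return fieldnames
```

A call could look like `write_meta_table([{"File ID": "F1", "Channel": "127N", "Sorted Population": "A"}], "meta.txt", extra_columns=["Sorted Population"])`, where the values are placeholders rather than real experiment annotations.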
To load the dataset into Python, the following function is used. Currently, only output tables from Proteome Discoverer are supported.
dataset = spt.load_dataset(proteins="../data/500ms/500ms_Proteins.txt",
                           psms="../data/500ms/500ms_PSMs.txt",
                           msms="../data/500ms/500ms_MSMSSpectrumInfo.txt",
                           files="../data/500ms/500ms_InputFiles.txt",
                           meta=res_dir + "meta.txt")
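Because the call above takes four table paths that share a naming pattern, it can help to build them in one place and check that the files exist before loading. The sketch below does this with the standard library only. The helper `dataset_paths` is hypothetical, and the assumption that other datasets follow the same `<name>_<Table>.txt` pattern as the `500ms` example is not confirmed by this document.

```python
from pathlib import Path

# Keyword argument of the call above -> file name suffix seen in the 500ms example.
TABLE_SUFFIXES = {
    "proteins": "Proteins",
    "psms": "PSMs",
    "msms": "MSMSSpectrumInfo",
    "files": "InputFiles",
}


def dataset_paths(data_dir, name):
    """Return (paths, missing) for the Proteome Discoverer tables of one dataset.

    paths maps each keyword argument to <data_dir>/<name>/<name>_<Suffix>.txt;
    missing lists the paths that do not exist on disk.
    """
    folder = Path(data_dir) / name
    paths = {
        key: (folder / f"{name}_{suffix}.txt").as_posix()
        for key, suffix in TABLE_SUFFIXES.items()
    }
    missing = [path for path in paths.values() if not Path(path).is_file()]
    return paths, missing
```

With `paths, missing = dataset_paths("../data", "500ms")`, an empty `missing` list means all four tables were found, and the call can then be written as `spt.load_dataset(**paths, meta=res_dir + "meta.txt")`, which passes the same keyword arguments as the snippet above.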
The dataset object can be used to quality control each LC-MS run with the following functions.
spt.plot_psms_msms(dataset)
spt.plot_avg_sn(dataset)
spt.plot_set_overview(dataset)
s_c_channels = ['127N', '128N', '128C', '129N', '129C', '130N', '130C',
                '131N', '131C', '132N', '132C', '133N', '133C', '134N']
spt.print_ms_stats(dataset, s_c_channels=s_c_channels)
spt.plot_interference(dataset)
Subsequently, the dataset object is used to create a Scanpy adata object.
adata = spt.dataset_to_scanpy(dataset)
Non-single cell channels have to be removed.
adata = adata[adata.obs['Channel'] != '126'].copy()
adata = adata[adata.obs['Channel'] != '127C'].copy()
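The list of single-cell channels used for the MS statistics and the channels removed here describe the same split, so they can be derived from one definition to keep them in sync. The sketch below assumes a 16-channel TMTpro label set, which is inferred from the channel names in the snippets rather than stated in this document. The names `ALL_CHANNELS`, `NON_SINGLE_CELL`, `single_cell_channels` and `keep_mask` are hypothetical.

```python
# Reporter channels of a 16-channel TMTpro set, in order (assumed label set).
ALL_CHANNELS = [
    "126", "127N", "127C", "128N", "128C", "129N", "129C", "130N",
    "130C", "131N", "131C", "132N", "132C", "133N", "133C", "134N",
]

# Channels that do not hold single cells, as removed in the snippet above.
NON_SINGLE_CELL = {"126", "127C"}


def single_cell_channels(all_channels=ALL_CHANNELS, excluded=NON_SINGLE_CELL):
    """Return the channels that hold single cells, preserving channel order."""
    return [channel for channel in all_channels if channel not in excluded]


def keep_mask(channel_labels, excluded=NON_SINGLE_CELL):
    """Return one boolean per label: True if the row should be kept."""
    return [label not in excluded for label in channel_labels]
```

`single_cell_channels()` reproduces the 14-entry `s_c_channels` list shown earlier. On the adata object, the same filter can be written in one step with pandas as `adata = adata[~adata.obs['Channel'].isin(NON_SINGLE_CELL)].copy()`, assuming the `Channel` column holds the labels as strings, as in the snippets.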
Then the dataset can be normalized.
spt.normalize(adata)
The following functions are used to filter out outlier cells.
spt.calculate_cell_filter(adata)
spt.apply_cell_filter(adata)
To detect potential systematic bias introduced by the sample preparation or measurement, the following functions can be used.
spt.plot_batch_qc(adata)
spt.plot_plate_qc(adata)
The adata object can now be used to perform a Scanpy analysis.