TemporalVAE: atlas-assisted temporal mapping of time-series single-cell transcriptomes during embryogenesis
Contact: Yuanhua Huang, Yijun Liu
Email: [email protected]
A user-oriented repo is at https://github.com/StatBiomed/TemporalVAE-release with more features to be added.
TemporalVAE is a deep generative model with a dual-objective setting that infers the biological time of cells from a compressed latent space. We demonstrated its scalability to millions of cells in the mouse development atlas, and its high accuracy in atlas-based cell staging both for mouse organogenesis across platforms and for human peri-implantation embryos across in vivo and in vitro conditions. Furthermore, we showed that our atlas-based time predictor can effectively support RNA velocity modeling of short-timescale cell differentiation, including hematopoiesis and neuronal development.
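To make the "dual-objective" idea concrete, here is a minimal NumPy sketch of one plausible loss: a standard VAE objective (reconstruction plus KL divergence to a standard normal prior) combined with a regression loss on cell time predicted from the latent space. This is an illustration under stated assumptions, not TemporalVAE's actual implementation: the squared-error reconstruction term, the function names and the `time_weight` parameter are all hypothetical.

```python
import numpy as np


def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I) (reparameterization trick)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps


def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over latent dims, averaged over cells."""
    kl_per_cell = -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar), axis=1)
    return float(np.mean(kl_per_cell))


def dual_objective_loss(x, x_recon, mu, logvar, t_true, t_pred, time_weight=1.0):
    """Hypothetical dual objective: VAE loss plus a weighted time-regression loss.

    x, x_recon: (cells, genes) expression and its reconstruction (squared error assumed)
    mu, logvar: (cells, latent_dim) parameters of the encoder distribution q(z | x)
    t_true, t_pred: (cells,) known and predicted biological time
    """
    recon = float(np.mean(np.sum((x - x_recon) ** 2, axis=1)))
    kl = kl_to_standard_normal(mu, logvar)
    time_loss = float(np.mean((t_true - t_pred) ** 2))
    return recon + kl + time_weight * time_loss


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.random((4, 10))
    mu, logvar = np.zeros((4, 2)), np.zeros((4, 2))
    z = reparameterize(mu, logvar, rng)  # latent samples, shape (4, 2)
    t = np.array([0.0, 1.0, 2.0, 3.0])
    print(dual_objective_loss(x, x, mu, logvar, t, t + 1.0, time_weight=2.0))
```

Keeping the time term as a separate weighted summand makes the trade-off between reconstruction quality and time accuracy explicit; how TemporalVAE actually balances the two objectives is not specified in this README.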
- Latest Updates
- Installations
- [Reproduce the result in manuscript](#reproduce-the-result-in-manuscript)
- v0.1 (May, 2024): Initial release.
- v0.2 (May, 2024)
To install TemporalVAE, Python 3.10.9 is required. Follow the instructions below:
- Install Miniconda3 if not already available.
- Clone this repository:
git clone https://github.com/StatBiomed/TemporalVAE
- Navigate to the TemporalVAE directory:
cd TemporalVAE
- Create a conda environment (TemporalVAE-V1.0) with the required dependencies (5-10 minutes). Two environment configuration files are provided: env_necessary.yml includes the minimal essential dependencies, and env_all.yml includes the complete development environment. If you encounter any package version issues, please check env_all.yml for more version information.
conda env create -f env_necessary.yml
- Install PyTorch for your system (with the TemporalVAE-V1.0 environment activated): check your computer's configuration (OS, CUDA version, etc.) and follow the installation instructions at https://pytorch.org.
- Install tensorboard:
pip install tensorboard
- Activate the TemporalVAE-V1.0 environment you just created:
conda activate TemporalVAE-V1.0
- Create a data folder to save your data, a results folder to save training and prediction results, and a logs folder to save detailed log files:
mkdir data
mkdir results
mkdir logs
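If you prefer to script these setup steps, a small Python sketch could verify the interpreter version and create the same three folders as the mkdir commands above. The helper names are hypothetical and not part of the repository; the check compares only major.minor, although this README pins Python 3.10.9.

```python
import sys
import tempfile
from pathlib import Path

# This README asks for Python 3.10.9; this sketch only checks (major, minor).
REQUIRED_PYTHON = (3, 10)
WORK_DIRS = ("data", "results", "logs")


def python_version_ok(version_info=sys.version_info):
    """Return True if the interpreter's (major, minor) matches REQUIRED_PYTHON."""
    return tuple(version_info[:2]) == REQUIRED_PYTHON


def make_work_dirs(root="."):
    """Create data/, results/ and logs/ under root (no error if they already exist)."""
    paths = [Path(root) / name for name in WORK_DIRS]
    for path in paths:
        path.mkdir(parents=True, exist_ok=True)
    return paths


if __name__ == "__main__":
    print("Python version OK:", python_version_ok())
    with tempfile.TemporaryDirectory() as tmp:  # demo in a throwaway directory
        print([p.name for p in make_work_dirs(tmp)])
```

Using exist_ok=True makes the script safe to re-run, unlike a plain mkdir, which fails if the folder already exists.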
Compare TemporalVAE with baseline methods on three small datasets cited in the Psupertime manuscript.
- Preprocess the three datasets with the code described in preprocess_data_fromPsupertimeManuscript.md.
- Run the code for each benchmarking method.
- For example, run LR:
python demo/Fig2_TemproalVAE_against_benchmark_methods/exp2_LR_toyDataset.py
- Run plotFig2_check_corr.py to generate Figure 2.
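The script name plotFig2_check_corr.py suggests that the benchmark is summarized by correlations between predicted and true time; the exact metrics are not documented in this README. As a hedged illustration, Pearson and Spearman correlations can be computed with NumPy alone (tied values get average ranks, the same convention as scipy.stats.spearmanr). The function names are hypothetical.

```python
import numpy as np


def rank_average(values):
    """1-based ranks; tied values share the average of their positions."""
    a = np.asarray(values, dtype=float)
    order = np.argsort(a, kind="mergesort")
    sorted_a = a[order]
    ranks = np.empty(len(a), dtype=float)
    i, n = 0, len(a)
    while i < n:
        j = i
        while j + 1 < n and sorted_a[j + 1] == sorted_a[i]:
            j += 1  # extend the run of tied values
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; NaN if either input is constant."""
    xc = np.asarray(x, dtype=float) - np.mean(x)
    yc = np.asarray(y, dtype=float) - np.mean(y)
    denom = np.sqrt((xc @ xc) * (yc @ yc))
    return float(xc @ yc / denom) if denom > 0 else float("nan")


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(rank_average(x), rank_average(y))
```

Spearman correlation only rewards getting the order of cells right, which suits comparing methods whose predicted time scales differ; Pearson additionally checks for a linear relationship.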
- Preprocess the mouse atlas data and the mouse Stereo-seq data with:
python -u Fig3_mouse_data/preprocess_data_mouse_embryonic_development_combineData.py
python -u Fig3_mouse_data/preprocess_data_mouse_embryo_stereo.py
- Reproduce the results of Figure 3A&B and save them in the folder results/230827_trainOn_mouse_embryonic_development_kFold_testOnYZdata0809:
python -u Fig3_mouse_data/TemporalVAE_kFoldOn_mouseAtlas.py \
--result_save_path=230827_trainOn_mouse_embryonic_development_kFold_testOnYZdata0809 \
--vae_param_file=supervise_vae_regressionclfdecoder_mouse_stereo \
--file_path=/mouse_embryonic_development/preprocess_adata_JAX_dataset_combine_minGene100_minCell50_hvg1000 \
--time_standard_type=embryoneg5to5 \
--train_epoch_num=100 --kfold_test --train_whole_model \
> logs/log.log
- To plot Figure 3A&B from the results in results/230827_trainOn_mouse_embryonic_development_kFold_testOnYZdata0809, see Fig3_mouse_data/plot_figure3AB.ipynb.
- Figure 3C: Compare TemporalVAE with LR, PCA and RF on the mouse atlas data; see Fig3_mouse_data/LR_PCA_RF_kFoldOn_mouseAtlas.ipynb.
- Figure 3D&E: Train the models on the mouse atlas data and predict on the mouse Stereo-seq data; see Fig3_mouse_data/TemporalVAE_LR_PCA_RF_directlyPredictOn_mouseStereo.ipynb, or run Fig3_mouse_data/TemporalVAE_LR_PCA_RF_directlyPredictOn_mouseStereo.py from the console.
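The --kfold_test run and the Figure 3C comparison rely on k-fold evaluation over the atlas. A minimal NumPy sketch of the idea, assuming folds are drawn over individual cells (the script may split differently, e.g. by embryo) and using ordinary least-squares linear regression as a stand-in predictor (not TemporalVAE, and not necessarily how the repository's LR baseline is configured): every cell's time is predicted by a model that never saw that cell during training. All function names are hypothetical.

```python
import numpy as np


def kfold_indices(n_cells, k=5, seed=0):
    """Yield (train_idx, test_idx) for k shuffled, roughly equal folds of cells."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_cells), k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]


def fit_linear(X, t):
    """Least-squares fit of t ~ X @ w + b; returns [w..., b]."""
    A = np.hstack([X, np.ones((X.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(A, t, rcond=None)
    return coef


def predict_linear(coef, X):
    """Apply a coefficient vector returned by fit_linear."""
    A = np.hstack([X, np.ones((X.shape[0], 1))])
    return A @ coef


def out_of_fold_predictions(X, t, k=5, seed=0):
    """Predict every cell's time with a model trained on the other k-1 folds."""
    oof = np.empty(len(t), dtype=float)
    for train, test in kfold_indices(len(t), k, seed):
        coef = fit_linear(X[train], t[train])
        oof[test] = predict_linear(coef, X[test])
    return oof
```

Collecting out-of-fold predictions gives exactly one held-out prediction per cell, so they can be correlated with the true time over the whole atlas.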
- Download the original data of the eight published human datasets (see the Supplementary file for details), then integrate the raw datasets with:
python -u Fig4_human_data/integration_humanEmbryo_Z_C_Xiao_M_P_Liu_Tyser_Xiang.py
- Figure 4A-C: Performance of TemporalVAE trained on six training datasets and tested on two held-out test datasets:
python -u Fig4_human_data/TemporalVAE_humanEmbryo_ref6Dataset_queryOnXiang_Tyser.py
- Supplementary figure: K-fold test on the xiang19 dataset:
python -u Fig4_human_data/TemporalVAE_humanEmbryo_kFoldOn_xiang19.py
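Figure 4 evaluates on whole held-out datasets rather than on random cells. A minimal sketch of that kind of split, assuming each integrated cell carries a dataset label and that the datasets are restricted to their shared genes before training; the label strings and helper names below are hypothetical, and the repository's integration script may do more (for example, batch handling).

```python
import numpy as np


def shared_genes(gene_lists):
    """Genes present in every dataset, kept in the order of the first dataset."""
    common = set(gene_lists[0]).intersection(*gene_lists[1:])
    return [g for g in gene_lists[0] if g in common]


def split_by_dataset(dataset_labels, query_datasets):
    """Return (reference_idx, query_idx); cells from query_datasets are held out."""
    labels = np.asarray(dataset_labels)
    is_query = np.isin(labels, list(query_datasets))
    return np.flatnonzero(~is_query), np.flatnonzero(is_query)


if __name__ == "__main__":
    labels = ["Z", "Xiang", "Tyser", "Liu", "Xiang"]  # hypothetical per-cell labels
    ref_idx, query_idx = split_by_dataset(labels, {"Xiang", "Tyser"})
    print(ref_idx, query_idx)
```

Holding out entire datasets tests whether the model transfers across labs and protocols, which a random cell-level split cannot show.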
- Preprocess the marmoset and cynomolgus data with:
python -u Fig5_crossSpecies/preprocess_data_marmoset_inVivo.py
python -u Fig5_crossSpecies/preprocess_data_Cyno.py
- Figure 5A-D: Performance of TemporalVAE on cross-species prediction:
python -u Fig5_crossSpecies/TemporalVAE_crossSpecies_referenceMelania_queryOnCynoAndMarmoset.py
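Cross-species prediction requires expressing the query cells (here, marmoset or cynomolgus) in the reference model's gene space. One simple way, sketched below under assumptions, is to rename query genes through an ortholog table, reorder the columns to match the reference, and zero-fill reference genes that have no match. The ortholog mapping, the zero-filling choice and the function name are hypothetical; the preprocessing scripts above may handle this differently.

```python
import numpy as np


def align_to_reference(query_X, query_genes, reference_genes, ortholog_map=None):
    """Reorder query columns into the reference gene order.

    ortholog_map: optional {query_gene: reference_gene}; unmapped genes keep their name.
    Returns (aligned matrix, list of reference genes missing in the query).
    Missing genes are zero-filled (an assumption, not the only option).
    """
    query_X = np.asarray(query_X, dtype=float)
    ortholog_map = ortholog_map or {}
    column_of = {}
    for j, gene in enumerate(query_genes):
        column_of.setdefault(ortholog_map.get(gene, gene), j)  # first match wins
    aligned = np.zeros((query_X.shape[0], len(reference_genes)))
    missing = []
    for i, gene in enumerate(reference_genes):
        j = column_of.get(gene)
        if j is None:
            missing.append(gene)
        else:
            aligned[:, i] = query_X[:, j]
    return aligned, missing
```

Reporting the missing genes alongside the matrix makes it easy to check how much of the reference gene space a query species actually covers.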
Identification of temporally sensitive genes by in silico perturbation. Here, we focus on the mouse embryo atlas as a showcase, thanks to its data consistency and broad time range.
python -u Fig6_identify_keyGenes/TemporalVAE_identify_keyGenes_mouseAtlas.py
python -u Fig6_identify_keyGenes/plot_perturbution_results.py
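In silico perturbation can be illustrated generically: perturb one gene at a time, re-run the time predictor and score each gene by how much the predicted time shifts. A minimal NumPy sketch, assuming a knockout-style perturbation (scaling a gene's expression by `factor`, 0 by default) and a generic `predict_time` callable; neither is taken from TemporalVAE_identify_keyGenes_mouseAtlas.py, whose perturbation scheme is not described here, and the function names are hypothetical.

```python
import numpy as np


def perturbation_scores(X, predict_time, factor=0.0):
    """Mean absolute shift in predicted time when each gene is scaled by `factor`."""
    X = np.asarray(X, dtype=float)
    baseline = predict_time(X)
    scores = np.empty(X.shape[1])
    for g in range(X.shape[1]):
        X_pert = X.copy()
        X_pert[:, g] *= factor  # factor=0.0 mimics an in silico knockout
        scores[g] = np.mean(np.abs(predict_time(X_pert) - baseline))
    return scores


def rank_genes(scores, gene_names):
    """Gene names sorted from most to least temporally sensitive."""
    order = np.argsort(-scores, kind="stable")
    return [gene_names[i] for i in order]
```

For example, with a toy linear predictor `lambda X: X @ w`, a gene's score is simply the absolute value of its weight times its mean expression, which is a useful sanity check before plugging in a trained model.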
- The data is from paper .
- Figure 5C&E (hematopoiesis cells): see Fig5_RNA_velocity/VAE_mouse_fineTune_Train_on_U_pairs_S_hematopoiesis.ipynb, or run from the console:
python -u Fig5_RNA_velocity/TemporalVAE_mouse_fineTune_Train_on_U_pairs_S.py --sc_file_name=240108mouse_embryogenesis/hematopoiesis --clf_weight=0.2
- Figure 5D&F (neuron cells): see Fig5_RNA_velocity/VAE_mouse_fineTune_Train_on_U_pairs_S_neuron.ipynb, or run from the console:
python -u Fig5_RNA_velocity/TemporalVAE_mouse_fineTune_Train_on_U_pairs_S.py --sc_file_name=240108mouse_embryogenesis/neuron --clf_weight=0.1
- The scVelo results in Figure 5E&F are based on the .ipynb code provided by the dataset's paper; see Fig5_RNA_velocity/scVelo_hematopoiesis.ipynb and Fig5_RNA_velocity/scVelo_neuron.ipynb.
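For readers new to RNA velocity, the quantity scVelo estimates can be sketched with the basic steady-state model: for each gene, fit a degradation ratio gamma from unspliced (u) and spliced (s) counts in cells at extreme expression states, then take the velocity as u - gamma * s. This is a simplified, generic sketch (scVelo's own extreme-cell selection and normalization differ in detail) and is not the TemporalVAE fine-tuning procedure used for Figure 5; the function names and the `quantile` parameter are hypothetical.

```python
import numpy as np


def steady_state_gamma(u, s, quantile=0.05):
    """Fit u ~ gamma * s (no intercept) using cells in the lower/upper s quantiles."""
    u = np.asarray(u, dtype=float)
    s = np.asarray(s, dtype=float)
    lo, hi = np.quantile(s, [quantile, 1.0 - quantile])
    extreme = (s <= lo) | (s >= hi)  # cells assumed closest to steady state
    s_e, u_e = s[extreme], u[extreme]
    denom = float(s_e @ s_e)
    return float(u_e @ s_e) / denom if denom > 0 else 0.0


def velocity(u, s, gamma):
    """Steady-state RNA velocity: positive = gene being induced, negative = repressed."""
    return np.asarray(u, dtype=float) - gamma * np.asarray(s, dtype=float)
```

A cell with more unspliced RNA than the fitted steady-state ratio predicts gets a positive velocity for that gene; an atlas-based time predictor can then serve as an independent check on the direction such velocities imply.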