Exploring the molecular responses of single cells to various influences, such as external stimuli or gene knockouts, is a crucial step toward demystifying the intricacies of cellular mechanisms. Although single-cell perturbation remains experimentally challenging, burgeoning bioinformatic tools are accelerating the expansion of this field through in silico modeling. Notably, the recent surge in foundation models underscores perturbation prediction as an indispensable downstream application. A critical assessment of reliable perturbation tools is therefore required. Here, we benchmark the performance of leading algorithms, including four conventional approaches and eight foundation models, using 21 datasets of paired perturbed and unperturbed cells spanning multiple perturbation types and data qualities. The results show that the conventional deep learning method GEARS and the foundation model scGPT outperform the others in a comprehensive benchmark. Additionally, in certain tasks, the foundation models demonstrate promising potential through pre-training strategies or by ensembling with GEARS's perturbation embeddings. Our findings also highlight dataset quality, measured by E-distance, as a critical determinant of model performance. This study offers actionable insights for choosing the most suitable toolkit based on dataset characteristics and informs future development of robust, generalizable models for genetic perturbation prediction.
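As a point of reference for the E-distance metric mentioned above, the sketch below shows one common formulation (as used in the scPerturb line of work): twice the mean pairwise squared Euclidean distance between two cell groups, minus the mean pairwise distances within each group. This is a minimal NumPy illustration, not the exact implementation used in our analysis scripts; the function name and interface are our own.

```python
import numpy as np

def e_distance(x: np.ndarray, y: np.ndarray) -> float:
    """E-distance between two cell groups (rows = cells, columns = features).

    E(X, Y) = 2 * delta(X, Y) - sigma(X) - sigma(Y), where delta is the mean
    pairwise squared Euclidean distance between groups and sigma the mean
    pairwise squared distance within a group. Larger values indicate a
    stronger separation between perturbed and control populations.
    """
    def mean_pairwise_sq(a: np.ndarray, b: np.ndarray) -> float:
        # ||a_i - b_j||^2 = ||a_i||^2 + ||b_j||^2 - 2 * <a_i, b_j>
        a2 = (a ** 2).sum(axis=1)[:, None]
        b2 = (b ** 2).sum(axis=1)[None, :]
        return float((a2 + b2 - 2.0 * a @ b.T).mean())

    delta = mean_pairwise_sq(x, y)
    sigma_x = mean_pairwise_sq(x, x)
    sigma_y = mean_pairwise_sq(y, y)
    return 2.0 * delta - sigma_x - sigma_y
```

In practice this would be computed on a low-dimensional representation (e.g. PCA coordinates) of the perturbed and control cells rather than on raw counts.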
We implemented all 12 benchmarked methods using the default parameters described in their respective publications. The /methods folder contains the implementation of each method, run on the /demo_data dataset as an example. For detailed information on each method, please refer to the following repositories:
GRN, CPA, GEARS, AttentionPert, scLong, scGPT, scFoundation, scELMo, scBERT-G, Geneformer-G, GenePT-G and ESM2-G.
The /analysis_visualization folder contains scripts for generating the visualizations and quantitative analyses presented in our manuscript and supplementary materials. To test any of the models on the /demo_data dataset, download the provided data, update the pert_data path in the corresponding script within the /methods folder, and execute the code.