hist2RNA: Predicting Gene Expression from Histopathology Images [Paper]
hist2RNA is an efficient deep learning project that predicts gene expression from breast cancer histopathology images. It employs an efficient architecture to uncover the gene expression patterns underlying breast cancer tissue.
We now support single-cell and spatial transcriptomics prediction with our new hist2scRNA model. This state-of-the-art extension uses Vision Transformers and Graph Neural Networks to predict spatially-resolved gene expression at single-cell resolution.
Key features of hist2scRNA:
- Vision Transformer (ViT) architecture for superior feature extraction
- Graph Neural Networks for spatial relationship modeling
- Zero-Inflated Negative Binomial (ZINB) loss for handling single-cell sparsity
- Multi-task learning with cell type prediction
- State-of-the-art performance based on GHIST, Hist2ST, and TransformerST
Features of hist2RNA:
- A state-of-the-art deep learning model tailored for breast cancer histopathology images
- Efficient prediction of gene expression from histopathology images, with reduced training time
- User-friendly command-line interface
- Comprehensive documentation and tutorials
Features of hist2scRNA:
- Vision Transformer-based architecture for patch-level feature extraction
- Spatial graph attention for modeling cell-cell interactions
- Handles single-cell data sparsity with a ZINB distribution
- Simultaneous gene expression and cell type prediction
- Compatible with 10X Visium and other spatial transcriptomics platforms
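To make the ZINB idea concrete, the following is a minimal scalar sketch of a zero-inflated negative binomial negative log-likelihood, written with the standard library only. The parameter names (`mu` for the mean, `theta` for the inverse dispersion, `pi` for the probability of a structural zero) and the function names are assumptions chosen for illustration; the repository's own loss presumably operates on PyTorch tensors and may be parameterized differently.

```python
import math


def nb_log_pmf(x, mu, theta):
    """Log-probability of count x under a negative binomial with
    mean mu > 0 and inverse dispersion theta > 0 (assumed parameterization)."""
    return (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )


def zinb_nll(x, mu, theta, pi):
    """Negative log-likelihood of a single count under ZINB.

    pi (0 <= pi < 1) is the probability that a zero is a 'structural'
    dropout rather than a draw from the negative binomial.
    """
    log_nb = nb_log_pmf(x, mu, theta)
    if x == 0:
        # A zero can come from dropout or from the NB itself.
        return -math.log(pi + (1.0 - pi) * math.exp(log_nb))
    # A non-zero count can only come from the NB component.
    return -(math.log(1.0 - pi) + log_nb)


# Zero inflation lowers the penalty for observing a zero.
print(zinb_nll(0, mu=2.0, theta=1.5, pi=0.3))
print(zinb_nll(0, mu=2.0, theta=1.5, pi=0.0))
```

The extra mixture weight `pi` is what lets the model explain the many zeros in single-cell counts without forcing the predicted mean toward zero.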
The following data sources have been used in this project:
- Genetic Data:
- Diagnostic Slide (DS): GDC Data Portal
- DS Download Guideline: Download TCGA Digital Pathology Images (FFPE)
Requirements:
- Python 3.9+
- PyTorch 2.0
Stain normalization:
- Python implementation: Normalizing H&E Images or TorchStain
- Original MATLAB implementation: Staining Normalization
- Reference: Macenko et al. (2009) - A method for normalizing histology slides for quantitative analysis
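As a rough illustration of the first step of Macenko-style normalization, the sketch below converts an RGB pixel to optical density and filters out near-transparent background. The function names are hypothetical, and the values `io=240` and `beta=0.15` are defaults commonly seen in public implementations, assumed here rather than taken from this project; the full method goes on to estimate stain vectors, which is not shown.

```python
import math


def rgb_to_optical_density(pixel, io=240):
    """Convert one (R, G, B) pixel to optical density per channel.

    io is the assumed transmitted light intensity; the +1 avoids log(0).
    """
    return tuple(-math.log((channel + 1) / io) for channel in pixel)


def is_tissue(pixel, io=240, beta=0.15):
    """Keep a pixel only if every channel's optical density reaches beta,
    which discards bright, nearly transparent background."""
    return all(od >= beta for od in rgb_to_optical_density(pixel, io))


print(is_tissue((255, 255, 255)))  # white background pixel
print(is_tissue((100, 50, 120)))   # stained, purple-ish pixel
```

In practice a library such as TorchStain would be used for the complete procedure; this sketch only shows why background pixels are excluded before stain estimation.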
- Clone the repository:
  git clone https://github.com/raktim-mondol/hist2RNA.git
- Change directory to the cloned repository:
  cd hist2RNA
- Install the required packages:
  pip install -r requirements.txt
- Train the model:
  python training_main.py --slides_dir ./data/slides/ --epochs 50 --batch_size 12 --lr 0.001
- Test the model:
  python test_main.py --test_patient_id ./patient_details/test_patient_id.txt --checkpoint_file ./models/hist2RNA_model.pth

For the most efficient workflow, run feature extraction first:
  python step_1_feature_extraction.py
Then train the model:
  python step_2_model_training_.py

For detailed usage instructions, please refer to the documentation.
The following results show predictions for the PAM50 genes from histopathology test images:
The per-patient analysis leverages the overall pattern of gene expression within each patient. This allows for a more holistic understanding of gene behavior across the population.
The per-gene analysis focuses on the expression pattern of each gene individually. This reveals the significant variability in gene expression among different patients, which can lead to lower correlation coefficients.
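The difference between the two analyses can be sketched with a small toy example. The code below assumes Pearson correlation and uses made-up numbers (rows are patients, columns are genes); neither the data nor the function names come from this project. Correlating across genes within a patient benefits from large differences between genes, while correlating across patients for a single gene depends on much smaller patient-to-patient variation.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def per_patient_correlations(true, pred):
    """One correlation per patient, computed across that patient's genes."""
    return [pearson(t_row, p_row) for t_row, p_row in zip(true, pred)]


def per_gene_correlations(true, pred):
    """One correlation per gene, computed across all patients."""
    return [pearson(t_col, p_col) for t_col, p_col in zip(zip(*true), zip(*pred))]


# Toy data (illustrative only): 3 patients x 3 genes.
true_expr = [[1.0, 5.0, 9.0], [1.2, 5.1, 9.3], [0.9, 4.8, 9.1]]
pred_expr = [[1.1, 5.0, 9.2], [1.0, 5.0, 9.0], [1.1, 5.1, 9.1]]

patient_r = per_patient_correlations(true_expr, pred_expr)
gene_r = per_gene_correlations(true_expr, pred_expr)
print("per patient:", [round(r, 3) for r in patient_r])
print("per gene:   ", [round(r, 3) for r in gene_r])
```

In this toy case the predictions capture each gene's typical level well, so per-patient correlations are high, but they miss the small differences between patients, so per-gene correlations are much lower.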
We welcome contributions to improve and expand the capabilities of hist2RNA! Please follow the contributing guidelines to get started.
This project is licensed under the MIT License - see the LICENSE file for details.
If you find this code useful in your research, please consider citing:
@Article{cancers15092569,
AUTHOR = {Mondol, Raktim Kumar and Millar, Ewan K. A. and Graham, Peter H. and Browne, Lois and Sowmya, Arcot and Meijering, Erik},
TITLE = {hist2RNA: An Efficient Deep Learning Architecture to Predict Gene Expression from Breast Cancer Histopathology Images},
JOURNAL = {Cancers},
VOLUME = {15},
YEAR = {2023},
NUMBER = {9},
ARTICLE-NUMBER = {2569},
URL = {https://www.mdpi.com/2072-6694/15/9/2569},
ISSN = {2072-6694},
DOI = {10.3390/cancers15092569}
}


