Bioactivity Prediction by a Heterogeneous Siamese Neural Network (SNN)

Group 2 Machine Learning in Medicine - Year 3 Data Science

Pham Hai Nam (BI12-307)
Nguyen Son (BI12-389)
Le Mau Minh Phuc (BI12-351)
Doan Huu Thanh (BI12-418)

Abstract

The determination of bioactivity during medication leads to challenges that can cause drug failure. This study presents a model, BioAct-Het, which utilizes a Siamese Neural Network (SNN) to predict bioactivity classes for both COVID-19 and non-COVID-19 drugs. The model demonstrates an accuracy greater than 75%, making it a reliable tool for evaluating side effects and enhancing drug discovery.

Introduction

Context of Computational Methods in Drug Discovery

The process of drug discovery is lengthy and costly, often taking over a decade and requiring billions of dollars in investment. Predicting a compound's bioactivities—both positive effects and potential side effects—is crucial for successful drug development. Computational methods, particularly those that explore structure-activity relationships (SAR), provide an efficient alternative to traditional laboratory screening.

Objectives

Evaluate the BioAct-Het model using predicted and actual side effect data from a set of known drugs.
Identify bioactivity classes for 24 marketed drugs, including both COVID-19 and non-COVID-19 treatments.

Methodology

Data Materials

Four distinct datasets were utilized in this study:

Manual Collection Dataset: This dataset consists of 24 marketed drugs specifically used for treating COVID-19. The drugs were selected based on their documented bioactivities and side effects.
SIDER (Side Effect Resource): SIDER provides information about the side effects of 1,427 marketed drugs, categorized into 27 classes based on the Medical Dictionary for Regulatory Activities (MedDRA). This dataset contains over 21,000 samples of side effects.
Tox21: This dataset includes 7,831 chemical substances evaluated through 12 toxicological assays. It is designed to assess potential risks associated with chemical exposures, aiding in the prediction of toxicity.
MUV (Maximum Unbiased Validation): MUV serves as a benchmark for evaluating virtual screening methods. It includes over 93,000 compounds and is structured to minimize the risk of overfitting by presenting 17 challenging tasks.

Model Architecture

The BioAct-Het model is built on a heterogeneous Siamese Neural Network architecture, which consists of two branches:

Chemical Representation Branch: Utilizes Graph Convolutional Networks (GCN) to transform SMILES representations of chemical structures into vector embeddings.
Bioactivity Class Representation Branch: Employs a binary string encoding (Morgan fingerprint) to represent bioactivity classes based on known active and inactive substructures.

Evaluation Metrics

To assess the model's performance, several evaluation metrics were employed:

Accuracy: Measures the overall correctness of the model's predictions.
Precision: Evaluates the ratio of true positive predictions to the total predicted positives.
Recall: Measures the ratio of true positive predictions to the total actual positives.
F1 Score: Provides a balance between precision and recall, especially useful for imbalanced datasets.

Results

Model Evaluation

The BioAct-Het model was evaluated using a test set of seven drugs, comparing the predicted side effects with actual documented side effects from the SIDER dataset. The drugs included:

Chloroquine
Famotidine
Guanfacine
Hydroxychloroquine
Oseltamivir
Prednisolone
Ritonavir

The model achieved an overall accuracy of 87.76%, with individual metrics as follows:

Drug Discovery Analysis

From the analysis, five drugs were identified with the fewest predicted side effects:

Paxlovid (Nirmatrelvir): 12 predicted side effects.
Linagliptin: 11 predicted side effects.
Interferon-beta-1a: 11 predicted side effects.
Famotidine: 10 predicted side effects.
Hydroxychloroquine: 10 predicted side effects.

Notably, Paxlovid and Interferon-beta-1a are both relevant to COVID-19 treatment, suggesting their potential use in clinical settings. Conversely, the model also identified drugs with the highest predicted side effects, indicating the need for careful monitoring:

Ritonavir: 25 predicted side effects.
Ivermectin-B1a: 26 predicted side effects.
Cyclosporine: 26 predicted side effects.

Visualization

Confusion matrices and bubble charts were generated to visualize the model's prediction performance across the tested drugs. The confusion matrices indicated that the model accurately predicted both the presence and absence of side effects for most drug-side effect pairs, highlighting the model's reliability in clinical applications.

They can be found here for the Bubble Charts and the Confusion Matrices

Conclusion

This study successfully demonstrates the application of the BioAct-Het model in predicting bioactivity classes and side effects of marketed drugs. The findings underscore the model's potential utility in drug discovery, particularly in identifying safe options for COVID-19 treatment. Further research is recommended to validate these findings through in vitro and in vivo studies.

For more details, please refer to the full report or the presentation slide.

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
Data		Data
Figures		Figures
Models		Models
Output		Output
Report_Slide		Report_Slide
Utils		Utils
(MUV) Association_based_strategy.ipynb		(MUV) Association_based_strategy.ipynb
(MUV) Bioactivity_class_based_strategy.ipynb		(MUV) Bioactivity_class_based_strategy.ipynb
(MUV) Drug_based_strategy.ipynb		(MUV) Drug_based_strategy.ipynb
(Sider) Association_based_strategy.ipynb		(Sider) Association_based_strategy.ipynb
(Sider) Bioactivity_class_based_strategy.ipynb		(Sider) Bioactivity_class_based_strategy.ipynb
(Sider) Drug_based_strategy.ipynb		(Sider) Drug_based_strategy.ipynb
(Tox21) Association_based_strategy.ipynb		(Tox21) Association_based_strategy.ipynb
(Tox21) Bioactivity_class_based_strategy.ipynb		(Tox21) Bioactivity_class_based_strategy.ipynb
(Tox21) Drug_based_strategy.ipynb		(Tox21) Drug_based_strategy.ipynb
GCN_attentivefp_SIDER_pre_trained.pth		GCN_attentivefp_SIDER_pre_trained.pth
GCN_attentivefp_Tox21_pre_trained.pth		GCN_attentivefp_Tox21_pre_trained.pth
README.md		README.md
Transfer_Learning_&_Case_Study.ipynb		Transfer_Learning_&_Case_Study.ipynb
eva.py		eva.py
eva_acc.py		eva_acc.py
eva_cf.ipynb		eva_cf.ipynb
eva_cf.py		eva_cf.py
eva_more.py		eva_more.py
sider_dglgraph.bin		sider_dglgraph.bin
tox21_dglgraph.bin		tox21_dglgraph.bin
transfer_learning_&_case_study.py		transfer_learning_&_case_study.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Bioactivity Prediction by a Heterogeneous Siamese Neural Network (SNN)

Group 2 Machine Learning in Medicine - Year 3 Data Science

Abstract

Introduction

Context of Computational Methods in Drug Discovery

Objectives

Methodology

Data Materials

Model Architecture

Evaluation Metrics

Results

Model Evaluation

Drug Discovery Analysis

Visualization

Conclusion

About

Languages

Xernnn/Bioact-Pred-Het-SNN

Folders and files

Latest commit

History

Repository files navigation

Bioactivity Prediction by a Heterogeneous Siamese Neural Network (SNN)

Group 2 Machine Learning in Medicine - Year 3 Data Science

Abstract

Introduction

Context of Computational Methods in Drug Discovery

Objectives

Methodology

Data Materials

Model Architecture

Evaluation Metrics

Results

Model Evaluation

Drug Discovery Analysis

Visualization

Conclusion

About

Topics

Resources

Stars

Watchers

Forks

Languages