MolGNet Featurizer #305
base: development
Conversation
Pull Request Overview
This PR adds MolGNet featurizer functionality to the project by introducing two new standalone scripts for generating molecular embeddings from SMILES strings. The implementation is adapted from the DIPK GitHub repository.
Key changes:
- Implements a complete MolGNet model with graph attention mechanisms for molecular feature extraction
- Adds a ChemBERTa-based drug embedding generator as an alternative featurization method
- Provides command-line interfaces for both featurizers to process datasets
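To make the "graph attention mechanisms" in the first bullet concrete, here is a minimal, self-contained sketch of one single-head attention message-passing step over node features. This is not the PR's actual MolGNet code; the function name, the dot-product scoring, and the toy graph are all illustrative assumptions, and real implementations use learned projections and tensors rather than plain lists.

```python
import math

def attention_step(features, neighbors):
    """One simplified graph-attention message-passing step.

    features: dict node -> feature vector (list of floats)
    neighbors: dict node -> list of neighbor node ids
    A neighbor's attention score is its dot product with the
    center node's features, normalized with a softmax.
    """
    updated = {}
    for node, h in features.items():
        nbrs = neighbors.get(node, [])
        if not nbrs:
            updated[node] = list(h)  # isolated node: unchanged
            continue
        # Raw attention scores: dot(center, neighbor).
        scores = [sum(a * b for a, b in zip(h, features[n])) for n in nbrs]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]  # numerically stable softmax
        total = sum(exps)
        weights = [e / total for e in exps]
        # New feature: attention-weighted sum of neighbor features.
        updated[node] = [
            sum(w * features[n][i] for w, n in zip(weights, nbrs))
            for i in range(len(h))
        ]
    return updated

# Toy 3-node path graph 0 - 1 - 2 with 2-dimensional features.
feats = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
nbrs = {0: [1], 1: [0, 2], 2: [1]}
out = attention_step(feats, nbrs)
```

A node with a single neighbor simply copies that neighbor's features (softmax over one score is 1), while node 1 blends its two neighbors according to their dot-product similarity.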
Reviewed Changes
Copilot reviewed 1 out of 3 changed files in this pull request and generated 14 comments.
| File | Description |
|---|---|
| drevalpy/datasets/featurizer/create_molgnet_embeddings.py | Implements MolGNet graph neural network model and extraction pipeline, including graph construction from SMILES, attention-based message passing layers, and embedding generation with checkpoint loading |
| drevalpy/datasets/featurizer/create_chemberta_drug_embeddings.py | Provides ChemBERTa-based embedding generation using a pre-trained transformer model from HuggingFace |
```python
    :return: GELU applied to ``bias + y``.
    """
    x = bias + y
    return x * 0.5 * (1.0 + torch.erf(x / 1.41421))
```
Copilot AI · Nov 13, 2025:
The hardcoded value 1.41421 (sqrt(2)) appears multiple times in the code. This magic number should be defined as a named constant for clarity and maintainability: `SQRT_2 = math.sqrt(2)  # or 1.41421`. Then use `x / SQRT_2` instead.
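A hedged sketch of what the suggestion amounts to, written with `math.erf` on scalars instead of `torch.erf` on tensors so it runs standalone; the constant name follows the comment's proposal.

```python
import math

SQRT_2 = math.sqrt(2)  # replaces the hardcoded 1.41421 literal

def gelu(x: float) -> float:
    """Exact (erf-based) GELU for a scalar input."""
    return x * 0.5 * (1.0 + math.erf(x / SQRT_2))
```

In the PR's code the same constant would be divided into a tensor, i.e. `torch.erf(x / SQRT_2)`, with identical numerics up to the precision lost by the truncated literal.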
@copilot open a new pull request to apply changes based on this feedback
```python
def __init__(self) -> None:
    """Create a new Self_loop callable (no parameters)."""
    pass
```
Copilot AI:
[nitpick] Empty __init__ methods with only pass are unnecessary in Python. These can be safely removed as Python will use the default __init__ from the parent class.
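The full `Self_loop` class isn't shown in this diff hunk, so the following is a generic illustration (with hypothetical class names) of why the nitpick is safe: a class with an empty `__init__` behaves identically to one that omits it and inherits `object.__init__`.

```python
class CallableWithInit:
    def __init__(self) -> None:  # body is only `pass`: adds nothing
        pass

    def __call__(self, n: int) -> int:
        return n + 1

class CallableWithoutInit:
    # No __init__ needed: Python falls back to object.__init__.
    def __call__(self, n: int) -> int:
        return n + 1
```

Both classes construct and call the same way, so deleting the empty `__init__` changes no behavior.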
```python
    :param x: Input tensor.
    :return: Activated tensor.
    """
    return x * 0.5 * (1.0 + torch.erf(x / 1.41421))
```
Copilot AI:
The hardcoded value 1.41421 (sqrt(2)) appears multiple times in the code. This magic number should be defined as a named constant for clarity and maintainability: `SQRT_2 = math.sqrt(2)  # or 1.41421`. Then use `x / SQRT_2` instead.
@PascalIversen I've opened a new pull request, #306, to work on those changes. Once the pull request is ready, I'll request review from you.
Co-authored-by: Copilot <[email protected]>
JudithBernett left a comment:
lgtm - add tests?