
A deep learning model developed using large and accurate dataset generated via atom-centered potentials approach


MSRG/DeepBDE


DeepBDE: a graph neural network for fast and accurate bond dissociation enthalpies

This repository contains the official implementation of DeepBDE, available on arXiv.

Getting Started

  1. Setup environment

    Create an environment with all necessary dependencies. This can be done using Conda:

    conda create -n "deepbde" python=3.12
    conda activate deepbde
    pip install -r requirements.txt

    DGL needs to be installed separately, as our repo expects it to be built with CUDA support.

    pip install dgl -f https://data.dgl.ai/wheels/torch-2.3/repo.html
  2. Download model and transforms (or dataset CSV if training)

To run predictions

  1. Extract model and transform

    Place these files in the parent directory of this repo.

  2. Inferencing

    Single reaction inferencing - requires the reactant SMILES and a bond index. The bond index is defined by the order in which RDKit arranges the bonds in the molecule.

    Example: split the reactant given by SMILES CCOc1cccc(O)c1 at bond index 1. The products are [O]c1cccc(O)c1 and [CH2]C.

    python infer.py 'CCOc1cccc(O)c1' 1

    Same as above, but supply the expected products to cross-check that the predicted reaction products match them (if not, an error is raised).

    python infer.py 'CCOc1cccc(O)c1' 1 --product_1_smiles '[O]c1cccc(O)c1' --product_2_smiles '[CH2]C'

    Multiple reaction inferencing - requires the reactant SMILES and a list of bond indices. The bond index is defined by the order in which RDKit arranges the bonds in the molecule. If the list contains a single index, this is identical to single reaction inference. The BDEs are returned in the same order as the input bond indices.

    python multi_infer.py 'C[C@H](O)C(=O)O' '[4,5,9]'

    All valid bond inferencing - requires the reactant SMILES only. All valid bond indices are found and printed before the BDE values are output.

    python infer_all.py 'C[C@H](O)C(=O)O'
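The bond indices used by all three commands follow RDKit's internal bond ordering, which you can inspect directly. A minimal sketch (assuming RDKit is installed in the environment; this snippet is not part of the repo):

```python
from rdkit import Chem

# Parse the example reactant and list its bonds in RDKit's order;
# these indices are what infer.py / multi_infer.py expect.
mol = Chem.MolFromSmiles('CCOc1cccc(O)c1')
for bond in mol.GetBonds():
    print(bond.GetIdx(),
          bond.GetBeginAtom().GetSymbol(),
          bond.GetEndAtom().GetSymbol(),
          bond.GetBondTypeAsDouble())
```

For this molecule, bond index 1 is the ethyl C-O bond of the ethoxy group, which is why splitting it yields [O]c1cccc(O)c1 and [CH2]C in the single-reaction example above.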

Training

  1. Encoding dataset and subset split

    Create the encoded dataset used by our training code from a CSV file (see the download link above). This step also creates the train, validation, and test index split that training requires. We use a typical 8:1:1 split as an example.

    python encode_dataset.py --save_dir [save directory] --csv_path [path to dataset csv] --split '[0.8,0.1,0.1]'
  2. Train model given hyperparameters

    Train the model given a set of hyperparameters. This code supports restarting training: if the path contains a pre-existing training save and the remaining arguments are the same, training resumes from the last recorded epoch.

    We show the hyperparameters used in our final model below. The dset_path should point to the dset/ directory created when the encoded dataset is generated. train_indices_path and valid_indices_path should point to the subset index files generated in the previous step.

    python train.py \
        --path [save path for training] \
        --dset_path [path to the dset directory] \
        --train_indices_path [path training indices list file generated] \
        --valid_indices_path [path validation indices list file generated] \
        --device [cpu or cuda] \
        --num_workers 1 \
        --activation_fn 'silu' \
        \
        --graph_hidden_size 256 \
        --graph_inner_layer_sizes '[[256, 256, 256, 256, 256, 256], [256, 256, 256, 256, 256, 256], [256, 256, 256, 256, 256, 256], [256, 256, 256, 256, 256, 256], [256, 256, 256, 256, 256, 256]]' \
        --fc_readout_sizes '[128, 32, 32, 32, 32, 32, 32, 32]' \
        \
        --learn_rate 0.00011796198660219 \
        --epochs 1000 \
        --batch_size 512 \
        \
        --reducelr_factor 0.8 \
        --reducelr_patience 20 \
        --reducelr_threshold 0.01 \
        \
        --min_epochs 1000 \
        --epochs_of_no_mae_drop_before_stop 1000
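The reducelr_* flags describe a reduce-on-plateau learning-rate schedule. As a rough sketch of that behaviour (a simplified stand-in, assuming an absolute improvement threshold on the validation MAE; the repo's actual implementation may differ, e.g. by wrapping PyTorch's ReduceLROnPlateau):

```python
class PlateauLR:
    """Sketch of a reduce-on-plateau schedule: cut the learning rate by
    `factor` once `patience` consecutive epochs pass without the validation
    MAE improving by at least `threshold`."""

    def __init__(self, lr, factor=0.8, patience=20, threshold=0.01):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.threshold = threshold
        self.best = float('inf')
        self.bad_epochs = 0

    def step(self, val_mae):
        if val_mae < self.best - self.threshold:
            # Sufficient improvement: record it and reset the counter.
            self.best = val_mae
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs > self.patience:
                # Plateau detected: shrink the learning rate.
                self.lr *= self.factor
                self.bad_epochs = 0
        return self.lr
```

With the values above (factor 0.8, patience 20, threshold 0.01), twenty consecutive epochs without a 0.01 validation-MAE improvement trigger a 20% learning-rate cut.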

Citation

If you use DeepBDE in your work, please cite our arXiv preprint.
