Transfer Learning with Custom Empty Model + New Tokens #218
Hi, I was just wondering, how one approaches the use of the Maybe I am overlooking something? Best.
Replies: 1 comment 7 replies
Hi Janosch, the idea of standardization is to bring the input SMILES into, as the name says, a standard format, including validation for chemical validity according to RDKit's valence model. REINVENT will generate SMILES as trained in the model but will then determine validity using RDKit. So you will have to ensure that the input SMILES are in a form acceptable to RDKit. The assumption is also made that scoring functions, in particular predictive models, are RDKit-SMILES based. We do have a facility, though, to convert SMILES before passing them on, e.g. the Lilly scoring components in

In practice, you would need to make sure to read in either a pre-standardized SMILES (set

Having said that, we have started to implement a new data pipeline. In the newest release of REINVENT we have a new command

Cheers.
I guess you mean preprocess.py from the data pipeline. In this case you need to set standardize = false for both empty model generation and TL. The built-in default filter has its own set of rules, e.g. your first SMILES has a boron. It also has a few other quirky rules.
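To see in advance which SMILES an element-based rule might reject, something like the following could be used. This is only a rough sketch: the allow-list ALLOWED_ELEMENTS and both helper functions are hypothetical and are not taken from REINVENT or its default filter, whose actual rules are not spelled out in this thread. It uses plain regular expressions rather than RDKit, so it only lists element symbols and does not check chemical validity.

```python
import re

# Hypothetical allow-list for illustration only; the real filter's rules may differ.
ALLOWED_ELEMENTS = {"C", "N", "O", "S", "F", "Cl", "Br", "I"}

# Atom tokens in a SMILES string: bracket atoms such as [NH3+] or [B-],
# the two-letter halogens, then single-letter aliphatic and aromatic atoms.
_ATOM_RE = re.compile(r"\[([^\]]+)\]|(Cl|Br)|([BCNOPSFI])|([bcnops])")

# Inside a bracket atom: optional isotope number, then the element symbol.
_BRACKET_RE = re.compile(r"^\d*([A-Z][a-z]?|[a-z]{1,2})")


def elements_in_smiles(smiles):
    """Return the set of element symbols that appear in a SMILES string."""
    elements = set()
    for match in _ATOM_RE.finditer(smiles):
        bracket, halogen, aliphatic, aromatic = match.groups()
        if bracket is not None:
            inner = _BRACKET_RE.match(bracket)
            if inner:  # wildcard atoms such as [*] carry no element and are skipped
                elements.add(inner.group(1).capitalize())
        else:
            elements.add((halogen or aliphatic or aromatic).capitalize())
    return elements


def passes_element_filter(smiles, allowed=ALLOWED_ELEMENTS):
    """True if every element in the SMILES is in the (hypothetical) allow-list."""
    return elements_in_smiles(smiles) <= set(allowed)


if __name__ == "__main__":
    for smi in ["CCO", "OB(O)c1ccccc1", "[Na+].[Cl-]"]:
        print(smi, sorted(elements_in_smiles(smi)), passes_element_filter(smi))
```

Running this over the training set before preprocessing gives a quick list of SMILES containing elements outside the chosen allow-list, such as the boron-containing one mentioned above.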