Details of the affinity labels in data #26
Comments
We train with an L2 loss on "good" low-RMSD (< 2 Å) poses and a hinge loss on "bad" poses: the predicted affinity for a bad pose should not be greater than the true affinity, but it isn't reasonable to expect the correct affinity from a bad pose. The negative value is used by our AffinityLoss function to identify that the hinge loss should be applied.
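The sign convention described above can be sketched in a few lines of Python. This is only an illustration, not the actual AffinityLoss code: the function name `affinity_loss`, the squared form of the hinge, and the treatment of a zero label as "no affinity information" are all assumptions made for the example.

```python
def affinity_loss(predicted: float, label: float) -> float:
    """Per-pose affinity loss under the sign convention described above.

    Hypothetical sketch: the squared hinge and the handling of label == 0
    are assumptions, not taken from the real AffinityLoss implementation.
    """
    if label > 0:
        # Good pose (low RMSD): ordinary L2 penalty in both directions.
        return (predicted - label) ** 2
    if label < 0:
        # Bad pose: the stored value is the negated true affinity.
        true_affinity = -label
        # Hinge: penalize only predictions above the true affinity.
        excess = max(0.0, predicted - true_affinity)
        return excess ** 2
    # Assumption: a label of exactly 0 carries no affinity information.
    return 0.0
```

With this sketch, a bad pose labeled `-3.28` contributes no loss when the prediction is 2.0, but is penalized when the prediction rises above 3.28.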
I am confused about the following statement about the dataset setup in the paper. From my view, the
I now know that you use |
In my opinion, the value of using a hinge loss for docked poses with RMSD > 2 Å is that it avoids over-penalizing bad samples, though I didn't find this stated in the paper. By the way, if you only want to train the model to predict affinity, do you have to generate counterexamples? As the paper says, a counterexample is a pose with CNNscore > 0.9 but RMSD > 2, or CNNscore < 0.5 but RMSD < 2, which has nothing to do with CNNaffinity. So I think you don't have to generate counterexamples if you only want to train on those docked poses to predict affinity. @JonasLi-19
For training and evaluation in this paper, all poses are already docked - we do not use trained models to perform docking, only rescoring. The hinge loss is only for affinity prediction because a bad pose should not have a good affinity. Counterexamples as described here are for training the pose scoring model.
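For readers following the counterexample discussion, the selection rule quoted earlier in the thread (CNNscore > 0.9 with RMSD > 2, or CNNscore < 0.5 with RMSD < 2) can be written as a small predicate. The function and parameter names here are hypothetical, and the real pipeline may apply further filtering that this thread does not describe.

```python
def is_counterexample(cnn_score: float, rmsd: float,
                      high: float = 0.9, low: float = 0.5,
                      rmsd_cutoff: float = 2.0) -> bool:
    """Return True if a pose matches the counterexample rule quoted in the thread.

    Hypothetical helper: only the thresholds come from the discussion above.
    """
    # Scored confidently as a good pose, but actually far from the crystal pose.
    confident_but_wrong = cnn_score > high and rmsd > rmsd_cutoff
    # Scored as a poor pose, but actually close to the crystal pose.
    doubtful_but_right = cnn_score < low and rmsd < rmsd_cutoff
    return confident_but_wrong or doubtful_but_right
```

Note that the rule depends only on the pose score and the RMSD, which is why it does not affect affinity-only training.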
Hi Professor @dkoes, I couldn't figure out why some of the crystal ligands have a negative pK. As you said, bad poses are marked negative, but why would crystal poses also be negative? I couldn't find the explanation in the paper, so I am asking here. @SanFran-Me Thank you for your opinion. I agree that I don't need to generate counterexamples if I only care about CNNaffinity accuracy.
That's a good question - @francoep ?
A bug in file generation would be my guess. By definition they should all have label 1 and a non-negative pK.
I uploaded a fixed version of the types file. I also re-ran the training of the Default 2018 model on the PDBbind crystal data and tested on the core set. The paper reported the Def2018 model at RMSE 1.500325 and Pearson R 0.734269 (Table 3). This minor bump in performance does not change the general result reported in the CrossDocked2020 paper, which is that training on the general set with docked poses was better than training on the refined set's crystal poses.
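For anyone re-checking numbers like these on their own predictions, the two metrics mentioned above can be computed as follows. This is a plain-Python sketch with hypothetical function names; it is not the evaluation script used for the paper.

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], true: Sequence[float]) -> float:
    """Root-mean-square error between predicted and true affinities."""
    squared = [(p - t) ** 2 for p, t in zip(predicted, true)]
    return math.sqrt(sum(squared) / len(squared))


def pearson_r(predicted: Sequence[float], true: Sequence[float]) -> float:
    """Pearson correlation coefficient between predicted and true affinities."""
    n = len(true)
    mean_p = sum(predicted) / n
    mean_t = sum(true) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(predicted, true))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_t = sum((t - mean_t) ** 2 for t in true)
    return cov / math.sqrt(var_p * var_t)
```

Both functions expect equal-length sequences with one value per complex, and `pearson_r` assumes neither sequence is constant.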
Good for you to find out bugs and improve your models, but how could I use your newly generated |
The model file is unchanged (it only describes the network setup, which is just the default2018 architecture), and when training you generate your own weights files. This setup of training on PDBbind refined crystals is not part of gnina at all (and I would recommend against using such a training setup).
When using *_min poses as part of the training set
I noticed that in the *.types files for PDBbind2016, you use *_min poses as part of the training data. How do you define their affinity label? Did you just assign those minimized poses the same affinity as the crystal poses, and set the other docked poses to the corresponding negative number?
Why does the second column have both positive and negative numbers for ligand and docked poses?
<label> <pK> <RMSD to crystal> <Receptor filename> <Ligand filename> # <Autodock Vina score>
1 3.28 0.908077 3zsx/3zsx_rec_0.gninatypes 3zsx/3zsx_min_0.gninatypes # -6.89469
0 -3.28 4.7514 3zsx/3zsx_rec_0.gninatypes 3zsx/3zsx_docked_0.gninatypes # -7.84082
0 -3.28 3.89599 3zsx/3zsx_rec_0.gninatypes 3zsx/3zsx_docked_1.gninatypes # -7.43202
0 -3.28 6.06622 3zsx/3zsx_rec_0.gninatypes 3zsx/3zsx_docked_2.gninatypes # -7.10783
0 -3.28 7.9518 3zsx/3zsx_rec_0.gninatypes 3zsx/3zsx_docked_3.gninatypes # -7.03943
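The column layout shown above can be read with a short parser. This is a sketch under the assumption that every line follows the six-field layout in the header; `TypesEntry`, `parse_types_line`, and `uses_hinge` are hypothetical names for illustration, not part of gnina.

```python
from typing import NamedTuple, Optional


class TypesEntry(NamedTuple):
    label: int                   # 1 = good pose, 0 = bad pose
    pk: float                    # signed affinity as stored in the file
    rmsd: float                  # RMSD to the crystal pose
    receptor: str
    ligand: str
    vina_score: Optional[float]  # value after '#', if present


def parse_types_line(line: str) -> TypesEntry:
    """Parse one line of the layout shown above (hypothetical helper)."""
    body, _, comment = line.partition("#")
    fields = body.split()
    vina = float(comment) if comment.strip() else None
    return TypesEntry(int(fields[0]), float(fields[1]), float(fields[2]),
                      fields[3], fields[4], vina)


def uses_hinge(entry: TypesEntry) -> bool:
    """A negative stored pK marks a pose whose affinity is an upper bound."""
    return entry.pk < 0
```

For the first docked pose above, `parse_types_line` yields a stored pK of -3.28, so `abs(entry.pk)` recovers the true affinity of 3.28 and `uses_hinge` returns `True`.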