
RDKit converter inferring #4305

Open: cbouy wants to merge 27 commits into develop from rdkit-converter-inferring

cbouy
Member

@cbouy cbouy commented Sep 29, 2023

Fixes part of #3996

Changes made in this Pull Request:

  • Refactored the RDKit converter code to move the inferring code into a separate RDKitInferring module. The bond-order and charge inferer has been moved to a MDAnalysisInferer dataclass there.
  • Renamed the NoImplicit parameter to implicit_hydrogens and added a separate inferer argument (defaults to MDAnalysisInferer()). Passing NoImplicit to any of the relevant functions will issue a warning and make the necessary arrangements to execute the code in a backwards-compatible way (i.e. implicit_hydrogens=not NoImplicit and, if NoImplicit is False, inferer=None).
  • Added TemplateInferer, which wraps around RDKit's AssignBondOrdersFromTemplate. There's an additional adjust_hydrogens parameter that, when set to True, allows one to assign bond orders from a template molecule with implicit hydrogens to an input molecule with explicit hydrogens (which doesn't work with the base AssignBondOrdersFromTemplate for charged molecules where the charged atom carries a hydrogen). I originally wrote this code in ProLIF for dealing with PDBQT inputs and figured it would be worth adding here as well.
  • Added RDKit's rdDetermineBonds inferring wrapper as showcased here.
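
The backwards-compatible handling of NoImplicit described in the list can be sketched in plain Python. This is a minimal illustration only, assuming the warning issued is a DeprecationWarning; resolve_inference_args and PlaceholderInferer are hypothetical names standing in for the PR's actual code and for MDAnalysisInferer().

```python
import warnings
from typing import Any, Callable, Optional, Tuple


# Hypothetical stand-in for MDAnalysisInferer(); the real default lives in
# the PR's RDKitInferring module and operates on rdkit.Chem.Mol objects.
class PlaceholderInferer:
    def __call__(self, mol: Any) -> Any:
        return mol


DEFAULT_INFERER = PlaceholderInferer()
_NOT_GIVEN = object()  # sentinel: tells "not passed" apart from False

Inferer = Optional[Callable[[Any], Any]]


def resolve_inference_args(
    implicit_hydrogens: bool = False,
    inferer: Inferer = DEFAULT_INFERER,
    NoImplicit: Any = _NOT_GIVEN,
) -> Tuple[bool, Inferer]:
    """Map the deprecated NoImplicit flag onto the new arguments."""
    if NoImplicit is not _NOT_GIVEN:
        warnings.warn(
            "NoImplicit is deprecated, use implicit_hydrogens and inferer",
            DeprecationWarning,
            stacklevel=2,
        )
        implicit_hydrogens = not NoImplicit
        if not NoImplicit:
            # NoImplicit=False maps to implicit Hs allowed and no inferer.
            inferer = None
    return implicit_hydrogens, inferer
```

Using a sentinel rather than None as the NoImplicit default means only callers who actually pass the old flag see the warning, while the new arguments pass through untouched.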

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

Developers certificate of origin


📚 Documentation preview 📚: https://mdanalysis--4305.org.readthedocs.build/en/4305/

@pep8speaks

pep8speaks commented Sep 29, 2023

Hello @cbouy! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 67:80: E501 line too long (82 > 79 characters)
Line 445:80: E501 line too long (80 > 79 characters)

Line 331:80: E501 line too long (104 > 79 characters)
Line 333:80: E501 line too long (104 > 79 characters)
Line 335:80: E501 line too long (104 > 79 characters)
Line 337:80: E501 line too long (104 > 79 characters)
Line 339:80: E501 line too long (104 > 79 characters)
Line 341:80: E501 line too long (104 > 79 characters)
Line 343:80: E501 line too long (104 > 79 characters)
Line 345:80: E501 line too long (104 > 79 characters)
Line 347:80: E501 line too long (104 > 79 characters)
Line 349:80: E501 line too long (104 > 79 characters)
Line 351:80: E501 line too long (104 > 79 characters)
Line 353:80: E501 line too long (104 > 79 characters)

Line 476:80: E501 line too long (80 > 79 characters)
Line 904:80: E501 line too long (85 > 79 characters)
Line 905:80: E501 line too long (84 > 79 characters)
Line 906:80: E501 line too long (80 > 79 characters)
Line 907:80: E501 line too long (88 > 79 characters)

Comment last updated at 2024-08-26 15:54:51 UTC

@github-actions

github-actions bot commented Sep 29, 2023

Linter Bot Results:

Hi @cbouy! Thanks for making this PR. We linted your code and found the following:

Some issues were found with the formatting of your code.

Code location | Outcome
main package | ⚠️ Possible failure
testsuite | ⚠️ Possible failure

Please have a look at the darker-main-code and darker-test-code steps here for more details: https://github.com/MDAnalysis/mdanalysis/actions/runs/10563005550/job/29262240571


Please note: The black linter is purely informational, you can safely ignore these outcomes if there are no flake8 failures!

@codecov

codecov bot commented Sep 29, 2023

Codecov Report

Attention: Patch coverage is 98.07692% with 5 lines in your changes missing coverage. Please review.

Project coverage is 93.84%. Comparing base (b113fd5) to head (22f4b1e).

Files with missing lines | Patch % | Lines
package/MDAnalysis/converters/RDKit.py | 92.50% | 0 missing and 3 partials ⚠️
package/MDAnalysis/converters/RDKitInferring.py | 99.06% | 0 missing and 2 partials ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #4305      +/-   ##
===========================================
+ Coverage    93.62%   93.84%   +0.22%     
===========================================
  Files          177      178       +1     
  Lines        22001    22090      +89     
  Branches      3114     3128      +14     
===========================================
+ Hits         20599    20731     +132     
+ Misses         947      901      -46     
- Partials       455      458       +3     


@orbeckst orbeckst added the hackathon (part of a MDAnalysis coding event) label Oct 10, 2023
@cbouy cbouy force-pushed the rdkit-converter-inferring branch 3 times, most recently from 8620085 to ba1714a, December 14, 2023 21:03
@cbouy cbouy force-pushed the rdkit-converter-inferring branch from 0009427 to 77e5b35, December 16, 2023 16:37
@cbouy cbouy changed the title [WIP] Rdkit converter inferring RDKit converter inferring Dec 17, 2023
@cbouy cbouy marked this pull request as ready for review December 17, 2023 13:36


@dataclass(frozen=True)
class MDAnalysisInferer:
Member Author

No code changes here apart from refactoring all the different bond-order inferring functions under this class, adding a deprecation warning when max_iter is specified anywhere other than in __init__, and some code formatting with black.
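
The pattern described in this comment (settings such as max_iter stored on a frozen inferer instance, with a warning only when max_iter is passed elsewhere) can be sketched in pure Python. SketchInferer, run, and the default of 200 are hypothetical placeholders, not the PR's code, and the DeprecationWarning category is an assumption.

```python
import warnings
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SketchInferer:
    """Hypothetical stand-in for a frozen inferer that owns its settings."""

    max_iter: int = 200  # illustrative default, not taken from the PR

    def _resolve_max_iter(self, max_iter: Optional[int]) -> int:
        # Prefer the value stored on the instance; a per-call value is still
        # honoured for backwards compatibility but triggers a warning.
        if max_iter is None:
            return self.max_iter
        warnings.warn(
            "Passing max_iter here is deprecated; set it when creating the "
            "inferer instead",
            DeprecationWarning,
            stacklevel=3,
        )
        return max_iter

    def run(self, steps_needed: int, max_iter: Optional[int] = None) -> int:
        # Placeholder for an iterative RDKit routine: returns how many
        # iterations would actually run before hitting the cap.
        return min(steps_needed, self._resolve_max_iter(max_iter))
```

Because the dataclass is frozen, changing max_iter means building a new instance (for example with dataclasses.replace) rather than mutating one that may be shared between conversions.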

----------
template : rdkit.Chem.rdchem.Mol
Molecule containing the bond orders and charges.
adjust_hydrogens: bool, default = True
Member Author

Just checking that defaulting to True sounds reasonable to everyone? Otherwise, if you have a charged mol, you'll need to add explicit hydrogens to the template to get an exact match with your input mol, which may be a bit of a pain.
This was originally in ProLIF to read PDBQT files as valid RDKit mols; happy to add it here (and it should be fine license-wise).

Member

I think True makes sense, especially if this is just to make template matching possible.

However, can you document under which circumstances users should use False?


def __call__(self, mol: "Chem.Mol") -> "Chem.Mol":
    new = Chem.Mol(mol)
    DetermineBondOrders(new, charge=self.charge)
Member Author

RDKit now also has a DetermineBonds function, which could be interesting to add as an alternative to guess_bonds? Or at least to the RDKitConverter, which requires bonds anyway.

@IAlibay
Member

IAlibay commented Dec 18, 2023

Thanks @cbouy ! From a quick look this seems great, I'll try to review it at some point over the next week (unless someone gets to it first).

P.S. For others who might review here: codecov seems to be throwing a bunch of "uncovered code" messages for lines that do appear to be covered. Cycling the PR might clear them, but I don't think it's a major necessity right now.

@orbeckst
Member

@cbouy are you still working on the PR or is this ready for review?

@cbouy
Member Author

cbouy commented Mar 30, 2024

Should be ready for review; I'll just need to update the changelog before merging.

@orbeckst
Member

That's great.

Can you please add the CHANGELOG update right away, even if it will require resolving a merge conflict later? The summary there tends to be really helpful for assessing a PR.... and typically no reviewer will green-light such a PR without the CHANGELOG in place anyway.

@orbeckst
Member

@richardjgowers do you have capacity to shepherd the PR to completion? If not please let me know and un-assign yourself. Thanks!

@cbouy cbouy force-pushed the rdkit-converter-inferring branch from 11f68ac to 3a85fde, April 1, 2024 17:48
@cbouy cbouy force-pushed the rdkit-converter-inferring branch 2 times, most recently from 2c09952 to ea26569, April 1, 2024 18:48
@cbouy
Member Author

cbouy commented Apr 1, 2024

Sorry for the spam, should be good now!

@orbeckst
Member

@richardjgowers are you able to review this PR yourself or is there someone you could ping? From my very cursory glance, this looks pretty much ready and would be good to get in, given our roadmap towards "better chemistry".

@cbouy
Member Author

cbouy commented Aug 26, 2024

Not sure why one of the Azure tests is timing out, or why the bot removed some of the tags, but this is re-ready for review 😅

@IAlibay
Member

IAlibay commented Aug 26, 2024

/azp run

Azure Pipelines successfully started running 1 pipeline(s).

@cbouy
Member Author

cbouy commented Jul 2, 2025

necrobump 😅 Should I split this PR into the refactoring part and the additional "inferers" (from a template mol, and the one using RDKit's XYZ2MOL code) to get it moving?
I'm starting to receive additional user stories on the ProLIF side that could benefit from having custom inferers as an alternative to the hardcoded one.

Member

@orbeckst orbeckst left a comment

I did a very superficial review, noting that a lot of the new code does not show up in test coverage and that this would go into 2.10.0 (and not 2.8.0...).

I can't say much about the actual inferrers – I am not a cheminformatics wizard. Generally the refactor and the new additions look sensible to me. Some docs would help to explain the motivation and how the new inferrers would be used.

I would also suggest making the RDKit converter its own sub-package by creating an rdkitconverter directory and collecting RDKit.py and RDKitInferring.py there.

@cbouy
Member Author

cbouy commented Jul 4, 2025

Forgot to run black, oopsie, will adjust.
@orbeckst the coverage is messed up for whatever reason (it was already the case before), not sure why, but I already had tests for most things you pointed out (but not all so thank you!)
One thing that just occurred to me as a non-native speaker: is the correct spelling inferer or inferrer? I think it's inferring so probably the latter, just checking before I make the change 😅

@orbeckst
Member

orbeckst commented Jul 4, 2025

One thing that just occurred to me as a non-native speaker: is the correct spelling inferer or inferrer? I think it's inferring so probably the latter, just checking before I make the change

You're asking the wrong person 🇩🇪 ;-). My hunch is inferrer. Supported by https://en.wiktionary.org/wiki/inferrer

@cbouy cbouy force-pushed the rdkit-converter-inferring branch from 0caeeb4 to 8dd3df1, July 4, 2025 17:07
@cbouy
Member Author

cbouy commented Jul 5, 2025

ready for next review

Labels: enhancement, hackathon

5 participants