RDKit converter inferring #4305

cbouy · 2023-09-29T15:41:40Z

Fixes part of #3996

Changes made in this Pull Request:

Refactored the RDKit converter code to move the inferring code into a separate RDKitInferring module. The bond order and charge inferer has been moved to an MDAnalysisInferer dataclass in that module.
Renamed the NoImplicit parameter to implicit_hydrogens and added a separate inferer argument (defaults to MDAnalysisInferer()). Passing NoImplicit to any of the relevant functions issues a warning and maps it onto the new arguments in a backwards-compatible way (i.e. implicit_hydrogens=not NoImplicit, and if NoImplicit is False: inferer=None).
Added TemplateInferer, which wraps RDKit's AssignBondOrdersFromTemplate. An additional adjust_hydrogens parameter, when set to True, allows assigning bond orders from a template molecule with implicit hydrogens to an input molecule with explicit hydrogens (which won't work with the base AssignBondOrdersFromTemplate for charged molecules where the charged atom has a hydrogen). I originally had this code in ProLIF for dealing with PDBQT inputs and figured it would be worth having here as well.
Added RDKit's rdDetermineBonds inferring wrapper as showcased here.

PR Checklist

Tests?
Docs?
CHANGELOG updated?
Issue raised/referenced?

Developers certificate of origin

I certify that this contribution is covered by the LGPLv2.1+ license as defined in our LICENSE and adheres to the Developer Certificate of Origin.

📚 Documentation preview 📚: https://mdanalysis--4305.org.readthedocs.build/en/4305/

pep8speaks · 2023-09-29T15:41:48Z

Hello @cbouy! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

In the file package/MDAnalysis/converters/RDKit.py:

Line 67:80: E501 line too long (82 > 79 characters)
Line 445:80: E501 line too long (80 > 79 characters)

In the file package/MDAnalysis/converters/RDKitInferring.py:

Line 331:80: E501 line too long (104 > 79 characters)
Line 333:80: E501 line too long (104 > 79 characters)
Line 335:80: E501 line too long (104 > 79 characters)
Line 337:80: E501 line too long (104 > 79 characters)
Line 339:80: E501 line too long (104 > 79 characters)
Line 341:80: E501 line too long (104 > 79 characters)
Line 343:80: E501 line too long (104 > 79 characters)
Line 345:80: E501 line too long (104 > 79 characters)
Line 347:80: E501 line too long (104 > 79 characters)
Line 349:80: E501 line too long (104 > 79 characters)
Line 351:80: E501 line too long (104 > 79 characters)
Line 353:80: E501 line too long (104 > 79 characters)

In the file testsuite/MDAnalysisTests/converters/test_rdkit.py:

Line 476:80: E501 line too long (80 > 79 characters)
Line 904:80: E501 line too long (85 > 79 characters)
Line 905:80: E501 line too long (84 > 79 characters)
Line 906:80: E501 line too long (80 > 79 characters)
Line 907:80: E501 line too long (88 > 79 characters)

Comment last updated at 2024-08-26 15:54:51 UTC

github-actions · 2023-09-29T15:43:33Z

Linter Bot Results:

Hi @cbouy! Thanks for making this PR. We linted your code and found the following:

Some issues were found with the formatting of your code.

Code Location	Outcome
main package	⚠️ Possible failure
testsuite	⚠️ Possible failure

Please have a look at the darker-main-code and darker-test-code steps here for more details: https://github.com/MDAnalysis/mdanalysis/actions/runs/10563005550/job/29262240571

Please note: The black linter is purely informational, you can safely ignore these outcomes if there are no flake8 failures!

codecov · 2023-09-29T16:01:24Z

Codecov Report

Attention: Patch coverage is 98.07692% with 5 lines in your changes missing coverage. Please review.

Project coverage is 93.84%. Comparing base (b113fd5) to head (22f4b1e).

Files with missing lines	Patch %	Lines
package/MDAnalysis/converters/RDKit.py	92.50%	0 Missing and 3 partials ⚠️
package/MDAnalysis/converters/RDKitInferring.py	99.06%	0 Missing and 2 partials ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #4305      +/-   ##
===========================================
+ Coverage    93.62%   93.84%   +0.22%     
===========================================
  Files          177      178       +1     
  Lines        22001    22090      +89     
  Branches      3114     3128      +14     
===========================================
+ Hits         20599    20731     +132     
+ Misses         947      901      -46     
- Partials       455      458       +3

cbouy · 2023-12-18T13:01:27Z

package/MDAnalysis/converters/RDKitInferring.py

+
+
+@dataclass(frozen=True)
+class MDAnalysisInferer:


No code changes here apart from moving all the different bond order inferring functions under this class, adding a deprecation warning when max_iter is specified anywhere other than in __init__, and some code formatting with black.
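
To illustrate why configuration such as max_iter now belongs in __init__, here is a minimal sketch of a frozen, callable dataclass. ToyInferer is a hypothetical stand-in, not the MDAnalysisInferer class from this PR, and the default of 200 is an assumption; the __call__ body is a placeholder where the real class would assign bond orders and charges.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class ToyInferer:
    """Hypothetical stand-in for a configurable, callable inferer."""

    max_iter: int = 200  # assumed default, for illustration only

    def __call__(self, mol):
        # Placeholder: a real inferer would return a modified copy of mol.
        return mol


inferer = ToyInferer(max_iter=50)

# A frozen dataclass rejects attribute assignment after construction.
try:
    inferer.max_iter = 100
except FrozenInstanceError:
    mutated = False
else:
    mutated = True

# A differently configured inferer is created as a new object instead.
longer = replace(inferer, max_iter=500)
```

Because the instance is immutable and compared by value, the same inferer object can be shared between converter calls without one call changing the settings seen by another.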

cbouy · 2023-12-18T13:05:46Z

package/MDAnalysis/converters/RDKitInferring.py

+    ----------
+    template : rdkit.Chem.rdchem.Mol
+        Molecule containing the bond orders and charges.
+    adjust_hydrogens: bool, default = True


Just checking that defaulting to True sounds reasonable to everyone? Otherwise, if you have a charged mol, you'll need to add explicit hydrogens to the template to get an exact match with your input mol, which may be a bit of a pain.
This was originally in ProLIF to read PDBQT files as valid RDKit mols, happy to add it here (and should be fine license-wise)
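
For context, a minimal sketch of the base RDKit call that TemplateInferer wraps, on a neutral molecule where it works without any hydrogen adjustment. The molecule (acetaldehyde) and the way the bond orders are stripped are illustrative assumptions, not code from this PR; the charged case mentioned above is the one that adjust_hydrogens is meant to handle and is not reproduced here.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Template: acetaldehyde with implicit hydrogens and correct bond orders.
template = Chem.MolFromSmiles("CC=O")

# Input: the same molecule with explicit hydrogens, with every bond reset
# to single to mimic a structure read without bond-order information.
mol = Chem.AddHs(template)
for bond in mol.GetBonds():
    bond.SetBondType(Chem.BondType.SINGLE)

# Returns a new molecule; the input is left untouched.
fixed = AllChem.AssignBondOrdersFromTemplate(template, mol)
smiles = Chem.MolToSmiles(Chem.RemoveHs(fixed))
```

The explicit hydrogens of the input are kept in the result, and the C=O double bond is restored from the template.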

cbouy · 2023-12-18T13:08:59Z

package/MDAnalysis/converters/RDKitInferring.py

+
+    def __call__(self, mol: "Chem.Mol") -> "Chem.Mol":
+        new = Chem.Mol(mol)
+        DetermineBondOrders(new, charge=self.charge)


RDKit now also has a DetermineBonds which could be interesting to add as an alternative to guess_bonds? Or at least in the RDKitConverter which requires bonds anyway
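
A small sketch of the difference between the two RDKit entry points mentioned here, using ethene built from an XYZ block. The coordinates are approximate values chosen for illustration, and this is plain RDKit usage rather than the wrapper added in this PR: DetermineBondOrders expects connectivity to exist already, while DetermineBonds also perceives the connectivity from the 3D coordinates.

```python
from rdkit import Chem
from rdkit.Chem import rdDetermineBonds

# Ethene as an XYZ block: atoms and coordinates only, no bonds.
XYZ = """6

C  0.000  0.000  0.667
C  0.000  0.000 -0.667
H  0.000  0.923  1.238
H  0.000 -0.923  1.238
H  0.000  0.923 -1.238
H  0.000 -0.923 -1.238
"""

raw = Chem.MolFromXYZBlock(XYZ)

# Route 1: perceive connectivity first, then assign bond orders.
stepwise = Chem.Mol(raw)
rdDetermineBonds.DetermineConnectivity(stepwise)
rdDetermineBonds.DetermineBondOrders(stepwise, charge=0)

# Route 2: connectivity and bond orders in a single call.
one_shot = Chem.Mol(raw)
rdDetermineBonds.DetermineBonds(one_shot, charge=0)

smiles_stepwise = Chem.MolToSmiles(Chem.RemoveHs(stepwise))
smiles_one_shot = Chem.MolToSmiles(Chem.RemoveHs(one_shot))
```

Both functions modify the molecule in place, which is why each route works on its own copy of the bond-free input. For this geometry both routes should end up with the C=C double bond.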

IAlibay · 2023-12-18T14:08:35Z

Thanks @cbouy ! From a quick look this seems great, I'll try to review it at some point over the next week (unless someone gets to it first).

P.S. For others that might review here - codecov seems to be throwing a bunch of "uncovered code" messages for lines that do appear to be covered. Cycling the PR might clear them, but I don't think it's a major necessity right now.

orbeckst · 2024-03-29T21:48:38Z

@cbouy are you still working on the PR or is this ready for review?

cbouy · 2024-03-30T16:52:41Z

Should be ready for review, I'll just need to update the changelog when ready for merging

orbeckst · 2024-03-30T17:05:43Z

That's great.

Can you please add the CHANGELOG update right away, even if it will require resolving a merge conflict later? The summary there tends to be really helpful for assessing a PR.... and typically no reviewer will green-light such a PR without the CHANGELOG in place anyway.

orbeckst · 2024-03-30T17:09:07Z

@richardjgowers do you have capacity to shepherd the PR to completion? If not please let me know and un-assign yourself. Thanks!

cbouy · 2024-04-01T19:03:55Z

Sorry for the spam, should be good now!

orbeckst · 2024-06-11T04:34:28Z

@richardjgowers are you able to review this PR yourself or is there someone you could ping? From my very cursory glance, this looks pretty much ready and would be good to get in, given our roadmap towards "better chemistry".

cbouy · 2024-08-26T18:26:31Z

Not sure why one of the Azure tests is timing out, or why the bot removed some of the tags, but this is re-ready for review 😅

IAlibay · 2024-08-26T19:10:12Z

/azp run

azure-pipelines · 2024-08-26T19:10:21Z

Azure Pipelines successfully started running 1 pipeline(s).

cbouy · 2025-07-02T20:15:24Z

necrobump 😅 Should I split this PR into the refactoring bit and the additional "inferers" (from template mol and the one using rdkit's XYZ2MOL code) to get it moving?
I'm starting to receive additional user stories on the prolif side that could benefit from having custom inferers as an alternative to the hardcoded one.

orbeckst

I did a very superficial review – noting that a lot of the new code does not show up with test coverage, noting that this would go into 2.10.0 (and not 2.8.0...).

I can't say much about the actual inferrers – I am not a cheminformatics wizard. Generally the refactor and the new additions look sensible to me. Some docs would help to explain the motivation and how the new inferrers would be used.

I would also suggest making the rdkit converter its own sub-package by creating an rdkitconverter directory and collecting RDKit.py and RDKitInferring.py there.

orbeckst · 2025-07-02T22:33:24Z

package/MDAnalysis/converters/RDKitInferring.py

+    ----------
+    template : rdkit.Chem.rdchem.Mol
+        Molecule containing the bond orders and charges.
+    adjust_hydrogens: bool, default = True


I think True makes sense, especially if this is just to make template matching possible.

However, can you document under which circumstances users should use False?

cbouy · 2025-07-04T13:06:58Z

Forgot to run black, oopsie, will adjust.
@orbeckst the coverage is messed up for whatever reason (it was already the case before), not sure why, but I already had tests for most things you pointed out (but not all so thank you!)
One thing that just occurred to me as a non-native speaker: is the correct spelling inferer or inferrer? I think it's inferring so probably the latter, just checking before I make the change 😅

orbeckst · 2025-07-04T16:04:20Z

One thing that just occurred to me as a non-native speaker: is the correct spelling inferer or inferrer? I think it's inferring so probably the latter, just checking before I make the change

You're asking the wrong person 🇩🇪 ;-). My hunch is inferrer. Supported by https://en.wiktionary.org/wiki/inferrer

cbouy · 2025-07-05T00:08:09Z

ready for next review

github-actions bot added the Component-Converters label Sep 29, 2023

orbeckst added the hackathon part of a MDAnalysis coding event label Oct 10, 2023

IAlibay added the enhancement label Nov 5, 2023

cbouy force-pushed the rdkit-converter-inferring branch 3 times, most recently from 8620085 to ba1714a Compare December 14, 2023 21:03

cbouy mentioned this pull request Dec 15, 2023

RDKitConverter improvements #3996

Open

github-actions bot added the Continuous Integration label Dec 16, 2023

cbouy force-pushed the rdkit-converter-inferring branch from 0009427 to 77e5b35 Compare December 16, 2023 16:37

github-actions bot added Continuous Integration and removed Continuous Integration labels Dec 16, 2023

cbouy changed the title ~~[WIP] Rdkit converter inferring~~ RDKit converter inferring Dec 17, 2023

cbouy marked this pull request as ready for review December 17, 2023 13:36

cbouy commented Dec 18, 2023

orbeckst assigned richardjgowers Mar 30, 2024

cbouy force-pushed the rdkit-converter-inferring branch from 11f68ac to 3a85fde Compare April 1, 2024 17:48

IAlibay reviewed Apr 1, 2024

cbouy force-pushed the rdkit-converter-inferring branch 2 times, most recently from 2c09952 to ea26569 Compare April 1, 2024 18:48

cbouy mentioned this pull request Apr 15, 2024

Fixing ligand connectivity on the fly chemosim-lab/ProLIF#202

Open

cbouy and others added 8 commits August 23, 2024 10:22

improve test coverage

7e76f73

more nitpicks

189ed17

add rdDetermineBonds inferer

573a8d1

address comments and linting

e987fdb

revert auto-formatting

5cbc9d0

expose STANDARDIZATION_REACTIONS and MONATOMIC_CATION_CHARGES for backwards compatibility

cf4c38d

make sanitization optional

a5caf76

fix rdkit support for numpy v2

7adaa02

cbouy force-pushed the rdkit-converter-inferring branch from 6070d89 to 7adaa02 Compare August 23, 2024 09:43

github-actions bot removed Continuous Integration Component-Converters labels Aug 23, 2024

cbouy added 4 commits August 24, 2024 18:47

fix tests

5ff8f47

document usage of rdkit inferers

3a05f2b

fix remaining tests rdkit incompatible with numpy 2

454c0d5

Merge branch 'develop' into rdkit-converter-inferring

31964b4

orbeckst requested changes Jul 2, 2025

cbouy and others added 3 commits July 3, 2025 20:07

Merge branch 'develop' into rdkit-converter-inferring

ac4a6c0

chore: formatting/linting

12d7ca4

address comments: bump to 2.10.0, fix docs, formatting, missing tests

076e06f

cbouy added 2 commits July 4, 2025 18:44

chore: rename to inferrer

add0def

formatting

8dd3df1

cbouy force-pushed the rdkit-converter-inferring branch from 0caeeb4 to 8dd3df1 Compare July 4, 2025 17:07

fix failing tests

22f4b1e


