fix inclusion of PRO in secondary structure #5065

orbeckst · 2025-06-12T00:22:19Z

Changes made in this Pull Request:

ported fix from PyDSSP 0.9.1 by @ShintaroMinami to analysis.dssp.DSSP (see also Wrong assignment or prolines? ShintaroMinami/PyDSSP#2)
new kwarg ignore_proline_donor=True for DSSP (the new default changes the behavior and implements the fix, False recovers old behavior); the kwarg also exists in PyDSSP
updated docs
minimal regression tests
updated CHANGELOG

PR Checklist

Issue raised/referenced?
Tests updated/added?
Documentation updated/added?
package/CHANGELOG file updated?
Is your name in package/AUTHORS? (If it is not, add it!)

Developers Certificate of Origin

I certify that I can submit this code contribution as described in the Developer Certificate of Origin, under the MDAnalysis LICENSE.

📚 Documentation preview 📚: https://mdanalysis--5065.org.readthedocs.build/en/5065/

@ShintaroMinami

- fix #4913
- ported fix from PyDSSP 0.9.1 by @ShintaroMinami to analysis.dssp.DSSP (see also ShintaroMinami/PyDSSP#2)
- new kwarg ignore_proline_donor=True for DSSP (the new default changes the behavior and implements the fix, False recovers old behavior); the kwarg also exists in PyDSSP
- updated docs
- minimal regression tests
- updated CHANGELOG

orbeckst

This is a quick draft. I'd be more than happy if someone continued and completed it.

orbeckst · 2025-06-12T00:26:07Z

package/MDAnalysis/analysis/dssp/dssp.py

+        self._donor_mask: Optional[np.ndarray] = (
+            ag.residues.resnames != "PRO" if ignore_proline_donor else None
+        )


This may not be correct. The code runs ... but I am not sure if I should be masking corresponding atoms.
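
For reference, a minimal NumPy sketch of what the mask expression in the diff evaluates to. The residue names are a made-up stand-in for `ag.residues.resnames`, not data from this PR. The result has one boolean per residue rather than per atom, which would be the right granularity only under the assumption that the hydrogen bond map is indexed by residue pairs:

```python
import numpy as np

# Toy stand-in for ag.residues.resnames (illustrative sequence only)
resnames = np.array(["ALA", "PRO", "GLY", "PRO", "LEU"])

ignore_proline_donor = True

# Same expression as in the diff: True where the residue is not PRO
donor_mask = resnames != "PRO" if ignore_proline_donor else None

print(donor_mask)        # one boolean per residue
print(donor_mask.shape)  # length equals the number of residues
```

If the map were indexed per atom instead, this per-residue mask would have to be expanded to the corresponding atoms first.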

orbeckst · 2025-06-12T00:27:40Z

package/MDAnalysis/analysis/dssp/pydssp_numpy.py

+         Mask out any hydrogens that should not be considered (in particular HN
+         in PRO). If ``None`` then all H will be used (behavior up to 2.9.0).


These docs should be more specific and state the shape. I just quickly guessed the shape from https://github.com/ShintaroMinami/PyDSSP/blob/e251a43ff8622fe0a555313b1567edce45e789e8/scripts/pydssp#L30

donor_mask = sequence != 'PRO' if args.ignore_proline_donor else None

orbeckst · 2025-06-12T00:29:28Z

package/MDAnalysis/analysis/dssp/pydssp_numpy.py

+        if donor_mask is not None
+        else np.ones(n_atoms, dtype=float)
+    )
+    donor_mask = np.tile(donor_mask[:, np.newaxis], (1, n_atoms))


Is the donor_mask (one element for each residue) really the correct input for this tiling?
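
A small sketch of what the `np.tile` call produces for a 1D mask, using a toy 4-element mask (here `n` plays the role of `n_atoms` in the diff). It only demonstrates the shape and layout of the result; whether the row index of the hydrogen bond map is the donor side is an assumption that would still need to be checked against the map's convention:

```python
import numpy as np

# Toy per-residue donor mask; element 1 stands for a PRO residue
donor_mask = np.array([True, False, True, True]).astype(float)
n = donor_mask.shape[0]

# Tiling as in the diff: a column vector repeated n times along axis 1
tiled = np.tile(donor_mask[:, np.newaxis], (1, n))

# Row i is constant and equal to donor_mask[i]; same as plain broadcasting
assert tiled.shape == (n, n)
assert np.array_equal(tiled, np.broadcast_to(donor_mask[:, np.newaxis], (n, n)))

# Multiplying a map by the tiled mask zeroes the masked row only
hbond_map = np.ones((n, n))
masked = hbond_map * tiled
print(masked)
```

So the tiling is consistent with a per-residue mask as long as the map is `(n, n)` over residues and the masked index is the first axis.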

orbeckst · 2025-06-12T00:31:32Z

package/MDAnalysis/analysis/dssp/pydssp_numpy.py

    # hydrogen bond map (continuous value extension of original definition)
    hbond_map = np.clip(cutoff - margin - e, a_min=-margin, a_max=margin)
    hbond_map = (np.sin(hbond_map / margin * np.pi / 2) + 1.0) / 2
-    hbond_map = hbond_map * local_mask
+    hbond_map *= local_mask
+    hbond_map *= donor_mask


Is this correct? The original code uses https://github.com/ShintaroMinami/PyDSSP/blob/e251a43ff8622fe0a555313b1567edce45e789e8/pydssp/pydssp_numpy.py#L72

hbond_map = hbond_map * repeat(donor_mask, 'l1 l2 -> b l1 l2', b=b)

(with einops.repeat()). Note that we create our donor_mask with tile so it may already be the right size and shape.
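
A hedged sketch of why an explicit repeat may not be needed on the NumPy side. The einops pattern `'l1 l2 -> b l1 l2'` adds a leading batch axis; NumPy broadcasting aligns trailing axes, so an `(n, n)` mask multiplies into either an `(n, n)` or a `(b, n, n)` map directly. The shapes and values below are toy assumptions, not taken from the MDAnalysis code:

```python
import numpy as np

n = 4
donor_mask_1d = np.array([1.0, 0.0, 1.0, 1.0])
donor_mask = np.tile(donor_mask_1d[:, np.newaxis], (1, n))  # shape (n, n)

# Unbatched map: in-place multiply, shapes match exactly
hbond_map = np.full((n, n), 0.5)
hbond_map *= donor_mask

# Batched map (b, n, n): the (n, n) mask broadcasts over the leading axis
batched = np.full((3, n, n), 0.5)
batched *= donor_mask

# Explicit repeat along a new batch axis, for comparison
explicit = np.full((3, n, n), 0.5) * np.repeat(donor_mask[np.newaxis], 3, axis=0)
assert np.array_equal(batched, explicit)
```

Under these assumptions `hbond_map *= donor_mask` gives the same result as the repeated mask, with or without a batch axis.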

orbeckst · 2025-06-12T00:32:34Z

testsuite/MDAnalysisTests/analysis/test_dssp.py

@@ -13,15 +13,52 @@
    "pdb_filename", glob.glob(f"{DSSP_FOLDER}/?????.pdb.gz")
 )
 def test_file_guess_hydrogens(pdb_filename, client_DSSP):
+    # run 2.9.0 tests (which include PRO)
+    # ignore_proline_donor=False
+    # TODO: update reference data for ignore_proline_donor=True


We should really have correct reference data. About half of the files do not show a difference between ignore_proline_donor=True and ignore_proline_donor=False.
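
One way to find out which reference files are sensitive to the flag is to compare the two assignment strings position by position. The helper below is hypothetical (it is not part of the test suite) and the two strings are made up for illustration:

```python
def count_differences(assign_a: str, assign_b: str) -> int:
    """Number of positions at which two equal-length DSSP strings differ."""
    if len(assign_a) != len(assign_b):
        raise ValueError("assignments must have the same length")
    return sum(a != b for a, b in zip(assign_a, assign_b))


# Made-up strings standing in for results with the flag off and on
with_pro_donor = "-HHHHHH-EEE-"
without_pro_donor = "-HHHH-H-EEE-"

print(count_differences(with_pro_donor, without_pro_donor))
```

Files for which the count is zero cannot distinguish the two settings and add little as regression data for the fix.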

orbeckst · 2025-06-12T00:33:08Z

testsuite/MDAnalysisTests/analysis/test_dssp.py

-    protein = mda.Universe(TPR, XTC).select_atoms("protein")
-    run = DSSP(protein).run(**client_DSSP, stop=10)


These lines seemed superfluous as the atomgroup approach is tested separately.

codecov · 2025-06-12T00:37:37Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.63%. Comparing base (d412c9a) to head (87810eb).

Additional details and impacted files

@@           Coverage Diff            @@
##           develop    #5065   +/-   ##
========================================
  Coverage    93.63%   93.63%           
========================================
  Files          177      177           
  Lines        22033    22037    +4     
  Branches      3115     3115           
========================================
+ Hits         20631    20635    +4     
  Misses         948      948           
  Partials       454      454


orbeckst force-pushed the update-dssp-proline-fix branch from b38e9db to 005100f Compare June 12, 2025 00:23


Merge branch 'develop' into update-dssp-proline-fix

df31c86

orbeckst mentioned this pull request Jun 26, 2025

Update secondary structure assignment in DSSP after an upstream fix #4913

Open

Merge branch 'develop' into update-dssp-proline-fix

87810eb
