Initial Attempt at PR for issue #3603 #5013

EchoFia · 2025-04-05T21:02:53Z

Would greatly appreciate some feedback, as this is just meant to be a rough version (+ my first time doing a PR, so, hopefully it's in the right place). So that I know, what's the best way to discuss changes and get feedback?

It's very rough at the moment, cause I have a lot of questions that I think it would be good to clear up (I think I'll try and put another message in the chat of issue #3603 summarising). I'm not expecting this to be merged immediately, and so will properly fill this out later I imagine (i.e. the checkboxes and the certification), my eyes just need a break from reading through things.

Fixes #3603

Changes made in this Pull Request:

Changed test_lineardensity.py from using a system of 5 water molecules to 3 systems (neutral particles, charged particles and charged dimers) that better demonstrate each aspect of Linear Density analysis. The systems consist of 100 atoms/dimers of mass 1 in a cubic box of side length 1 Angstrom.
More detail will be in the comments of issue LinearDensity should have more reliable tests #3603 in an hour or 2, hopefully

PR Checklist

Issue raised/referenced?
Tests updated/added?
Documentation updated/added?
package/CHANGELOG file updated?
Is your name in package/AUTHORS? (If it is not, add it!)

Developers Certificate of Origin

I certify that I can submit this code contribution as described in the Developer Certificate of Origin, under the MDAnalysis LICENSE.

📚 Documentation preview 📚: https://mdanalysis--5013.org.readthedocs.build/en/5013/

…e feedback as this is just a rough attempt

github-actions

Hello there first time contributor! Welcome to the MDAnalysis community! We ask that all contributors abide by our Code of Conduct and that first time contributors introduce themselves on GitHub Discussions so we can get to know you. You can learn more about participating here. Please also add yourself to package/AUTHORS as part of this PR.

codecov · 2025-04-07T16:35:53Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.58%. Comparing base (1cdb055) to head (b541d3f).

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #5013      +/-   ##
===========================================
- Coverage    93.61%   93.58%   -0.03%     
===========================================
  Files          177      189      +12     
  Lines        21894    22960    +1066     
  Branches      3095     3095              
===========================================
+ Hits         20495    21488     +993     
- Misses         946     1019      +73     
  Partials       453      453

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

jeremyleung521

Just some comments on what to change, based on a quick read through. Haven't actually tried running anything yet.

testsuite/MDAnalysisTests/analysis/test_lineardensity.py

EchoFia · 2025-04-09T19:57:05Z

I'm not exactly sure what is happening with errors, my code was committed fine, but then I tried to update the AUTHORS (which worked initially) but it failed like 4 times over. Any idea what is going wrong?

Also, thanks for all the feedback! I think I've fixed all of them, and have used black to format things nicer (there was a slight issue with the version but it seems it was fine).

Copilot

Copilot reviewed 1 out of 3 changed files in this pull request and generated no comments.

Files not reviewed (2)

package/AUTHORS: Language not supported
package/CHANGELOG: Language not supported

Comments suppressed due to low confidence (2)

testsuite/MDAnalysisTests/analysis/test_lineardensity.py:51

[nitpick] Consider adopting a consistent snake_case naming convention for helper functions (e.g. use 'make_universe') to better align with typical Python style.

def make_Universe(

testsuite/MDAnalysisTests/analysis/test_lineardensity.py:185

[nitpick] Consider standardizing the function name to either 'find_centers' or 'find_centres' consistently across the codebase.

def find_Centres(

orbeckst

Thank you for your interest in contributing. At the moment I don't understand what your code is trying to accomplish.

Could you explain in your own words what you are trying to accomplish here? What is your strategy and key idea?

orbeckst · 2025-04-11T06:24:59Z

testsuite/MDAnalysisTests/analysis/test_lineardensity.py

@@ -43,131 +45,298 @@ def test_invalid_grouping():
        ld.run()


-# test data for grouping='atoms'


Don't delete existing tests.

Hi! So, my understanding of the problem was that the test for lineardensity.py was using itself to check itself (i.e. the saved 'answers' to the test were generated using the function) and that the test cases didn't cover the idea of charged residues (or larger) as water is neutral.

As such, I used the code given that creates a universe of 100 atoms, modified it such that the charges + positions matched the 3 different types of systems (neutral particles, charged particles & charged dimers), and then calculated the values of (total masses, total charges, mass densities and charge densities) for atoms, residues and segments. I then edited the pytest function to compare these to the values generated by LinearDensity.

Let me know if you want me to explain further, I feel that the code itself is simple enough to read through (though maybe that's because I wrote it, so I might be tricking myself), so I assume the uncertainty comes in what I'm trying to achieve, which, I've hopefully explained. I had a lot of questions at the start which I put in issue #3603, but those haven't been seen yet so I interpreted it as best I could.

Apologies for removing the tests with the water system, I can add it back, updating it for how I've restructured the test (essentially, just adding in the universe for the test, which won't change anything for the water system). I can also go through and change the spellings to American English.

In regards to the comments about naming things 'make_universe' as opposed to 'make_Universe', should I change that or leave it? From looking at other code, it appeared that the standard thing done in the module was capitalised after the first word with underscores.

Also, whilst I have your attention, can I clarify if my GSoC proposal is still valid or not?

Sorry, what do you mean by "valid"?

(I also commented on your question #5015 )

I was under the impression that for a valid GSoC application to MDAnalysis, you needed:

Pre-proposal accepted

Proposal on GSoC website

PR merged

and I had assumed that the PR had to be merged by the 8th as well, with the proposal, so I was worried that I'd missed that deadline. Thanks for answering question #5015, that has clarified things, and hopefully I'll be able to get this PR merged before you make assessments.

orbeckst · 2025-04-12T00:24:52Z

In principle there's no problem with tests checking that the code still produces the same answers. These are regression tests and they are valuable.

Having additional correctness tests, which check if a known correct value is obtained, is great, too. However, we don't want to write the same code for the calculation that we already have in MDAnalysis. Instead, you want to compare either against results from another trustworthy source or against analytical results. In the case here, I'd suggest to generate systems that should be homogeneous and then test that the computed densities are close to constant. You could generate a system on the fly (following #3603 (comment) ) at the density of water with

N = 1000        # number of particles
N_frames = 500  # number of trajectory frames

# 1.0 bulk density of water in Å**-3
water_density = mda.units.convert(1.0, "water", "A^{-3}")
V = N/water_density
L = V**(1/3)
print(f"cubic box with L={L} Å and N={N} particles")
box = np.array([L, L, L, 90, 90, 90])

# generate random uniformly distributed coordinates in the box
rng = np.random.default_rng()
positions = rng.uniform(low=[0,0,0], high=box[:3], size=(N_frames, N, 3))

# create a minimalist universe 
u = mda.Universe.empty(n_atoms=N, trajectory=True)
# update the in-memory representation with our generated data
u.trajectory.set_array(positions, order="fac")
u.trajectory.dimensions_array[:] = box

# set masses, charges, etc as needed
# ...

You now know the density of your system

density = u.atoms.n_atoms / u.trajectory.ts.volume
print(f"density {density} Å**-3")

and because you uniformly sampled space, on average the density should be constant everywhere. Now you know that your calculated particle density (from linearDensity) should be close to this number. Similarly, your mass density should be mass * density and your charge density q * density where mass and q is the mass and charge for a particle. To make a neutral system with charges, I'd just give a charge of +q to half the particles and -q to the other half and on average the charge should be 0.

You will not get the exact numbers because of randomness and incomplete sampling so you will have to play with the tolerances for your checks. Maybe they'll only be within a relative tolerance of 1e-3... but that's something to try.

EchoFia · 2025-04-12T07:33:24Z

Ah, okay, this makes sense. Thanks for clearing that up, I was questioning why I was rewriting the code, but I figured that it was just improving reproducibility.

I don't really understand why the system needs to be set at the density of water, as it seems like this just scales the system without changing any properties. Am I missing something here, or is it maybe just to standardise things?

I am also concerned with how the groupings work with this. I have been taking the groupings to be random choices of atoms, which means that their centre of mass is more likely to be in the middle than at the extremes, and as such, the distribution won't follow a uniform (nor are there enough atoms per residue/etc... for it to follow the central limit theorem, I fear). If it is a numbers game, I could simply increase the number of atoms, and then the groupings should more closely follow a normal distribution. I'll code it up and see what the required rtol would have to be, to see if it's reasonable.

EchoFia · 2025-04-12T07:47:31Z

Also, can I ask for a little further clarification on why my code doesn't match the requirements. My understanding is that my code is done almost independently of the module (admittedly, I should have used the masses and charges straight from the variable 'masses' and 'charges' rather than u.atoms.masses). As such, the code is quite simple and calculates the totals and the densities using relatively simple equations.

My perspective is that the values calculated will always be the correct values by the definition of density, and that if MDAnalysis is ever erroneous and giving wrong values, it will fail this test as the test is not dependent on the module. If there's a problem of lacking repeatability, I could document running millions of test cases against the current version of LinearDensity (which is assumed to give correct answers), or against a different software like VMD maybe to ensure it always gives the correct answers. I can also add the checks above, such as homogeny checks for 'groupings=atoms', as well as other potentially other checks, such as using different distributions for coords and seeing if the densities is as expected or cases where we expect zeros. Essentially, the aim would be to generate data that is reliably correct (as upon initial review, I don't immediately see any easy way to find lots of data that is reliable).

I'm coming from a science background, so this makes sense to me in terms of statistical repeatability, reproducibility and reliability, but I imagine the same concepts in computer science could be potentially slightly different in what they're looking for. Any ideas where the error in my thinking is, or am I just completely wrong here?

orbeckst · 2025-04-12T15:40:08Z

Your code is too complicated for a simple test and thus will become a maintenance burden. Keep it as simple as possible.

It doesn’t have to be the density of water but you have to choose some value and the density of water is a typical value in soft matter.

Initial Attempt at PR for issue MDAnalysis#3603, would appreciate som…

86462f8

…e feedback as this is just a rough attempt

github-actions bot reviewed Apr 5, 2025

View reviewed changes

tylerjereddy added testing Component-Analysis labels Apr 5, 2025

EchoFia marked this pull request as draft April 7, 2025 16:21

Fixed pytest and Centre of Mass for issue MDAnalysis#3603

09a9f60

EchoFia marked this pull request as ready for review April 7, 2025 23:27

jeremyleung521 reviewed Apr 8, 2025

View reviewed changes

Echo Fia and others added 7 commits April 9, 2025 15:29

Fixed pytest and Centre of Mass for issue MDAnalysis#3603

920c4df

Updated AUTHORS & CHANGELOG, Improved Formatting

2de699e

Updated AUTHORS & CHANGELOG, Improved Formatting

f906031

Updated AUTHORS & CHANGELOG, Improved Formatting

661638a

Updated AUTHORS & CHANGELOG, Improved Formatting

73e586d

Updated package/AUTHORS

9b04325

Merge branch 'develop' into new-testLinearDensity

b541d3f

orbeckst requested a review from Copilot April 11, 2025 06:22

Copilot AI reviewed Apr 11, 2025

View reviewed changes

orbeckst requested changes Apr 11, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Initial Attempt at PR for issue #3603 #5013

Initial Attempt at PR for issue #3603 #5013

EchoFia commented Apr 5, 2025 •

edited

Loading

github-actions bot left a comment

codecov bot commented Apr 7, 2025 •

edited

Loading

jeremyleung521 left a comment •

edited

Loading

EchoFia commented Apr 9, 2025

Copilot AI left a comment

orbeckst left a comment

orbeckst Apr 11, 2025

EchoFia Apr 11, 2025

EchoFia Apr 11, 2025

orbeckst Apr 11, 2025

EchoFia Apr 11, 2025

orbeckst commented Apr 12, 2025

EchoFia commented Apr 12, 2025

EchoFia commented Apr 12, 2025

orbeckst commented Apr 12, 2025

		@@ -43,131 +45,298 @@ def test_invalid_grouping():
		ld.run()


		# test data for grouping='atoms'

Initial Attempt at PR for issue #3603 #5013

Are you sure you want to change the base?

Initial Attempt at PR for issue #3603 #5013

Conversation

EchoFia commented Apr 5, 2025 • edited Loading

PR Checklist

Developers Certificate of Origin

github-actions bot left a comment

Choose a reason for hiding this comment

codecov bot commented Apr 7, 2025 • edited Loading

Codecov Report

jeremyleung521 left a comment • edited Loading

Choose a reason for hiding this comment

EchoFia commented Apr 9, 2025

Copilot AI left a comment

Choose a reason for hiding this comment

orbeckst left a comment

Choose a reason for hiding this comment

orbeckst Apr 11, 2025

Choose a reason for hiding this comment

EchoFia Apr 11, 2025

Choose a reason for hiding this comment

EchoFia Apr 11, 2025

Choose a reason for hiding this comment

orbeckst Apr 11, 2025

Choose a reason for hiding this comment

EchoFia Apr 11, 2025

Choose a reason for hiding this comment

orbeckst commented Apr 12, 2025

EchoFia commented Apr 12, 2025

EchoFia commented Apr 12, 2025

orbeckst commented Apr 12, 2025

EchoFia commented Apr 5, 2025 •

edited

Loading

codecov bot commented Apr 7, 2025 •

edited

Loading

jeremyleung521 left a comment •

edited

Loading