Skip to content

Initial Attempt at PR for issue #3603 #5013

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 9 commits into
base: develop
Choose a base branch
from

Conversation

EchoFia
Copy link

@EchoFia EchoFia commented Apr 5, 2025

Would greatly appreciate some feedback, as this is just meant to be a rough version (+ my first time doing a PR, so, hopefully it's in the right place). So that I know, what's the best way to discuss changes and get feedback?

It's very rough at the moment, cause I have a lot of questions that I think it would be good to clear up (I think I'll try and put another message in the chat of issue #3603 summarising). I'm not expecting this to be merged immediately, and so will properly fill this out later I imagine (i.e. the checkboxes and the certification), my eyes just need a break from reading through things.

Fixes #3603

Changes made in this Pull Request:

  • Changed test_lineardensity.py from using a system of 5 water molecules to 3 systems (neutral particles, charged particles and charged dimers) that better demonstrate each aspect of Linear Density analysis. The systems consist of 100 atoms/dimers of mass 1 in a cubic box of side length 1 Angstrom.
  • More detail will be in the comments of issue LinearDensity should have more reliable tests #3603 in an hour or 2, hopefully

PR Checklist

  • Issue raised/referenced?
  • Tests updated/added?
  • Documentation updated/added?
  • package/CHANGELOG file updated?
  • Is your name in package/AUTHORS? (If it is not, add it!)

Developers Certificate of Origin

I certify that I can submit this code contribution as described in the Developer Certificate of Origin, under the MDAnalysis LICENSE.


📚 Documentation preview 📚: https://mdanalysis--5013.org.readthedocs.build/en/5013/

…e feedback as this is just a rough attempt
Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello there first time contributor! Welcome to the MDAnalysis community! We ask that all contributors abide by our Code of Conduct and that first time contributors introduce themselves on GitHub Discussions so we can get to know you. You can learn more about participating here. Please also add yourself to package/AUTHORS as part of this PR.

Copy link

codecov bot commented Apr 7, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.58%. Comparing base (1cdb055) to head (b541d3f).

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #5013      +/-   ##
===========================================
- Coverage    93.61%   93.58%   -0.03%     
===========================================
  Files          177      189      +12     
  Lines        21894    22960    +1066     
  Branches      3095     3095              
===========================================
+ Hits         20495    21488     +993     
- Misses         946     1019      +73     
  Partials       453      453              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@EchoFia EchoFia marked this pull request as ready for review April 7, 2025 23:27
Copy link

@jeremyleung521 jeremyleung521 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some comments on what to change, based on a quick read through. Haven't actually tried running anything yet.

@EchoFia
Copy link
Author

EchoFia commented Apr 9, 2025

I'm not exactly sure what is happening with errors, my code was committed fine, but then I tried to update the AUTHORS (which worked initially) but it failed like 4 times over. Any idea what is going wrong?

Also, thanks for all the feedback! I think I've fixed all of them, and have used black to format things nicer (there was a slight issue with the version but it seems it was fine).

@orbeckst orbeckst requested a review from Copilot April 11, 2025 06:22
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 1 out of 3 changed files in this pull request and generated no comments.

Files not reviewed (2)
  • package/AUTHORS: Language not supported
  • package/CHANGELOG: Language not supported
Comments suppressed due to low confidence (2)

testsuite/MDAnalysisTests/analysis/test_lineardensity.py:51

  • [nitpick] Consider adopting a consistent snake_case naming convention for helper functions (e.g. use 'make_universe') to better align with typical Python style.
def make_Universe(

testsuite/MDAnalysisTests/analysis/test_lineardensity.py:185

  • [nitpick] Consider standardizing the function name to either 'find_centers' or 'find_centres' consistently across the codebase.
def find_Centres(

Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your interest in contributing. At the moment I don't understand what your code is trying to accomplish.

Could you explain in your own words what you are trying to accomplish here? What is your strategy and key idea?

@@ -43,131 +45,298 @@ def test_invalid_grouping():
ld.run()


# test data for grouping='atoms'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't delete existing tests.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi! So, my understanding of the problem was that the test for lineardensity.py was using itself to check itself (i.e. the saved 'answers' to the test were generated using the function) and that the test cases didn't cover the idea of charged residues (or larger) as water is neutral.

As such, I used the code given that creates a universe of 100 atoms, modified it such that the charges + positions matched the 3 different types of systems (neutral particles, charged particles & charged dimers), and then calculated the values of (total masses, total charges, mass densities and charge densities) for atoms, residues and segments. I then edited the pytest function to compare these to the values generated by LinearDensity.

Let me know if you want me to explain further, I feel that the code itself is simple enough to read through (though maybe that's because I wrote it, so I might be tricking myself), so I assume the uncertainty comes in what I'm trying to achieve, which, I've hopefully explained. I had a lot of questions at the start which I put in issue #3603, but those haven't been seen yet so I interpreted it as best I could.

Apologies for removing the tests with the water system, I can add it back, updating it for how I've restructured the test (essentially, just adding in the universe for the test, which won't change anything for the water system). I can also go through and change the spellings to American English.

In regards to the comments about naming things 'make_universe' as opposed to 'make_Universe', should I change that or leave it? From looking at other code, it appeared that the standard thing done in the module was capitalised after the first word with underscores.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, whilst I have your attention, can I clarify if my GSoC proposal is still valid or not?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, what do you mean by "valid"?

(I also commented on your question #5015 )

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was under the impression that for a valid GSoC application to MDAnalysis, you needed:

  • Pre-proposal accepted
  • Proposal on GSoC website
  • PR merged

and I had assumed that the PR had to be merged by the 8th as well, with the proposal, so I was worried that I'd missed that deadline. Thanks for answering question #5015, that has clarified things, and hopefully I'll be able to get this PR merged before you make assessments.

@orbeckst orbeckst added more information needed Please reply to requests for information or the issue will be closed. close? Evaluate if issue/PR is stale and can be closed. and removed more information needed Please reply to requests for information or the issue will be closed. close? Evaluate if issue/PR is stale and can be closed. labels Apr 11, 2025
@orbeckst
Copy link
Member

In principle there's no problem with tests checking that the code still produces the same answers. These are regression tests and they are valuable.

Having additional correctness tests, which check if a known correct value is obtained, is great, too. However, we don't want to write the same code for the calculation that we already have in MDAnalysis. Instead, you want to compare either against results from another trustworthy source or against analytical results. In the case here, I'd suggest to generate systems that should be homogeneous and then test that the computed densities are close to constant. You could generate a system on the fly (following #3603 (comment) ) at the density of water with

N = 1000        # number of particles
N_frames = 500  # number of trajectory frames

# 1.0 bulk density of water in Å**-3
water_density = mda.units.convert(1.0, "water", "A^{-3}")
V = N/water_density
L = V**(1/3)
print(f"cubic box with L={L} Å and N={N} particles")
box = np.array([L, L, L, 90, 90, 90])

# generate random uniformly distributed coordinates in the box
rng = np.random.default_rng()
positions = rng.uniform(low=[0,0,0], high=box[:3], size=(N_frames, N, 3))

# create a minimalist universe 
u = mda.Universe.empty(n_atoms=N, trajectory=True)
# update the in-memory representation with our generated data
u.trajectory.set_array(positions, order="fac")
u.trajectory.dimensions_array[:] = box

# set masses, charges, etc as needed
# ...

You now know the density of your system

density = u.atoms.n_atoms / u.trajectory.ts.volume
print(f"density {density} Å**-3")

and because you uniformly sampled space, on average the density should be constant everywhere. Now you know that your calculated particle density (from linearDensity) should be close to this number. Similarly, your mass density should be mass * density and your charge density q * density where mass and q is the mass and charge for a particle. To make a neutral system with charges, I'd just give a charge of +q to half the particles and -q to the other half and on average the charge should be 0.

You will not get the exact numbers because of randomness and incomplete sampling so you will have to play with the tolerances for your checks. Maybe they'll only be within a relative tolerance of 1e-3... but that's something to try.

@EchoFia
Copy link
Author

EchoFia commented Apr 12, 2025

Ah, okay, this makes sense. Thanks for clearing that up, I was questioning why I was rewriting the code, but I figured that it was just improving reproducibility.

I don't really understand why the system needs to be set at the density of water, as it seems like this just scales the system without changing any properties. Am I missing something here, or is it maybe just to standardise things?

I am also concerned with how the groupings work with this. I have been taking the groupings to be random choices of atoms, which means that their centre of mass is more likely to be in the middle than at the extremes, and as such, the distribution won't follow a uniform (nor are there enough atoms per residue/etc... for it to follow the central limit theorem, I fear). If it is a numbers game, I could simply increase the number of atoms, and then the groupings should more closely follow a normal distribution. I'll code it up and see what the required rtol would have to be, to see if it's reasonable.

@EchoFia
Copy link
Author

EchoFia commented Apr 12, 2025

Also, can I ask for a little further clarification on why my code doesn't match the requirements. My understanding is that my code is done almost independently of the module (admittedly, I should have used the masses and charges straight from the variable 'masses' and 'charges' rather than u.atoms.masses). As such, the code is quite simple and calculates the totals and the densities using relatively simple equations.

My perspective is that the values calculated will always be the correct values by the definition of density, and that if MDAnalysis is ever erroneous and giving wrong values, it will fail this test as the test is not dependent on the module. If there's a problem of lacking repeatability, I could document running millions of test cases against the current version of LinearDensity (which is assumed to give correct answers), or against a different software like VMD maybe to ensure it always gives the correct answers. I can also add the checks above, such as homogeny checks for 'groupings=atoms', as well as other potentially other checks, such as using different distributions for coords and seeing if the densities is as expected or cases where we expect zeros. Essentially, the aim would be to generate data that is reliably correct (as upon initial review, I don't immediately see any easy way to find lots of data that is reliable).

I'm coming from a science background, so this makes sense to me in terms of statistical repeatability, reproducibility and reliability, but I imagine the same concepts in computer science could be potentially slightly different in what they're looking for. Any ideas where the error in my thinking is, or am I just completely wrong here?

@orbeckst
Copy link
Member

Your code is too complicated for a simple test and thus will become a maintenance burden. Keep it as simple as possible.

It doesn’t have to be the density of water but you have to choose some value and the density of water is a typical value in soft matter.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

LinearDensity should have more reliable tests
4 participants