-
Notifications
You must be signed in to change notification settings - Fork 767
Open
Description
The documentation of the module MDAnalysis.lib.distances has several issues, whereof one has already been addressed by @xiki-tempula in issue #2004. I also see some points where code duplication could be reduced to make the code more DRY.
Documentation issues:
- Probably deprecated title: Fast distance array computation
IMHO that seems a bit misleading. Even though the module contains methods to compute distance arrays, it also includes methods for coordinate transformation, PBC handling, and for the calculation of angles and dihedrals. - The same issue applies to the doc header, which doesn't describe the module's contents sufficiently since it only refers to distance arrays. (Fixed via PR improved docstrings in lib/distances.py #2070)
- The methods listed as
.. autofunction: ...in the header appear as duplicates in the docs. Furthermore, the way theautofunctionstrings are formatted may not be very useful (nested square brackets without default values for kwargs). (Fixed via PR improved docstrings in lib/distances.py #2070) - Many docstrings violate the 80 character line length limit. (Fixed via PR improved docstrings in lib/distances.py #2070)
- The methods
augment_coordinatesandundo_augment, which are imported fromMDAnalysis.lib._augment, are missing from the docs. (Fixed via PR Improved docstrings in lib/_augment.pyx #2062) - Many docstrings have issues with spelling mistakes, missing or improperly referenced parameters and methods, wrong or missing data types, and potentially misleading explanations. That includes but is not limited to issue Documentation issue regarding MDAnalysis.lib.distances.calc_bonds #2004. (Fixed via PR improved docstrings in lib/distances.py #2070)
- In general, the docstrings could be much more standardized, there's no need to have different descriptions for parameters such as box which always have the same purpose or meaning. (Fixed via PR improved docstrings in lib/distances.py #2070)
Code issues:
- Many functions use the
_box_check()helper function to determine the type of simulation box supplied. Thereafter, the box coordinates are transformed to the memory layout expected by the subsequently called low-level C functions. This transformation should be incorporated into_check_box()to avoid code duplication. (Fixed via PR Simplified code inlib.distancesa bit #2048) - The docstrings of most functions state that the box must be supplied in the format
[lx, ly, lz, alpha, beta, gamma]as returned by `Timestep.dimensions```. The_box_checkfunction doesn't reflect that. Soeither the requirements for the box should be less strict or``_box_check`` should be stricter in that respect. (Fixed via PR Simplified code inlib.distancesa bit #2048) - Not only checking but also creating the results array (if required) should take place in the
_check_result_array()helper function. (Fixed via PR Simplified code inlib.distancesa bit #2048) - Many functions now incorporate automatic dtype conversion, so the dtype check in the
_check_array()helper function is now redundant and can be removed in these cases. (Fixed via PR Simplified code inlib.distancesa bit #2048 by means of a new@check_coords()decorator) - PR Simplified code in
lib.distancesa bit #2048 introduced a subtle bug so that in certain situations, some functions change their input coordinate arrays in-place. (Fixed via PR bug fixes in lib.distances #2083) - Many functions choke on empty input coordinates arrays (i.e., with
shape=(0, 3)). (Fixed via PR bug fixes in lib.distances #2083) - Depending on the employed search method, the results of
capped_distance()andself_capped_distance()are not always numpy arrays. (Fixed via PR bug fixes in lib.distances #2083) - If no pairs are found,
_bruteforce_capped_self()correctly returns empty pairs but unfortunately also non-empty mumbo-jumbo distances. (Fixed via PR bug fixes in lib.distances #2083) -
_bruteforce_capped()crashes if all input coordinates are the same andboxisNone. (Fixed via PR bug fixes in lib.distances #2083) -
lib.distances._check_box()should be moved tolib.util(with underscore removed). (Fixed via PR Issue 2046 lib distances rework continued #2114) - Not all methods for
*capped_distance()have the same cut-off criteria (sometimesdistances < max_cutoffand sometimesdistances <= max_cutoff). (Fixed via PR Issue 2046 lib distances rework continued #2114) - The different methods for
*capped_distance()do not always return the same number of pairs. (EDIT: only pathological cases, won't fix) -
lib.nsgridcalculates distances in single precision, whereas all functions in lib.distances use double precision. (Fixed via PR Issue 2046 lib distances rework continued #2114) -
lib.nsgrid.PBCBoxuses an arbitrary constantEPSILON=1e-5to determine whether a box is triclinic. This a) does not correspond to the behavior of other functions, and b) fails if a box angle is, e.g., 90.00001 degrees (or higher or negative). (Fixed via PR Issue 2046 lib distances rework continued #2114) -
lib/include/calc_distances.hcontains a lot of duplicated code (functions differing in PBC type only). (EDIT: unifying functions for different PBC types impacts performance, won't fix) - OpenMP parallelization in
lib/include/calc_distances.hoften suffers from false sharing. - In
lib.distances.distance_array(), the inner loop (the one over theconfigurationcoordinate array) should be parallelized instead of the outer one (going over thereferencecoordinate array), since often, one seeks to know the number of "neighbors" with respect to some reference atom. - Tidy the namespace, make C functions invisible, currently we have everything visible
TODO suggestions:
- Documentation:
- Ensure that parameters referenced in docstrings correspond to function signatures. (Fixed via PRs Simplified code in
lib.distancesa bit #2048 and improved docstrings in lib/distances.py #2070) - Properly format references to other methods so that links are generated. Add
.. seealso :See Alsosections where applicable. (Fixed via PR improved docstrings in lib/distances.py #2070) - Use italics only to reference function parameters, use teletype font for variables and values in docstrings. (Fixed via PR improved docstrings in lib/distances.py #2070)
- Use logical instead of visual markup (e.g., backticks instead of asterisks for function parameters). (Fixed via PR improved docstrings in lib/distances.py #2070)
- Enforce 80 character lines (Fixed via PR improved docstrings in lib/distances.py #2070)
- Standardize phrasing of repeatedly occurring parameter descriptions (Fixed via PR improved docstrings in lib/distances.py #2070)
- Always state correct type of parameters (e.g.
numpy.ndarrayif a numpy array is expected) (Fixed via PRs Simplified code inlib.distancesa bit #2048, Improved docstrings in lib/_augment.pyx #2062, and improved docstrings in lib/distances.py #2070) - Mark optional parameters by adding , optional after the type (e.g.
box : array_like, optional) (Fixed via PR improved docstrings in lib/distances.py #2070) - Do not give default values for kwargs in brackets since these appear in the auto-generated function signature. (Fixed via PR improved docstrings in lib/distances.py #2070)
- Module Header:
- Change title to something more descriptive (e.g., "fast geometry computations" or similar)
- Improve module description to better correspond to its contents (Fixed via PR improved docstrings in lib/distances.py #2070)
- Remove
.. autofunctionlines:members:fromdistances.rstcausing duplicates
- Ensure that parameters referenced in docstrings correspond to function signatures. (Fixed via PRs Simplified code in
- Code:
- Move box memory layout transformations to
_check_boxandeither loosen requirements for its expected format ormake checks stricter. (Fixed via PR Simplified code inlib.distancesa bit #2048) - Fix bugs introduced in PR Simplified code in
lib.distancesa bit #2048 and add tests assuring input arrays always remain unmodified. (Fixed via PR bug fixes in lib.distances #2083) - For completeness, make sure that all functions also work with empty coordinate arrays (
shape=(0, 3)) and add corresponding test cases. (Fixed via PR bug fixes in lib.distances #2083) - Ensure
capped_distance()andself_capped_distance()always return numpy arrays and add corresponding tests. (Fixed via PR bug fixes in lib.distances #2083) - Ensure
_bruteforce_capped_self()returns empty distances if no pairs are found. Add corresponding tests for all*capped_distance()functions. (Fixed via PR bug fixes in lib.distances #2083) - Fix
_bruteforce_capped()for the case when all input coordinates are the same andboxisNone. (Fixed via PR bug fixes in lib.distances #2083) - Move
lib.distances._check_box()tolib.util(with underscore removed). (Fixed via PR Issue 2046 lib distances rework continued #2114) - Ensure that all methods used in
capped_distance()have the same cut-off criterion (distances <= max_cutoff). (Fixed via PR Issue 2046 lib distances rework continued #2114) - Use double precision for distance computation in
lib.nsgrid. (Fixed via PR Issue 2046 lib distances rework continued #2114) - Fix box type detection in
lib.nsgridand some functions inlib.distances. (Fixed via PR Issue 2046 lib distances rework continued #2114) - Optimize parallelism in
lib/include/calc_distances.hfor 64 byte cachelines (avoid false sharing). - Parallelize inner instead of outer loop in
lib.distances.distance_array().
- Move box memory layout transformations to
That's quite a lot of things to do, but I've already started working on most of the points.
There are still issues to be discussed, especially the module's title and description and how to proceed with the requirements for boxes.
Current version of MDAnalysis:
0.18.1-dev 0.19.1-dev
jbarnoud and orbeckst