MDAnalysis.lib.distances needs rework #2046

@zemanj

The documentation of the module MDAnalysis.lib.distances has several issues, one of which has already been addressed by @xiki-tempula in issue #2004. I also see some places where code duplication could be reduced to make the code more DRY.

Documentation issues:

Code issues:

  • Many functions use the _check_box() helper function to determine the type of simulation box supplied. Thereafter, the box coordinates are transformed to the memory layout expected by the subsequently called low-level C functions. This transformation should be incorporated into _check_box() to avoid code duplication. (Fixed via PR Simplified code in lib.distances a bit #2048)
  • The docstrings of most functions state that the box must be supplied in the format [lx, ly, lz, alpha, beta, gamma] as returned by ``Timestep.dimensions``. The ``_check_box()`` function doesn't reflect that. So either the requirements for the box should be less strict, or ``_check_box()`` should be stricter in that respect. (Fixed via PR Simplified code in lib.distances a bit #2048)
  • Not only checking but also creating the results array (if required) should take place in the _check_result_array() helper function. (Fixed via PR Simplified code in lib.distances a bit #2048)
  • Many functions now incorporate automatic dtype conversion, so the dtype check in the _check_array() helper function is now redundant and can be removed in these cases. (Fixed via PR Simplified code in lib.distances a bit #2048 by means of a new @check_coords() decorator)
  • PR Simplified code in lib.distances a bit #2048 introduced a subtle bug so that in certain situations, some functions change their input coordinate arrays in-place. (Fixed via PR bug fixes in lib.distances #2083)
  • Many functions choke on empty input coordinate arrays (i.e., with shape=(0, 3)). (Fixed via PR bug fixes in lib.distances #2083)
  • Depending on the employed search method, the results of capped_distance() and self_capped_distance() are not always numpy arrays. (Fixed via PR bug fixes in lib.distances #2083)
  • If no pairs are found, _bruteforce_capped_self() correctly returns empty pairs but unfortunately also non-empty mumbo-jumbo distances. (Fixed via PR bug fixes in lib.distances #2083)
  • _bruteforce_capped() crashes if all input coordinates are the same and box is None. (Fixed via PR bug fixes in lib.distances #2083)
  • lib.distances._check_box() should be moved to lib.util (with underscore removed). (Fixed via PR Issue 2046 lib distances rework continued #2114)
  • Not all methods for *capped_distance() have the same cut-off criteria (sometimes distances < max_cutoff and sometimes distances <= max_cutoff). (Fixed via PR Issue 2046 lib distances rework continued #2114)
  • The different methods for *capped_distance() do not always return the same number of pairs. (EDIT: only pathological cases, won't fix)
  • lib.nsgrid calculates distances in single precision, whereas all functions in lib.distances use double precision. (Fixed via PR Issue 2046 lib distances rework continued #2114)
  • lib.nsgrid.PBCBox uses an arbitrary constant EPSILON=1e-5 to determine whether a box is triclinic. This a) does not correspond to the behavior of other functions, and b) fails if a box angle is, e.g., 90.00001 degrees (or higher or negative). (Fixed via PR Issue 2046 lib distances rework continued #2114)
  • lib/include/calc_distances.h contains a lot of duplicated code (functions differing in PBC type only). (EDIT: unifying functions for different PBC types impacts performance, won't fix)
  • OpenMP parallelization in lib/include/calc_distances.h often suffers from false sharing.
  • In lib.distances.distance_array(), the inner loop (the one over the configuration coordinate array) should be parallelized instead of the outer one (going over the reference coordinate array), since often, one seeks to know the number of "neighbors" with respect to some reference atom.
  • Tidy the module namespace: the low-level C functions should not be visible to users, but currently everything is exposed.
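To make a few of the points above more concrete, here is a minimal sketch in plain NumPy. It is not the code in lib.distances: the names check_box_sketch and bruteforce_capped_sketch are hypothetical, periodic boundaries are ignored in the pair search, and storing the box as float32 is an assumption. It only illustrates the intended behavior: the box check also returns the converted array, inputs are never modified in-place, empty coordinate arrays are accepted, results are always numpy arrays, and the cut-off criterion is consistently distance <= max_cutoff.

```python
import numpy as np


def check_box_sketch(box):
    """Hypothetical stand-in for a unified box check (not the MDAnalysis code).

    Validates the [lx, ly, lz, alpha, beta, gamma] format and returns the
    box type together with a C-contiguous float32 copy (dtype is an assumption).
    """
    box = np.array(box, dtype=np.float32, order="C")  # always a copy
    if box.shape != (6,):
        raise ValueError("box must be [lx, ly, lz, alpha, beta, gamma]")
    # Exact comparison with 90 degrees instead of an arbitrary epsilon.
    boxtype = "ortho" if np.all(box[3:] == 90.0) else "triclinic"
    return boxtype, box


def bruteforce_capped_sketch(reference, configuration, max_cutoff):
    """Hypothetical brute-force pair search without periodic boundaries.

    Always returns two numpy arrays: pairs with shape (n, 2) and distances
    with shape (n,), using the criterion distance <= max_cutoff.
    """
    # The inputs are only read; dtype conversion creates new arrays if needed.
    ref = np.asarray(reference, dtype=np.float64).reshape(-1, 3)
    conf = np.asarray(configuration, dtype=np.float64).reshape(-1, 3)
    if len(ref) == 0 or len(conf) == 0:
        # No coordinates means no pairs and no distances.
        return np.empty((0, 2), dtype=np.intp), np.empty((0,), dtype=np.float64)
    # Pairwise difference vectors, shape (len(ref), len(conf), 3).
    diff = ref[:, None, :] - conf[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    i, j = np.nonzero(dist <= max_cutoff)
    pairs = np.column_stack((i, j)).astype(np.intp)
    return pairs, dist[i, j]
```

With this sketch, calling bruteforce_capped_sketch(np.empty((0, 3)), coords, 1.0) yields a pairs array of shape (0, 2) and a distances array of shape (0,) rather than raising or returning leftover values.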

TODO suggestions:

That's quite a lot of things to do, but I've already started working on most of the points.
There are still issues to be discussed, especially the module's title and description and how to proceed with the requirements for boxes.

Current version of MDAnalysis:

0.18.1-dev 0.19.1-dev
