[MPI] Distributed Discrete Morse Sandwich: Hybrid MPI+thread Persistence Diagram Computation #1112
Conversation
Implements Distributed Discrete Morse Sandwich
```cpp
#ifdef TTK_ENABLE_MPI
  template <typename triangulationType>
  int DiscreteGradient::getSimplexRank(const triangulationType &triangulation,
```
The code of this function should be moved into AbstractTriangulation::getSimplexRank().
For that, dimensionality_ should be replaced with getDimensionality().
Then, that function can be removed. Thanks!
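For illustration only, a rough sketch of what the relocated member could look like; the exact signature and the per-dimension getters (getVertexRank, getEdgeRank, getTriangleRank, getCellRank) are assumptions based on the triangulation features mentioned in this PR, not the actual TTK API.

```cpp
// Hypothetical sketch, not the PR code: owning-rank lookup for a simplex of
// dimension `dim` and local id `id`, living in AbstractTriangulation and
// using getDimensionality() instead of DiscreteGradient's dimensionality_.
int AbstractTriangulation::getSimplexRank(const int dim,
                                          const SimplexId id) const {
  switch(dim) {
    case 0:
      return this->getVertexRank(id);
    case 1:
      return this->getEdgeRank(id); // assumed getter added by this PR
    case 2:
      // in a 2D triangulation, dimension-2 simplices are the maximal cells
      return this->getDimensionality() == 2 ? this->getCellRank(id)
                                            : this->getTriangleRank(id);
    case 3:
      return this->getCellRank(id);
    default:
      return -1; // invalid simplex dimension
  }
}
```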
hi @eve-le-guillou

Hi Julien, I just performed the modifications we discussed. Eve

Let's gooooo!!!!
Hi all,
This substantial PR implements the Distributed Discrete Morse Sandwich algorithm for the hybrid MPI+thread computation of persistence diagrams. For clarity, all preceding commits of this PR have been squashed into one (see the branch mpi_DMS of my personal repository for more detailed commits).
Changes:
- A new class (DiscreteMorseSandwichMPI) was created, using Discrete Morse Sandwich (DMS) as a basis. As the algorithm requires both OpenMP and MPI (for the communication thread), it is guarded by both the `TTK_ENABLE_MPI` and `TTK_ENABLE_OPENMP` compilation variables.
- Some distributed triangulation features were lacking and have been added (e.g. retrieving the rank of an edge, a triangle, ...).
- The distributed computation of the Discrete Gradient was modified: the computation is now performed for all simplices (including ghosts). The pairing can therefore be wrong for ghost simplices, but this simplifies the rest of the computation, as no communication of the gradient pairing is necessary.
- A small bug in ArrayPreconditioning has been fixed (a `long int` / `int` casting problem in `std::accumulate`; see the sketch after this list).
- A small bug in ttkAlgorithm has been fixed (the computation of a global order was triggered when computing on only one process).
- The distributed sort in psort.h has been modified to account for the case where at least one of the processes is empty.
- Timing of the gradient computation has been moved from the VTK layer to the TTK layer to enable time measurement when it is called by another algorithm (in our case, DDMS).
- The output diagram is distributed and constructed similarly to the sequential implementation.
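To illustrate the kind of casting issue fixed in ArrayPreconditioning (a generic sketch, not the TTK code): with `std::accumulate`, the accumulator type is deduced from the initial value, so passing an `int` literal silently sums `long int` values in 32-bit arithmetic (assuming a typical 64-bit Linux setup where `long int` is 64 bits wide).

```cpp
// Generic illustration of the std::accumulate pitfall, not the TTK code:
// the accumulator type comes from the initial value, so an `int` literal 0
// makes the sum of `long int` values overflow, while 0L keeps it 64-bit.
#include <cstdio>
#include <numeric>
#include <vector>

int main() {
  const std::vector<long int> counts(3, 1000000000L); // sums to 3 billion
  const auto wrong = std::accumulate(counts.begin(), counts.end(), 0);  // int
  const auto right = std::accumulate(counts.begin(), counts.end(), 0L); // long
  std::printf("int init: %d, long int init: %ld\n", wrong, right);
  return 0;
}
```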
Options and execution:
- By default, the algorithm computes D0 and D2 in separate tasks (the filter option `UseTasks` can change that).
- By default, the computation of the persistence diagram uses DMS. Users can select DDMS in the list of backends. If DDMS is chosen without MPI or OpenMP compiled in, DMS will be used instead.
- A thread support level of 3 (MPI_THREAD_MULTIPLE) is required for execution. For OpenMPI, this means setting the environment variable `OMPI_MPI_THREAD_LEVEL` to 3 (a minimal runtime check is sketched after this list).
- The option `Embed in domain` has not been implemented for this backend.
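As a minimal sanity check (a generic sketch, not part of the PR), one can verify at startup that the MPI runtime actually provides the requested thread support level:

```cpp
// Generic sketch, not TTK code: verify that the MPI runtime provides
// MPI_THREAD_MULTIPLE (thread support level 3), which the hybrid
// MPI+OpenMP communication thread relies on.
#include <mpi.h>
#include <cstdio>

int main(int argc, char **argv) {
  int provided{};
  MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
  if(provided < MPI_THREAD_MULTIPLE) {
    std::printf("Provided thread level %d < MPI_THREAD_MULTIPLE (%d); "
                "with OpenMPI, set OMPI_MPI_THREAD_LEVEL=3.\n",
                provided, MPI_THREAD_MULTIPLE);
  }
  MPI_Finalize();
  return 0;
}
```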
Testing:
Extensive testing has been conducted to ensure the correctness of this algorithm, based on the benchmark by Pierre Guillou (https://github.com/pierre-guillou/pdiags_bench), using a point-by-point comparison with the results of the DMS algorithm.
Correctness was evaluated for datasets resampled to $256^3$, for executions on 1, 2, 4, 8 and 16 processes, with 32 threads each.
Performance tests can be found in the associated paper listed at the beginning of this PR.
Thanks for any feedback,
Eve