-
-
Notifications
You must be signed in to change notification settings - Fork 211
Description
For spatial searches, the function compute_distance_gjk
is templated over scalar types and defaults to using boost::multiprecision::cpp_bin_float_double_extended>
:
dolfinx/cpp/dolfinx/geometry/gjk.h
Lines 306 to 309 in 3817e51
template <std::floating_point T, | |
typename U = boost::multiprecision::cpp_bin_float_double_extended> | |
std::array<T, 3> compute_distance_gjk(std::span<const T> p0, | |
std::span<const T> q0) |
This function is heavily relied upon by determine_point_ownership
via compute_first_colliding_cell
. The call inside of compute_first_colliding_cell
looks like this:
dolfinx/cpp/dolfinx/geometry/utils.h
Lines 543 to 544 in 3817e51
std::array<T, 3> shortest_vector | |
= compute_distance_gjk<T>(point, coordinate_dofs); |
This call always uses boost::multiprecision::cpp_bin_float_double_extended>
and in my experience is the time bottleneck inside of determine_point_ownership
. Using double
makes things much faster.
How would you go about exposing this template argument to users i.e. letting users decide what scalar type to use in compute_distance_gjk
? Maybe an environment variable?
Here is a profiling example. This small script calls determine_point_ownership a few times:
from dolfinx import mesh, geometry
import numpy as np
from mpi4py import MPI
from line_profiler import LineProfiler
comm = MPI.COMM_WORLD
rank = comm.Get_rank()
def main():
tdim = 3
num_elems_side = 20
domain = mesh.create_unit_cube(MPI.COMM_WORLD,
num_elems_side, num_elems_side, num_elems_side,
mesh.CellType.tetrahedron, dtype=np.float64)
call_determine_point_ownership(domain)
def call_determine_point_ownership(domain):
num_points = 100000
for it in range(10):
rng = np.random.default_rng(it)
points = rng.random((num_points, 3))
po = geometry._cpp.geometry.determine_point_ownership(
domain._cpp_object,
points,
1e-6)
if __name__=="__main__":
lp = LineProfiler()
lp.add_function(call_determine_point_ownership)
lp_wrapper = lp(main)
lp_wrapper()
with open(f"profiling_rank{rank}.txt", 'w') as pf:
lp.print_stats(stream=pf, sort=True, stripzeros=True)
With boost::multiprecision
it takes 32.9 seconds, and with double
instead, it takes about 3.1 seconds on my container. Results might be more accurate with boost::multiprecision
, but doubles are more than enough for my particular application.