Currently, the distance computation kernels are not perfectly load balanced. Most kernels follow the same pattern: launch one thread per element of one operand, and have each thread loop over the points/segments of the corresponding element in the other operand. Because the per-thread work is proportional to the size of that element, this can slow down the operation when the data are unbalanced/skewed.
Instead, the kernels should launch enough threads that each thread computes a single point-point/point-segment/segment-segment pair, then use atomic operations to aggregate the results. This avoids the slowdown when the data are skewed.
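To make the proposal concrete, here is a minimal host-side C++ sketch of the per-pair decomposition for the point-to-linestring case. It is an illustration under assumptions, not the library's code: `Point`, `Segment`, `atomic_min`, `point_segment_dist2`, `pairwise_point_linestring_distance`, and the offsets layout are all hypothetical names chosen for this example. A serial loop stands in for the thread grid, so each iteration plays the role of one GPU thread, and `std::atomic<float>` with a compare-exchange loop stands in for a device-side atomic min.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical types for this sketch.
struct Point { float x, y; };
struct Segment { Point a, b; };

// Squared distance from point p to segment s.
float point_segment_dist2(Point p, Segment s) {
    float dx = s.b.x - s.a.x, dy = s.b.y - s.a.y;
    float len2 = dx * dx + dy * dy;
    // Projection parameter of p onto the segment, clamped to [0, 1].
    float t = len2 > 0.0f ? ((p.x - s.a.x) * dx + (p.y - s.a.y) * dy) / len2 : 0.0f;
    t = std::max(0.0f, std::min(1.0f, t));
    float ex = p.x - (s.a.x + t * dx), ey = p.y - (s.a.y + t * dy);
    return ex * ex + ey * ey;
}

// Atomic min built from compare-exchange: retry until v is no longer
// smaller than the stored value or the swap succeeds.
void atomic_min(std::atomic<float>& target, float v) {
    float cur = target.load();
    while (v < cur && !target.compare_exchange_weak(cur, v)) {}
}

// Assumed layout: pair i is points[i] against the segments
// segments[offsets[i] .. offsets[i + 1]).
std::vector<float> pairwise_point_linestring_distance(
    const std::vector<Point>& points,
    const std::vector<Segment>& segments,
    const std::vector<std::size_t>& offsets) {
    assert(offsets.size() == points.size() + 1);
    assert(offsets.front() == 0 && offsets.back() == segments.size());

    std::vector<std::atomic<float>> best(points.size());
    for (auto& b : best) b.store(std::numeric_limits<float>::infinity());

    // One work item per (point, segment) pair, regardless of how the
    // segments are distributed across linestrings.
    for (std::size_t w = 0; w < segments.size(); ++w) {
        // Binary search maps the flat work index back to its pair.
        std::size_t pair =
            std::upper_bound(offsets.begin(), offsets.end(), w) - offsets.begin() - 1;
        atomic_min(best[pair], point_segment_dist2(points[pair], segments[w]));
    }

    std::vector<float> out(points.size());
    for (std::size_t i = 0; i < out.size(); ++i) out[i] = std::sqrt(best[i].load());
    return out;
}
```

Every work item does the same amount of work (one binary search, one distance, one atomic update), so a linestring with thousands of segments no longer serializes onto a single thread. The trade-off is contention on the atomic for very large pairs, which a per-block reduction before the atomic could reduce. In this sketch a pair with no segments is left at infinity.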