Large potential speedup of pycbc_multi_inspiral #5148

@titodalcanton

I am reporting some observations about the run times of pycbc_multi_inspiral jobs in recent PyGRB workflows submitted to the OSG.

First, the run time can vary by orders of magnitude depending on the injections. For example, within a single run it can range from ~20 min to ~2 days. Jobs without injections have much more consistent run times, but their distribution also tends to have a tail.

Second, when the number of templates and/or sky grid points becomes nontrivial, the run time scales linearly with them. For example, with code version 2.8.2 I get this for the user time:

Image

(The different colored curves are for 1, 3, 10, 30 and 100 sky grid points, respectively.) The run time is also roughly linear in the number of triggers produced in the end:

Image

Consistent with this plot, the longest-running jobs in the workflows appear to be associated with the loudest injections. So I took one of the longest-running jobs (which has several hundred sky grid points) and did some profiling:

Image
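For anyone who wants to reproduce this kind of breakdown, here is a minimal sketch using Python's standard `cProfile` and `pstats` modules. This is an assumption about tooling, not necessarily how the profile above was produced; `profile_call` and `toy_workload` are hypothetical names used only for illustration.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the most expensive call trees come first.
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


def toy_workload(n):
    # Hypothetical stand-in for the real job; replace with the code to profile.
    return sum(i * i for i in range(n))


result, report = profile_call(toy_workload, 100_000)
print(report)
```

The report lists cumulative time per function, which is the kind of view that shows where a long-running job spends its time.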

Three quarters of the time is spent computing the power chi^2, and 14% is spent appending triggers to a Numpy array. The cost of the chi^2 is easy to understand: it is recalculated at every slide/sky combination. If I instead precalculate the chi^2 once before the loop over slides/sky, the same job completes in about 1/4 of the time, as expected. The profiling now reveals a more complex picture:

Image
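The chi^2 change described above amounts to hoisting a loop-invariant computation out of the slide/sky loop. Here is a toy sketch of the idea, not the actual `pycbc_multi_inspiral` code: `toy_chisq`, `search_recompute` and `search_precompute` are hypothetical, and the sketch assumes the chi^2 does not depend on the slide or sky position.

```python
import numpy as np

# Counter used only to show how many times the expensive step runs.
calls = {"chisq": 0}


def toy_chisq(snr, num_bins=16):
    """Hypothetical stand-in for an expensive chi^2 computation."""
    calls["chisq"] += 1
    bins = np.array_split(np.abs(snr) ** 2, num_bins)
    means = np.array([b.mean() for b in bins])
    return float(((means - means.mean()) ** 2).sum())


def search_recompute(snr, slides, sky_points):
    # Recomputes the chi^2 at every slide/sky combination.
    out = []
    for slide in slides:
        for sky in sky_points:
            out.append((slide, sky, toy_chisq(snr)))
    return out


def search_precompute(snr, slides, sky_points):
    # Assumption: the chi^2 is independent of slide and sky position,
    # so it can be computed once before the loop.
    chisq = toy_chisq(snr)
    return [(slide, sky, chisq) for slide in slides for sky in sky_points]


rng = np.random.default_rng(0)
snr = rng.normal(size=1024) + 1j * rng.normal(size=1024)
slides, sky_points = range(3), range(5)

calls["chisq"] = 0
slow = search_recompute(snr, slides, sky_points)
slow_calls = calls["chisq"]

calls["chisq"] = 0
fast = search_precompute(snr, slides, sky_points)
fast_calls = calls["chisq"]

print(slow_calls, fast_calls)
```

Both variants return the same values, but the second calls the expensive function once instead of once per slide/sky combination.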

The next thing to attack is the accumulation of the triggers, which now takes half of the time (!). I would try preallocating a large array of triggers instead of growing one continuously.
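To illustrate the preallocation idea, here is a rough sketch of a buffer that reserves capacity up front and doubles it only when it runs out, compared with repeated `np.append`, which copies the whole array on every call. `TriggerBuffer` and the fields in `TRIGGER_DTYPE` are hypothetical and do not reflect the actual trigger layout in the code.

```python
import numpy as np

# Hypothetical trigger layout, for illustration only.
TRIGGER_DTYPE = np.dtype([("time", "f8"), ("snr", "f4")])


class TriggerBuffer:
    """Preallocated trigger storage that grows geometrically when full."""

    def __init__(self, capacity=1024):
        self._data = np.empty(capacity, dtype=TRIGGER_DTYPE)
        self._size = 0

    def extend(self, new):
        needed = self._size + len(new)
        if needed > len(self._data):
            # Doubling keeps the total amount of copying linear in the
            # final number of triggers.
            capacity = max(needed, 2 * len(self._data))
            grown = np.empty(capacity, dtype=TRIGGER_DTYPE)
            grown[:self._size] = self._data[:self._size]
            self._data = grown
        self._data[self._size:needed] = new
        self._size = needed

    def view(self):
        # Only the filled part of the buffer holds valid triggers.
        return self._data[:self._size]


chunks = []
for i in range(50):
    chunk = np.zeros(100, dtype=TRIGGER_DTYPE)
    chunk["time"] = i
    chunk["snr"] = 1.0
    chunks.append(chunk)

buf = TriggerBuffer(capacity=256)
appended = np.empty(0, dtype=TRIGGER_DTYPE)
for chunk in chunks:
    buf.extend(chunk)
    appended = np.append(appended, chunk)  # copies everything each time

print(len(buf.view()), len(appended))
```

Both approaches end up with the same triggers; the difference is that the buffer reallocates only a handful of times, while `np.append` reallocates on every call.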

Labels: PyGRB (PyGRB development)

Status: In Progress