I am reporting some observations about the run times of pycbc_multi_inspiral jobs in recent PyGRB workflows submitted to the OSG.
First, the run time can vary by orders of magnitude depending on the injections: within a single workflow run, for example, jobs can take anywhere from ~20 min to ~2 days. Jobs without injections have much more consistent run times, though their distribution still has a long tail.
Second, once the number of templates and/or sky grid points becomes nontrivial, the run time scales linearly with both. For example, with code version 2.8.2 I get this for the user time:
(The different colored curves are for 1, 3, 10, 30 and 100 sky grid points, respectively.) The run time is also roughly linear in the number of triggers produced in the end:
Consistent with this plot, the longest-running jobs in the workflows appear to be associated with the loudest injections. So I took one of the longest-running jobs (which has several hundred sky grid points) and profiled it, roughly as sketched below:
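For reference, this is approximately how such a job can be profiled with the standard library; the output file name and the pstats sort key are just illustrative choices, not necessarily what was used here:

```python
# Run the job under cProfile (arguments elided):
#   python -m cProfile -o multi_inspiral.prof `which pycbc_multi_inspiral` ...
# then inspect the resulting profile:
import pstats

stats = pstats.Stats("multi_inspiral.prof")
stats.sort_stats("cumulative").print_stats(20)  # top 20 entries by cumulative time
```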
About 3/4 of the time is spent computing the power chi^2, and 14% is spent appending triggers to a NumPy array. The chi^2 cost is understandable, since the chi^2 is recalculated at every slide/sky-position combination. If I instead precalculate the chi^2 once before the loop over slides and sky positions, the same job completes in about 1/4 of the time, as expected. A sketch of the change is below.
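This is only a schematic of the idea, with placeholder names and toy data rather than the actual pycbc_multi_inspiral internals; the point is that the chi^2 depends on the template and data but not on the slide or sky position, so it can be hoisted out of the nested loop:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the real quantities; these are NOT PyCBC calls.
snr = np.abs(rng.standard_normal(4096) + 1j * rng.standard_normal(4096))
slides = range(8)
sky_positions = range(100)
threshold = 3.0

def power_chisq(snr_series):
    """Placeholder for the expensive power chi^2 computation. In the real
    code its value depends on the template and data, not on the slide or
    sky position, so it is loop-invariant."""
    return (snr_series - snr_series.mean()) ** 2  # dummy stand-in

# Hoisted version: compute the chi^2 once per template...
chisq = power_chisq(snr)

triggers = []
for slide in slides:
    for sky in sky_positions:
        # ...and only index into it inside the slide/sky loop, instead of
        # recomputing it for every combination as the current code does.
        above = snr > threshold
        triggers.append((snr[above], chisq[above]))
```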
With the chi^2 hoisted out of the loop, the profiling reveals a more complex picture: the next thing to attack is the accumulation of the triggers, which is now taking half of the time (!). I would try preallocating a large array of triggers instead of growing one continuously; a sketch follows.
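Again just a sketch, with made-up names and a doubling growth strategy chosen for illustration rather than taken from the existing code:

```python
import numpy as np

def append_growing(chunks):
    """Current pattern: np.append reallocates and copies the whole array
    on every call, so the total cost grows quadratically with the number
    of triggers."""
    out = np.empty(0)
    for c in chunks:
        out = np.append(out, c)
    return out

def append_preallocated(chunks, initial_capacity=1 << 20):
    """Proposed pattern: keep a preallocated buffer plus a fill count,
    doubling the capacity on the rare occasions it runs out. The
    amortised cost is linear in the number of triggers."""
    buf = np.empty(initial_capacity)
    n = 0
    for c in chunks:
        c = np.asarray(c, dtype=buf.dtype)
        while n + c.size > buf.size:        # grow geometrically when needed
            bigger = np.empty(buf.size * 2)
            bigger[:n] = buf[:n]
            buf = bigger
        buf[n:n + c.size] = c
        n += c.size
    return buf[:n]                           # trim to the filled portion
```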