-
Notifications
You must be signed in to change notification settings - Fork 364
Optimize volume_montecarlo performance #5124
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Improved array operations by pre-allocating arrays and reducing concatenations
This does not affect PyGRB, as far as I know. |
if distribution_param == 'distance': | ||
found_weights = found_d ** d_power | ||
missed_weights = missed_d ** d_power | ||
found_weights = numpy.power(found_d, d_power) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this in the documentation for numpy power:
The ** operator can be used as a shorthand for np.power on ndarrays
Hence, I do not believe this change to use numpy.power can have any effect on performance, and it reduces readability IMO. Please revert these specific changes unless they are shown to significantly improve performance.
montecarlo_vtot = (4. / 3.) * numpy.pi * max_distance**3. | ||
|
||
# arrays of weights for the MC integral | ||
# Calculate weights based on distribution parameters |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# Calculate weights based on distribution parameters | |
# Calculate weights for the MC integral |
'MC integral' is useful scientific/statistical context to understand what is happening
# over injections covering the sphere | ||
mc_weight_samples = numpy.concatenate((found_weights, 0 * missed_weights)) | ||
mc_sum = sum(mc_weight_samples) | ||
found_weights = (numpy.power(found_d, d_power) * |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment has been removed here, please reinstate it.
numpy.power(found_mchirp, mchirp_power)) | ||
missed_weights = (numpy.power(missed_d, d_power) * | ||
numpy.power(missed_mchirp, mchirp_power)) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The final else
statement with NotImplementedError has been removed. I don't see why. We want the code to raise prompt and informative errors ..
mc_weight_samples = numpy.zeros(total_size) | ||
mc_weight_samples[:len(found_weights)] = found_weights | ||
|
||
# Calculate Monte Carlo statistics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The explanation of the calculation in the previous calculation has been removed and replaced by a generic non-informative comment. Please reinstate the previous comment.
|
||
if limits_param == 'distance': | ||
mc_norm = sum(all_weights) | ||
mc_norm = sum(numpy.concatenate((found_weights, missed_weights))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If numpy.concatenate is inefficient, why is it being used here?
# Calculate Monte Carlo statistics | ||
mc_sum = numpy.sum(mc_weight_samples) | ||
mc_sum_squares = numpy.sum(mc_weight_samples ** 2) | ||
mc_sample_variance = (mc_sum_squares / len(mc_weight_samples) - |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This calculation does not seem to reproduce the previous calculation using 'Ninj' in all cases. In addition, len(mc_weight_samples)
is just the same as total_size
so I don't see why the existing variable is not used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not see what is the overall strategy here. In one place a concatenate operation has been removed but the operation is added in more than one other place. Arbitrary changes from **
to numpy.power
which should have no effect on performance are made. In addition several nontrivial comments have been removed.
What specific example function call was made to obtain the advertised speedups? |
Improved array operations by pre-allocating arrays and reducing concatenations
Standard information about the request
A standard efficiency upgrade for the volume_montecarlo function.
It doesn't change the output of the function at all.
This change affects: the offline search, PyGRB
This change: follows style guidelines (See e.g. PEP8), has been proposed using the contribution guidelines
Motivation
I saw the function was using a lot of concat calls which could be reduced, as it can be a substantial performance bottleneck.
Contents
All-round performance optimizations to the function were made, primarily improving the efficiency of the array operations. The performance improvements made were quite substantial, as can be seen from the testing below.
Testing performed