Skip to content

Optimize volume_montecarlo performance #5124

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Conversation

fvngs
Copy link

@fvngs fvngs commented May 10, 2025

Improved array operations by pre-allocating arrays and reducing concatenations

Standard information about the request

A standard efficiency upgrade for the volume_montecarlo function.
It doesn't change the output of the function at all.

This change affects: the offline search, PyGRB
This change: follows style guidelines (See e.g. PEP8), has been proposed using the contribution guidelines

Motivation

I saw the function was using a lot of concat calls which could be reduced, as it can be a substantial performance bottleneck.

Contents

All-round performance optimizations to the function were made, primarily improving the efficiency of the array operations. The performance improvements made were quite substantial, as can be seen from the testing below.

Testing performed

Size Original (s) Optimized (s) Speedup
100 0.000651 0.000357 1.82x
1000 0.001938 0.000629 3.08x
10000 0.023510 0.007141 3.29x
100000 0.323696 0.057062 5.67x
  • The author of this pull request confirms they will adhere to the code of conduct

Improved array operations by pre-allocating arrays and reducing concatenations
@ahnitz ahnitz requested review from tdent and pannarale May 10, 2025 19:32
@pannarale
Copy link
Contributor

This does not affect PyGRB, as far as I know.

if distribution_param == 'distance':
found_weights = found_d ** d_power
missed_weights = missed_d ** d_power
found_weights = numpy.power(found_d, d_power)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this in the documentation for numpy power:
The ** operator can be used as a shorthand for np.power on ndarrays

Hence, I do not believe this change to use numpy.power can have any effect on performance, and it reduces readability IMO. Please revert these specific changes unless they are shown to significantly improve performance.

montecarlo_vtot = (4. / 3.) * numpy.pi * max_distance**3.

# arrays of weights for the MC integral
# Calculate weights based on distribution parameters
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Calculate weights based on distribution parameters
# Calculate weights for the MC integral

'MC integral' is useful scientific/statistical context to understand what is happening

# over injections covering the sphere
mc_weight_samples = numpy.concatenate((found_weights, 0 * missed_weights))
mc_sum = sum(mc_weight_samples)
found_weights = (numpy.power(found_d, d_power) *
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment has been removed here, please reinstate it.

numpy.power(found_mchirp, mchirp_power))
missed_weights = (numpy.power(missed_d, d_power) *
numpy.power(missed_mchirp, mchirp_power))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The final else statement with NotImplementedError has been removed. I don't see why. We want the code to raise prompt and informative errors ..

mc_weight_samples = numpy.zeros(total_size)
mc_weight_samples[:len(found_weights)] = found_weights

# Calculate Monte Carlo statistics
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The explanation of the calculation in the previous calculation has been removed and replaced by a generic non-informative comment. Please reinstate the previous comment.


if limits_param == 'distance':
mc_norm = sum(all_weights)
mc_norm = sum(numpy.concatenate((found_weights, missed_weights)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If numpy.concatenate is inefficient, why is it being used here?

# Calculate Monte Carlo statistics
mc_sum = numpy.sum(mc_weight_samples)
mc_sum_squares = numpy.sum(mc_weight_samples ** 2)
mc_sample_variance = (mc_sum_squares / len(mc_weight_samples) -
Copy link
Contributor

@tdent tdent May 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This calculation does not seem to reproduce the previous calculation using 'Ninj' in all cases. In addition, len(mc_weight_samples) is just the same as total_size so I don't see why the existing variable is not used.

Copy link
Contributor

@tdent tdent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not see what is the overall strategy here. In one place a concatenate operation has been removed but the operation is added in more than one other place. Arbitrary changes from ** to numpy.power which should have no effect on performance are made. In addition several nontrivial comments have been removed.

@tdent
Copy link
Contributor

tdent commented May 14, 2025

What specific example function call was made to obtain the advertised speedups?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants