Making the optimisation process more efficient: Pruning data and switching between acquisition functions. #1387
-
For this see my comments here.
I can't give a definitive answer on this, but I have a strong feeling that, rather than throwing data away in order to use a computationally expensive acquisition function, you want to use that data to improve the model and instead use a cheaper acquisition function.
You can try reducing the number of fantasies here. This will reduce the accuracy of the approximation, and how much of a hit optimization performance takes as a result will depend on the problem. Are you using constraints? If so, you can also pass a custom …
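As a minimal sketch of the "fewer fantasies" suggestion (my own stand-in model and data, not code from this thread; assumes a recent BoTorch version): `qKnowledgeGradient` defaults to `num_fantasies=64`, and passing a smaller value gives a cheaper but coarser approximation of the KG value.

```python
import torch
from botorch.acquisition import qKnowledgeGradient
from botorch.fit import fit_gpytorch_mll
from botorch.models import SingleTaskGP
from gpytorch.mlls import ExactMarginalLogLikelihood

# Stand-in training data; replace with your real observations.
train_X = torch.rand(30, 3, dtype=torch.double)
train_Y = train_X.sum(dim=-1, keepdim=True)

model = SingleTaskGP(train_X, train_Y)
fit_gpytorch_mll(ExactMarginalLogLikelihood(model.likelihood, model))

# Default is num_fantasies=64; fewer fantasies is cheaper to optimize
# but approximates the knowledge-gradient value less accurately.
qkg_cheap = qKnowledgeGradient(model, num_fantasies=8)
```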
-
I see, so if you're sampling repeatedly at a point, what you'll do is refine the mean and variance estimates; that makes sense. In general, that aggregation should ideally happen outside of the GP model so that the number of data points stays reasonable and the model is fast enough to evaluate. This problem is also studied in https://arxiv.org/abs/1710.03206, which would be worth a read.
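One hypothetical way to do that aggregation outside the model: collapse repeated measurements of the same input to their mean and pass the variance of that mean as known observation noise. The sketch below is an assumption-laden illustration (the helper `aggregate_duplicates` is made up, only exact-duplicate inputs are merged, and it assumes a BoTorch version where `SingleTaskGP` accepts `train_Yvar`; older versions use `FixedNoiseGP` instead).

```python
import torch
from botorch.fit import fit_gpytorch_mll
from botorch.models import SingleTaskGP
from gpytorch.mlls import ExactMarginalLogLikelihood

def aggregate_duplicates(X: torch.Tensor, Y: torch.Tensor):
    """Collapse exact-duplicate rows of X; return means and the variance of each mean."""
    unique_X, inverse = torch.unique(X, dim=0, return_inverse=True)
    n = unique_X.shape[0]
    Y_mean = torch.zeros(n, 1, dtype=Y.dtype)
    Y_var = torch.zeros(n, 1, dtype=Y.dtype)
    for i in range(n):
        y_i = Y[inverse == i]
        Y_mean[i] = y_i.mean()
        # Variance of the sample mean; small jitter when a point was observed once.
        Y_var[i] = y_i.var() / y_i.shape[0] if y_i.shape[0] > 1 else 1e-6
    return unique_X, Y_mean, Y_var

# Stand-in data with repeated evaluations; replace with your own.
X = torch.rand(200, 3, dtype=torch.double)
Y = torch.randn(200, 1, dtype=torch.double)
X_agg, Y_agg, Yvar_agg = aggregate_duplicates(X, Y)

# Pass the aggregated noise as known observation noise.
model = SingleTaskGP(X_agg, Y_agg, train_Yvar=Yvar_agg)
fit_gpytorch_mll(ExactMarginalLogLikelihood(model.likelihood, model))
```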
-
Say I am alternately using KG to search for the global optimum and PM to look for the optimum of the GP model. Although deleting data is not advised, I have been building the GP model with only the KG candidates that I tested, and I have only used the PM candidates to check whether the optimum is performing better than before. I did this mainly to overcome optimisation wall-time and memory-allocation issues, which were recently addressed in #1704. Assuming no issues with adding more data in terms of the above bottlenecks (wall-time and memory), is it advisable to use as much data as possible (both PM and KG candidates) to build the model? Will there be any issues with overfitting if I use all of the data? I am working with 100-500 datapoints.
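For reference, a rough sketch of the alternating scheme described here, but with every evaluated candidate (KG and PM alike) kept in the training data. All specifics are placeholders: it assumes PM refers to BoTorch's `PosteriorMean`, and the objective, bounds, schedule, and optimizer settings are made up for illustration.

```python
import torch
from botorch.acquisition import PosteriorMean, qKnowledgeGradient
from botorch.fit import fit_gpytorch_mll
from botorch.models import SingleTaskGP
from botorch.optim import optimize_acqf
from gpytorch.mlls import ExactMarginalLogLikelihood

bounds = torch.stack([torch.zeros(3), torch.ones(3)]).double()

def objective(X):
    # Placeholder for the real (expensive) experiment.
    return -(X - 0.5).pow(2).sum(dim=-1, keepdim=True)

train_X = torch.rand(10, 3, dtype=torch.double)
train_Y = objective(train_X)

for iteration in range(20):
    model = SingleTaskGP(train_X, train_Y)
    fit_gpytorch_mll(ExactMarginalLogLikelihood(model.likelihood, model))

    # Alternate: KG on even iterations, posterior mean on odd ones.
    if iteration % 2 == 0:
        acqf = qKnowledgeGradient(model, num_fantasies=32)
    else:
        acqf = PosteriorMean(model)

    candidate, _ = optimize_acqf(
        acqf, bounds=bounds, q=1, num_restarts=10, raw_samples=128
    )

    # Append every evaluated candidate, regardless of which acquisition
    # function proposed it, so the model sees all of the data.
    train_X = torch.cat([train_X, candidate])
    train_Y = torch.cat([train_Y, objective(candidate)])
```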
-
I am interested in using the KG (Knowledge Gradient) method for optimisation, and I see that the process becomes too expensive beyond around 30 to 50 iterations. At the moment I switch to the EI (Expected Improvement) method when KG becomes too expensive, but I would like to keep using KG. Towards this, I think pruning the data would help: datapoints that are too close together could be removed. However, I am not sure whether this is already implemented, or whether it is really worth it. So the following are my questions:
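Purely to illustrate the pruning idea mentioned above: a distance-threshold filter might look like the sketch below. This is not an existing BoTorch feature, the helper name and tolerance are made up, and the replies in this thread recommend keeping or aggregating the data rather than discarding it.

```python
import torch

def prune_close_points(X: torch.Tensor, Y: torch.Tensor, tol: float = 1e-3):
    """Greedily keep points whose distance to all previously kept points exceeds tol."""
    keep = []
    for i in range(X.shape[0]):
        if not keep:
            keep.append(i)
            continue
        dists = torch.cdist(X[i : i + 1], X[keep]).squeeze(0)
        if torch.all(dists > tol):
            keep.append(i)
    idx = torch.tensor(keep)
    return X[idx], Y[idx]

# Stand-in data in the 100-500 point range mentioned above.
X = torch.rand(500, 3, dtype=torch.double)
Y = torch.randn(500, 1, dtype=torch.double)
X_pruned, Y_pruned = prune_close_points(X, Y, tol=0.05)
```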