Incorporating "objective importance" weights in qEHVI #1004
-
Hi all,

Currently, I want to perform a multi-objective Bayesian optimization where I have a set of weights associated with the importance of each objective. For example, say I have 8 objectives and the final two objectives are four times as important as each of the rest, so I am given a set of objective weights. A traditional approach would be to compute a scalarization of the objectives using this weight vector. At the same time, I've enjoyed using qEHVI, since the scalarization approach might not explore the entire Pareto front as well, and it may exploit and generate candidates that aren't Pareto optimal. However, with qEHVI, I'd still like to prioritize points in the region of the Pareto front where the heavily weighted objectives are more dominant.

The approach I am currently taking is to "scale the axes" with an `MCMultiOutputObjective`. My Objective normalizes the model outputs and then rescales them based on the weights. In practice, I am noticing that the results using this Objective are essentially identical to when I simply normalize the outputs without rescaling them based on weights.

Is there something theoretically incorrect about my approach? Otherwise, is there another way to achieve this kind of behavior? I have been reading the source code and have not found anything internal to qEHVI that re-normalizes the outcomes, but it's possible that I missed that too. Thanks for any pointers!
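A minimal sketch of such an Objective, assuming fixed normalization bounds (the class name and argument names here are illustrative, not the actual code in question). It normalizes each outcome to [0, 1] and then rescales it by its importance weight before the samples reach qEHVI:

```python
from typing import Optional

import torch
from torch import Tensor
from botorch.acquisition.multi_objective.objective import MCMultiOutputObjective
from botorch.utils.transforms import normalize


class WeightedNormalizedObjective(MCMultiOutputObjective):
    """Normalize each outcome to [0, 1] using fixed bounds, then rescale by a weight."""

    def __init__(self, weights: Tensor, outcome_bounds: Tensor) -> None:
        # weights: shape (m,), e.g. torch.tensor([1., 1., 1., 1., 1., 1., 4., 4.])
        # outcome_bounds: shape (2, m), lower/upper bounds used for normalization
        super().__init__()
        self.register_buffer("weights", weights)
        self.register_buffer("outcome_bounds", outcome_bounds)

    def forward(self, samples: Tensor, X: Optional[Tensor] = None) -> Tensor:
        # samples has shape (..., q, m); broadcasting applies the per-outcome weights
        return normalize(samples, self.outcome_bounds) * self.weights
```

With a transform like this, the reference point (and partitioning) passed to qEHVI must live in the same normalized-and-weighted space as the transformed samples. Recent BoTorch versions also provide a `WeightedMCMultiOutputObjective` that covers the weighting step.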
-
How are you comparing these results? Are you looking at the progression of the Pareto frontier over time (with the number of samples)? Asymptotically (i.e., in the large-sample regime), since the Pareto front in the parameter space is invariant under scaling of the outcomes, you should expect this behavior. During a transient exploration phase this may or may not make a difference - my current thinking is that it will have an effect on the immediate order in which points are explored, but in most cases that is probably just washed out by qEHVI exploring the Pareto front relatively uniformly. I'll have to think a bit more about this. @sdaulton probably also has some thoughts here.
-
Cool problem! I have always been interested in this, but I have never had a use case for weighting the objectives. 8 objectives would be very computationally intensive with any HVI-based acquisition function using exact HV, since HV computation has super-polynomial complexity in the number of objectives. You could use approximate box decompositions (e.g. `NondominatedPartitioning` with `alpha > 0`) for faster computation if the number of objectives is > 5.

Regarding weighting:

(1) I would normalize, but not weight, the Pareto frontier over the previously evaluated points. A hyperrectangle is defined by an upper vertex u and a lower vertex v (assuming u_m > v_m for all dimensions m), and its volume is \prod_m (u_m - v_m). E.g. suppose the reference point is (1,1), there are no current Pareto points, and you are considering two new points with objectives A=(2,3) and B=(3,2) with a weight vector (1,4). Then the unweighted HVI is 2 for both points. If both the reference point and the new points are weighted, then the weighted HV is 8 for both points. Now suppose only u is weighted; then the volume of the rectangle is \prod_m (w_m u_m - v_m), which gives 11 for A and 14 for B. Now suppose the weight vector is (4,1). Then the weighted HV is 14 for A and 11 for B. This plot shows the difference in HVI value across a 2-objective space for two different weight vectors, and it shows how the weight values should be inverted to have the desired effect on the HVI (a small numerical check of these values follows below).

(2) One issue with weighting the outcomes is that values that are worse than the reference value can actually become better than the reference value after weighting. E.g. if the reference value for objective 1 is 1, the outcome value is 0.5, and the weight is 3, then the weighted outcome is 1.5, which is better than the reference value. However, in this case we would still not want this point to have positive HVI (after weighting), since it is worse than the reference point. Hence, we need to multiply the weighted outcomes by a component-wise indicator function \mathbbm{1}(y > ref_point). Since the indicator function is not differentiable, one could approximate it using a sigmoid (a sketch of this also follows below).

(3) Is your reference point the origin? You'll want the reference point and all objectives to be strictly positive because of (1): if v_m is zero, the weighting has no effect.

Notebook:
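To make the arithmetic in (1) easy to reproduce, here is a small sanity-check sketch using BoTorch's `Hypervolume` utility (maximization convention); with no existing Pareto points, the HVI of a single candidate is just the hypervolume it dominates relative to the reference point. The variable names are illustrative:

```python
import torch
from botorch.utils.multi_objective.hypervolume import Hypervolume

ref_point = torch.tensor([1.0, 1.0])
A = torch.tensor([[2.0, 3.0]])
B = torch.tensor([[3.0, 2.0]])
weights = torch.tensor([1.0, 4.0])

# Unweighted HVI: 2.0 for both A and B.
hv = Hypervolume(ref_point=ref_point)
print(hv.compute(A), hv.compute(B))

# Weighting both the outcomes and the reference point: 8.0 for both (ranking unchanged).
hv_weighted_ref = Hypervolume(ref_point=ref_point * weights)
print(hv_weighted_ref.compute(A * weights), hv_weighted_ref.compute(B * weights))

# Weighting only the outcomes (u), not the reference point: 11.0 for A, 14.0 for B.
print(hv.compute(A * weights), hv.compute(B * weights))

# Inverted weight vector (4, 1): 14.0 for A, 11.0 for B.
inverted = torch.tensor([4.0, 1.0])
print(hv.compute(A * inverted), hv.compute(B * inverted))
```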
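One possible way to write the indicator trick from (2) as an `MCMultiOutputObjective`, assuming a strictly positive reference point as in (3); the class name and the `temperature` parameter are assumptions for illustration, not code from this thread:

```python
from typing import Optional

import torch
from torch import Tensor
from botorch.acquisition.multi_objective.objective import MCMultiOutputObjective


class SmoothedWeightedObjective(MCMultiOutputObjective):
    """Weight the outcomes, but gate each one by a smoothed indicator that the
    unweighted outcome is better than the reference point (point (2) above)."""

    def __init__(self, weights: Tensor, ref_point: Tensor, temperature: float = 1e-2) -> None:
        super().__init__()
        self.register_buffer("weights", weights)
        self.register_buffer("ref_point", ref_point)
        self.temperature = temperature

    def forward(self, samples: Tensor, X: Optional[Tensor] = None) -> Tensor:
        # Sigmoid approximation of the component-wise indicator 1(y > ref_point).
        soft_indicator = torch.sigmoid((samples - self.ref_point) / self.temperature)
        # Gated-out values go to ~0; with a strictly positive reference point (point (3)),
        # they therefore stay worse than the reference even after weighting.
        return samples * self.weights * soft_indicator
```

A smaller `temperature` gives a sharper approximation of the indicator, at the cost of less smooth gradients; as in (1), the reference point itself would be left unweighted.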
-
Hello @nathanohara. Is it possible for you to post sample code showing how you apply the weights with the procedure you mentioned in qEHVI? Or can you please let me know what changes I should apply to this tutorial if I want to consider weighting in qEHVI?