Possible bug in impact calculation #933

aleeciu · 2024-08-05T17:24:19Z

Describe the bug
I noticed a weird behavior in the impact calculation. Inputting a valid exposure, hazard and impact function set leads to:

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (3,) + inhomogeneous part.

the error occurs at this line, i.e. when looping the impact matrix generator. After spending few hours, I still couldn't figure out what conditions lead to the error.

The code below reproduces it using one exposure point, one centroid and two hazard events. Interestingly enough, using an exposure with twice the same point makes it run just fine.

To Reproduce
Run the code below.

Code example:

import numpy as np
import pandas as pd
from scipy import sparse
import geopandas as gpd
from shapely.geometry import Point

from climada.entity import Exposures
from climada.entity.impact_funcs import ImpactFuncSet, ImpfTropCyclone
from climada.engine import ImpactCalc
from climada.hazard import Centroids, Hazard

intensity = np.array([[31.5], [19.0]])
centroids = Centroids.from_lat_lon([-26.16], [28.20])

tc = Hazard(
    intensity=sparse.csr_matrix(intensity),
    event_id=np.arange(2),
    event_name=[0,1],
    frequency=np.ones(2) / 2,
    fraction=sparse.csr_matrix(np.zeros((2,1))),
    date=np.array([736018, 736019]),
    centroids=centroids,
    haz_type='TC'
)

impf_tc = ImpfTropCyclone.from_emanuel_usa()
impf_set = ImpactFuncSet([impf_tc])
impf_set.check()

exposure = gpd.GeoDataFrame(
    {'value': 1., 'geometry': [Point(28.22, -26.17)]}, crs="EPSG:4326"
)
exposure['latitude'] = exposure.geometry.y
exposure['longitude'] = exposure.geometry.x
exposure['impf_TC'] = 1

exp = Exposures(exposure)
#exp = Exposures(pd.concat([exposure, exposure]).reset_index()) # Use this and it works

impact = ImpactCalc(exp, impf_set, tc).impact(save_mat=True)

Expected behavior
The expected behaviour would be to get an impact object, the actual result is:

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (3,) + inhomogeneous part.

Climada Version: 4.1.2.dev0

System Information (please complete the following information):

Operating system and version: macOS Sonoma 14.6
Python version: 3.11.6

Any idea/clue?

chahank · 2024-08-06T08:22:43Z

Given the error, I would first try to have a correctly made Hazard and maybe this solves the problem. event_name should be a list, not a numpy array. Otherwise, I do not see directly what would be the cause.

aleeciu · 2024-08-06T09:21:34Z

Thanks - I now specified event_name as a list and get the same, as expected.

chahank · 2024-08-06T09:50:23Z

Ok, thanks for checking. Could you please try with an exposure with at least 2 locations?

aleeciu · 2024-08-06T13:42:19Z

That's a bit the whole point - with two exposure points (essentially duplicating the single exposure point) it then works. See bug description and commented line in the example code.

chahank · 2024-08-06T14:13:44Z

Can you please check that your exposures are correctly initialized? Geopandas does not like it that much if one uses a single data point and that has for me led to failures in the past (but this is maybe not the case anymore).

aleeciu · 2024-08-06T14:35:17Z

What do you mean by correctly initialized? Anyway, it is correct as far as I can tell. Also, this is a code that replicates the error but I got it in different contexts where the exposure was much larger. I don't think it's a geopandas issue, I think it's really about the looping of mat objects that takes place within np.hstack at the above mentioned line that creates the issue. For some reason, dimensions don't fit. I think it has to do with having a unique exposure point being analysed in the impact calculation (the same error is thrown if one has a multi-points exposure, but where only one point has centr_TC != -1, thereby only one is considered in the mininal exp_gdf) but cannot really figure out why.

peanutfun · 2024-08-07T13:19:10Z

@aleeciu @chahank I debugged the issue and found the culprit digging through the source code: csr_matrix.multiply behaves differently when multiplying with a scalar. For scalar multiplication (which also applies for arrays of shape (1, 1) and (1, ), as in @aleeciu's case), the data attribute is simply multiplied with the scalar. In your particular example, this leaves one value of the resulting impact matrix as zero. This zero is explicitly stored. The explicit zero leads to a mismatch between the shape of mat.data and mat.nonzero().

To be clear, this issue only occurs if BOTH

The exposure only has a single value
The MDR and/or fraction is zero somewhere

For sparse array multiplication, this will not happen, as a new matrix constructor is called and explicit zeros are removed.

One remedy is to explicitly call eliminate_zeros on the impact matrix in the imp_mat_gen. However, this incurs a performance penalty of O(n), where n are the stored values, because each single one has to be checked for a value of zero.

aleeciu · 2024-08-08T09:29:25Z

Thanks a lot @peanutfun for investing the bug. Your explanation makes a lot of sense. Regarding the solution, I don't have strong opinions tbh. This is definetely an edge case so probably it does not make sense if the fix deteriorates the overall performance, although the impact quantification in my experience has never been a bottleneck in performance.

peanutfun · 2024-08-08T11:00:28Z

I think it should be fine to add a check whether the size of the exposure is 1, and only then call eliminate_zeros.

aleeciu · 2024-08-08T11:13:36Z

Cool - at what place in the code you think this check should be placed?

peanutfun · 2024-08-08T14:05:43Z

@aleeciu Inside impact_matrix:

climada_python/climada/engine/impact_calc.py

Line 373 in e4a3cfd

def impact_matrix(self, exp_values, cent_idx, impf):

The check has to be if the shape of exp_values_csr is (1, 1). Instead of immediately returning the multiplication result, one then has to apply eliminate_zeros on it before returning.

Edit: Check can also be if n_exp_pnt == 1

chahank · 2024-08-08T14:15:55Z

Thanks a lot @peanutfun for investing the bug. Your explanation makes a lot of sense. Regarding the solution, I don't have strong opinions tbh. This is definetely an edge case so probably it does not make sense if the fix deteriorates the overall performance, although the impact quantification in my experience has never been a bottleneck in performance.

Just to comment on the last point: for most non-trivial applications such as uncertainty quantification, calibration or cost-benefit analysis the impact quantification is the main bottleneck in performance, so it is very central to not loose time there.

peanutfun · 2024-08-08T15:12:59Z

@chahank Correct, but if n_exp_pnt == 1 should be acceptable, I think

peanutfun · 2024-08-09T08:41:55Z

I thought of an even better solution, I hope:

We should call eliminate_zeros on the matrices created in Hazard.get_mdr and Hazard._get_fraction before returning them. This avoids multiplications with zeros, which later on would be eliminated anyway. Then we don't need an additional check in ImpactCalc. In theory, this should actually improve performance a bit.

peanutfun · 2024-08-14T08:05:57Z

@aleeciu would you have the time to create a PR?

aleeciu · 2024-08-14T13:32:31Z

Hi @peanutfun , yes we will address this within the next two weeks. If you know from the top of your head where this call should likely happen in the code, please share it - otherwise I will make a first guess and we iterate over the PR.

peanutfun · 2024-08-14T15:48:35Z

@aleeciu Excellent, looking forward to it!

Probably within here:

climada_python/climada/hazard/base.py

Line 1073 in e4a3cfd

def get_mdr(self, cent_idx, impf):

~~and here:~~

climada_python/climada/hazard/base.py

Line 1138 in e4a3cfd

def _get_fraction(self, cent_idx=None):

Edit: It's not required in get_fraction, because the fraction matrix is only sliced, and by calling Hazard.check_matrices from ImpactCalc, it should not contain explicit zeros anymore.

aleeciu added the bug label Aug 5, 2024

peanutfun added the accepting pull request Contribute by raising a pull request to resolve this issue! label Aug 8, 2024

bguillod mentioned this issue Sep 17, 2024

Bug fix in Hazard.get_mdr and impact calculation in case of a single exposure point and MDR values of 0 #948

Merged

13 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Possible bug in impact calculation #933

Possible bug in impact calculation #933

aleeciu commented Aug 5, 2024 •

edited

Loading

chahank commented Aug 6, 2024

aleeciu commented Aug 6, 2024

chahank commented Aug 6, 2024

aleeciu commented Aug 6, 2024

chahank commented Aug 6, 2024

aleeciu commented Aug 6, 2024 •

edited

Loading

peanutfun commented Aug 7, 2024 •

edited

Loading

aleeciu commented Aug 8, 2024

peanutfun commented Aug 8, 2024

aleeciu commented Aug 8, 2024

peanutfun commented Aug 8, 2024 •

edited

Loading

chahank commented Aug 8, 2024

peanutfun commented Aug 8, 2024

peanutfun commented Aug 9, 2024

peanutfun commented Aug 14, 2024

aleeciu commented Aug 14, 2024

peanutfun commented Aug 14, 2024 •

edited

Loading

Possible bug in impact calculation #933

Possible bug in impact calculation #933

Comments

aleeciu commented Aug 5, 2024 • edited Loading

chahank commented Aug 6, 2024

aleeciu commented Aug 6, 2024

chahank commented Aug 6, 2024

aleeciu commented Aug 6, 2024

chahank commented Aug 6, 2024

aleeciu commented Aug 6, 2024 • edited Loading

peanutfun commented Aug 7, 2024 • edited Loading

aleeciu commented Aug 8, 2024

peanutfun commented Aug 8, 2024

aleeciu commented Aug 8, 2024

peanutfun commented Aug 8, 2024 • edited Loading

chahank commented Aug 8, 2024

peanutfun commented Aug 8, 2024

peanutfun commented Aug 9, 2024

peanutfun commented Aug 14, 2024

aleeciu commented Aug 14, 2024

peanutfun commented Aug 14, 2024 • edited Loading

aleeciu commented Aug 5, 2024 •

edited

Loading

aleeciu commented Aug 6, 2024 •

edited

Loading

peanutfun commented Aug 7, 2024 •

edited

Loading

peanutfun commented Aug 8, 2024 •

edited

Loading

peanutfun commented Aug 14, 2024 •

edited

Loading