
Mixing bool and float covariates in spopt.region.MaxPHeuristic is not allowed #498

@twallema


I'm trying to perform spatial clustering with spopt.region.MaxPHeuristic. My attributes mix the types bool and float64:

biome_Amazônia             bool
biome_Caatinga             bool
biome_Cerrado              bool
biome_Mata Atlântica       bool
biome_Pampa                bool
biome_Pantanal             bool
cx                      float64
cy                      float64
dtype: object

This raised the following error:

Traceback (most recent call last):
  File "/Users/twa27/Documents/github/DENV-serotype-imputation/scripts/clustering/find-clusters.py", line 104, in <module>
    model.solve()
  File "/Users/twa27/miniforge3/envs/DENV-SEROTYPE-IMPUTATION/lib/python3.12/site-packages/spopt/region/maxp.py", line 836, in solve
    max_p, label = maxp(
                   ^^^^^
  File "/Users/twa27/miniforge3/envs/DENV-SEROTYPE-IMPUTATION/lib/python3.12/site-packages/spopt/region/maxp.py", line 97, in maxp
    distance_matrix = squareform(pdist(attr, metric="cityblock"))
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/twa27/miniforge3/envs/DENV-SEROTYPE-IMPUTATION/lib/python3.12/site-packages/scipy/spatial/distance.py", line 2300, in pdist
    return xpx.lazy_apply(_np_pdist, X, out,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/twa27/miniforge3/envs/DENV-SEROTYPE-IMPUTATION/lib/python3.12/site-packages/scipy/_lib/array_api_extra/_lib/_lazy.py", line 300, in lazy_apply
    out = wrapped(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/twa27/miniforge3/envs/DENV-SEROTYPE-IMPUTATION/lib/python3.12/site-packages/scipy/_lib/array_api_extra/_lib/_lazy.py", line 350, in wrapper
    out = func(*args_list, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/twa27/miniforge3/envs/DENV-SEROTYPE-IMPUTATION/lib/python3.12/site-packages/scipy/spatial/distance.py", line 2338, in _np_pdist
    return pdist_fn(X, out=out, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: Unsupported dtype object

Running the analysis with only the covariates biome_Amazônia through biome_Pantanal (all of type bool) works fine, and converting the bool covariates to float before passing them to MaxPHeuristic resolves the error. My assumption is that, internally, the mix of bool and float64 columns is converted to a NumPy array of dtype "object", which scipy's pdist then rejects with the ValueError above.
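A minimal sketch of the suspected cause and the workaround, using a toy DataFrame whose column names and values are made up for illustration. It shows that pulling bool and float64 columns into one NumPy array yields dtype object, and that casting to float first lets the same `squareform(pdist(..., metric="cityblock"))` call seen in the traceback run.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist, squareform

# Toy covariates (hypothetical values): one-hot bool columns plus float coordinates
df = pd.DataFrame({
    "biome_Cerrado": [True, False, True],
    "biome_Pampa": [False, True, False],
    "cx": [0.5, 1.5, 2.0],
    "cy": [3.0, 1.0, 0.0],
})

# pandas has no common numeric dtype for bool + float64, so the array is object
mixed = df.values
print(mixed.dtype)  # object

# Workaround: cast every covariate to float64 (True -> 1.0, False -> 0.0) first
attrs = df.astype(float)
dist = squareform(pdist(attrs.to_numpy(), metric="cityblock"))
print(dist)
```

On the user side, the equivalent fix before building the model would be something like `gdf[attrs_name] = gdf[attrs_name].astype(float)`, assuming the covariate column names are held in a list `attrs_name`.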

Suggested fix: check the dtypes of the covariate columns on input.
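One possible shape for such an input check is sketched below. The helper `coerce_numeric_attrs` is hypothetical and not part of spopt; it assumes the covariates are selected from a DataFrame by a list of column names, casts bool columns to float, and raises a clear TypeError for anything non-numeric, so `pdist` never sees an object-dtype array.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_bool_dtype, is_numeric_dtype


def coerce_numeric_attrs(df, attrs_name):
    """Hypothetical pre-check (not part of spopt) for covariate columns.

    Returns the selected columns as a float64 NumPy array. bool columns are
    cast to 0.0/1.0; any other non-numeric column raises a TypeError.
    """
    sub = df[list(attrs_name)]
    # Accept numeric and bool columns; collect everything else for the error
    bad = [c for c in sub.columns
           if not (is_numeric_dtype(sub[c]) or is_bool_dtype(sub[c]))]
    if bad:
        raise TypeError(f"covariate columns must be numeric or bool, got: {bad}")
    # Casting the frame to float64 before .to_numpy() avoids the object upcast
    return sub.astype(np.float64).to_numpy()
```

Failing early with the offending column names gives a more actionable message than the "Unsupported dtype object" raised deep inside scipy.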
