search_sorted on Categorical and Enum Series fails when given a string #20171

@itamarst

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

Add the following to test_search_sorted.py:

def test_search_sorted_categorical() -> None:
    # Sorting will be based on order in which entries were added:
    series = pl.Series(["c", "b", "b", "a", "c", "b"], dtype=pl.Categorical).sort()
    series2 = pl.Series(["c", "b", "a"], dtype=series.dtype)
    assert series.search_sorted(series2).to_list() == [0, 2, 5]
    assert series.search_sorted("c") == 0
    assert series.search_sorted("b") == 2
    assert series.search_sorted("a") == 5

def test_search_sorted_enum() -> None:
    E = pl.Enum(["a", "b", "c"])
    series = pl.Series(["c", "b", "b", "a", "c", "b"], dtype=E).sort()
    series2 = pl.Series(["c", "b", "a"], dtype=E)
    assert series.search_sorted(series2).to_list() == [4, 1, 0]
    assert series.search_sorted("c") == 4
    assert series.search_sorted("b") == 1
    assert series.search_sorted("a") == 0
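
The expected indices in both tests follow from how each dtype orders its values. The sketch below is a pure-Python model of that ordering, not Polars code. It assumes Categorical codes are assigned in order of first appearance (as the comment in the first test states), that Enum codes follow the declared category order, and that the search returns the leftmost matching position. The helper names are hypothetical.

```python
from bisect import bisect_left


def first_seen(values):
    """Return the distinct values in order of first appearance."""
    return list(dict.fromkeys(values))


def search_sorted_model(values, categories, needle):
    """Model search_sorted on a Series sorted by physical category code.

    `categories` defines the code of each string (its list position).
    This is an illustrative stand-in, not the Polars implementation.
    """
    code_of = {cat: code for code, cat in enumerate(categories)}
    sorted_codes = sorted(code_of[v] for v in values)
    # Leftmost position at which the needle's code could be inserted.
    return bisect_left(sorted_codes, code_of[needle])


data = ["c", "b", "b", "a", "c", "b"]

# Categorical (assumed): codes follow first appearance, so c=0, b=1, a=2.
categorical_result = [
    search_sorted_model(data, first_seen(data), s) for s in ["c", "b", "a"]
]

# Enum: codes follow the declared order, so a=0, b=1, c=2.
enum_result = [
    search_sorted_model(data, ["a", "b", "c"], s) for s in ["c", "b", "a"]
]

print(categorical_result)  # [0, 2, 5]
print(enum_result)  # [4, 1, 0]
```

Under these assumptions the model reproduces the values asserted in both tests.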

Log output

____________________ test_search_sorted_categorical ____________________
tests/unit/operations/test_search_sorted.py:86: in test_search_sorted_categorical
    assert series.search_sorted("c") == 0
polars/series/series.py:3466: in search_sorted
    df = F.select(F.lit(self).search_sorted(element, side))
polars/functions/lazy.py:1952: in select
    return empty_frame.select(*exprs, **named_exprs)
polars/dataframe/frame.py:9275: in select
    return self.lazy().select(*exprs, **named_exprs).collect(_eager=True)
polars/lazyframe/frame.py:2030: in collect
    return wrap_df(ldf.collect(callback))
E   polars.exceptions.InvalidOperationError: got invalid or ambiguous dtypes: '[cat, str]' in expression 'search_sorted'
E   
E   Consider explicitly casting your input types to resolve potential ambiguity.
E   
E   Resolved plan until failure:
E   
E       ---> FAILED HERE RESOLVING 'select' <---
E    SELECT [Series.search_sorted([String(c)])] FROM
E     DF []; PROJECT */0 COLUMNS; SELECTION: None
____________________ test_search_sorted_enum ____________________
tests/unit/operations/test_search_sorted.py:96: in test_search_sorted_enum
    assert series.search_sorted("c") == 4
polars/series/series.py:3466: in search_sorted
    df = F.select(F.lit(self).search_sorted(element, side))
polars/functions/lazy.py:1952: in select
    return empty_frame.select(*exprs, **named_exprs)
polars/dataframe/frame.py:9275: in select
    return self.lazy().select(*exprs, **named_exprs).collect(_eager=True)
polars/lazyframe/frame.py:2030: in collect
    return wrap_df(ldf.collect(callback))
E   polars.exceptions.InvalidOperationError: got invalid or ambiguous dtypes: '[enum, str]' in expression 'search_sorted'
E   
E   Consider explicitly casting your input types to resolve potential ambiguity.
E   
E   Resolved plan until failure:
E   
E       ---> FAILED HERE RESOLVING 'select' <---
E    SELECT [Series.search_sorted([String(c)])] FROM
E     DF []; PROJECT */0 COLUMNS; SELECTION: None

Issue description

The supertype casting logic doesn't handle casting individual strings to Categoricals or Enums.

Expected behavior

The tests should pass. If a fix is implemented after #19894 is merged, the relevant commented-out tests in test_index_of.py should also be re-enabled and pass (or a new issue should be filed covering them specifically).

Installed versions

Git version of polars as of Dec 5, 2024.

Labels

bug, needs triage, python
