geom_bar errors with fill aesthetic for categoricals with missing values #945

Open
@machow

I think the crux is that pandas Categoricals represent missing values as nan (a float), which seems to trip up a numpy function when the other values are strings. However, geom_bar() works fine when fill is not mapped to the categorical.

import pandas as pd
from plotnine import ggplot, aes, geom_bar

df = pd.DataFrame({"x": ["a", "a", None, "b"]})

# works: not a categorical
ggplot(df, aes("x", fill="x")) + geom_bar()

# works: no fill aesthetic
ggplot(df, aes("x")) + geom_bar()

# fails: TypeError: '<' not supported between instances of 'float' and 'str'
ggplot(
    df.assign(x=pd.Categorical(df["x"])),
    aes("x", fill="x")
) + geom_bar()
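
The suspected cause can be reproduced without plotnine. The sketch below is only an illustration under the assumption that the failure comes from ordering a mix of strings and a float nan; it does not claim to show the exact code path plotnine takes. The helper `try_sort` is a hypothetical name introduced for this example.

```python
import numpy as np
import pandas as pd

cat = pd.Categorical(["a", "a", None, "b"])

# The missing entry is not one of the categories; it is stored as code -1.
categories = list(cat.categories)
codes = cat.codes.tolist()

# Materializing the values mixes strings with the missing-value marker.
values = np.asarray(cat)
value_types = [type(v).__name__ for v in values]


def try_sort(items):
    """Return the TypeError message raised by sorting, or None on success."""
    try:
        sorted(items)
    except TypeError as exc:
        return str(exc)
    return None


# Ordering the mixed values is where a comparison error can surface.
sort_error = try_sort(list(values))

print(categories, codes, value_types, sort_error)
```

Any step that sorts or de-duplicates the raw values (rather than the categories or codes) would hit the same kind of comparison, which would be consistent with the error only appearing once fill is mapped to the categorical.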

Edit: I'm happy to dig into this further, but it also seems that passing a categorical changes how nulls are handled in plots, presumably because the missing value is None (or similar) in one case and nan in the other. Note that Polars can represent nulls in categoricals, so it deviates from pandas a bit (though converting a Polars categorical with .to_pandas() currently turns nulls into nan).
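
Until the null handling is settled, one possible workaround is to turn the missing values into an explicit category before plotting, so that the column contains only strings. This is a minimal sketch, not a fix for the underlying issue: the label `"(missing)"` is an arbitrary choice made for this example, and whether the resulting plot matches the non-categorical case has not been checked here.

```python
import pandas as pd

df = pd.DataFrame({"x": ["a", "a", None, "b"]})

MISSING_LABEL = "(missing)"  # arbitrary label chosen for this example

# Convert to a categorical, register the extra category, then fill the nulls.
x = df["x"].astype("category")
x = x.cat.add_categories([MISSING_LABEL]).fillna(MISSING_LABEL)

df_explicit = df.assign(x=x)

print(df_explicit["x"].tolist())
print(list(df_explicit["x"].cat.categories))
```

The resulting `df_explicit` can then be passed to `ggplot(df_explicit, aes("x", fill="x")) + geom_bar()` in place of the failing call above.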
