How to specify/drop missing values when using ibis-framework[duckdb]? #2131

@90degs2infty

Question about pandera

I don't fully understand when pandera considers a row to contain a null value and drops it during validation. For example, in the code below, why does the row containing None (None being the way to specify nulls in ibis, as clarified in ibis-project/ibis#11602) still appear in the final output? Based on how I read this part of the docs and this part of the docs, I would expect the code to either raise or drop the row. Can someone please clarify this for me? 🙏

import ibis
import pandera.ibis as pa

pa_schema = pa.DataFrameSchema({
    "a": pa.Column(int),
    "b": pa.Column(int),
}, drop_invalid_rows=True)

schema_nullable = ibis.schema(
    {"a": ibis.dtype("int"), "b": ibis.dtype("int")}
)

table = ibis.memtable([{"a": 42, "b": 43}, {"a": 44, "b": None}], schema=schema_nullable)
validated = pa_schema.validate(table, lazy=True)
print("Read with `schema_nullable`, validated against `pa_schema`")
print(validated.execute())

This results in:

> uv run main.py
Read with `schema_nullable`, validated against `pa_schema`
    a     b
0  42  43.0
1  44   NaN

Versions:

  • pandera: 0.26.1
  • ibis: 10.8.0
