Open
Labels: question (Further information is requested)
Description
Question about pandera
I don't fully understand when pandera considers a row to contain a null value and drops it during validation. For example, in the code below, why is the row containing None (None being the way to specify nulls in ibis, as clarified in ibis-project/ibis#11602) still present in the final output? From how I read this part of the docs and this part of the docs, I would expect the code to either raise an error or drop the row. Can someone please clarify this for me? 🙏
import ibis
import pandera.ibis as pa
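pa_schema = pa.DataFrameSchema({
    # nullable is not set, so both columns use pandera's default of nullable=False
    "a": pa.Column(int),
    "b": pa.Column(int),
}, drop_invalid_rows=True)
# ibis dtypes are nullable by default, hence the name schema_nullable
schema_nullable = ibis.schema(
    {"a": ibis.dtype("int"), "b": ibis.dtype("int")}
)
# The second row has a null (None) in column "b"
table = ibis.memtable([{"a": 42, "b": 43}, {"a": 44, "b": None}], schema=schema_nullable)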
validated = pa_schema.validate(table, lazy=True)
print("Read with `schema_nullable`, validated against `pa_schema`")
print(validated.execute())
This results in:
> uv run main.py
Read with `schema_nullable`, validated against `pa_schema`
    a     b
0  42  43.0
1  44   NaN
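For comparison, here is a minimal pure-Python sketch of the behaviour the question expects. It is not pandera's implementation: the helper validate_not_null and the SchemaError class are hypothetical names used only for illustration. With every column non-nullable, a row with a null should either cause an error or, when drop_invalid_rows is enabled, be removed.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Optional[Any]]


class SchemaError(ValueError):
    """Hypothetical stand-in for a schema validation failure (not pandera's class)."""


def validate_not_null(
    rows: List[Row], nullable: Dict[str, bool], drop_invalid_rows: bool
) -> List[Row]:
    """Hypothetical helper: check each non-nullable column for None values.

    If drop_invalid_rows is True, rows with a null in a non-nullable column
    are removed; otherwise an error listing the offending row indices is raised.
    """
    invalid = [
        i
        for i, row in enumerate(rows)
        if any(row.get(col) is None and not ok for col, ok in nullable.items())
    ]
    if invalid and not drop_invalid_rows:
        raise SchemaError(f"null values in non-nullable columns at rows {invalid}")
    bad = set(invalid)
    return [row for i, row in enumerate(rows) if i not in bad]


rows = [{"a": 42, "b": 43}, {"a": 44, "b": None}]
# Mirrors pa.Column(int) with pandera's default nullable=False for both columns
nullable = {"a": False, "b": False}
print(validate_not_null(rows, nullable, drop_invalid_rows=True))
# [{'a': 42, 'b': 43}]
```

Under this expectation, the second row would not appear in the validated output, which is the discrepancy the question is about.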
- pandera: 0.26.1
- ibis: 10.8.0