Open
Labels
bug: Something isn't working
Description
Describe the bug
When creating a Python/Pyarrow UDF, extension types and arrays aren't propagated from the output of one to the input of another.
To Reproduce
from uuid import UUID

import datafusion
import pyarrow as pa


@datafusion.udf([pa.string()], pa.uuid(), "stable")
def uuid_from_string(uuid_string):
    return pa.array((UUID(s).bytes for s in uuid_string.to_pylist()), pa.uuid())


@datafusion.udf([pa.uuid()], pa.int64(), "stable")
def uuid_version(uuid):
    return pa.array(s.version for s in uuid.to_pylist())


def main():
    ctx = datafusion.SessionContext()
    batch = pa.record_batch({"idx": pa.array(range(100))})
    tab = (
        ctx.create_dataframe([[batch]])
        .with_column("uuid_string", datafusion.functions.uuid())
        .with_column("uuid", uuid_from_string(datafusion.col("uuid_string")))
        .with_column("uuid_version", uuid_version(datafusion.col("uuid")))
    )
    #> AttributeError("'bytes' object has no attribute 'version'"), since metadata doesn't make it through
    print(tab)


if __name__ == "__main__":
    main()
Expected behavior
The pyarrow.Array that is returned from uuid_from_string() is a UuidArray:
pa.array([uuid4().bytes], pa.uuid())
#> <pyarrow.lib.UuidArray object at 0x120292350>
However, the pyarrow.Array that is passed to uuid_version() is a FixedSizeBinary array. I would have expected the array passed here to have the pa.uuid() type.
Additional context
It seems like create_udf() is the mechanism used to create the UDF. However, I believe it doesn't propagate field information, since the input and return types are passed only as a DataType:
Lines 91 to 105 in 9545634
fn new(
    name: &str,
    func: PyObject,
    input_types: PyArrowType<Vec<DataType>>,
    return_type: PyArrowType<DataType>,
    volatility: &str,
) -> PyResult<Self> {
    let function = create_udf(
        name,
        input_types.0,
        return_type.0,
        parse_volatility(volatility)?,
        to_scalar_function_impl(func),
    );
    Ok(Self { function })
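To illustrate the suspected cause, here is a toy model in plain Python. It is not DataFusion's or Arrow's API: ToyField and both signature functions are hypothetical names. It relies on one fact about the Arrow format, namely that an extension type is recorded as field metadata under the key ARROW:extension:name (the canonical UUID type uses the name arrow.uuid) on top of a plain storage type. A signature built from bare data types therefore has nowhere to keep that name.

```python
from dataclasses import dataclass, field

# Metadata key the Arrow format uses to mark a field as an extension type
EXTENSION_NAME_KEY = "ARROW:extension:name"


@dataclass(frozen=True)
class ToyField:
    """Hypothetical stand-in for an Arrow field: storage type plus metadata."""

    name: str
    storage_type: str
    metadata: dict = field(default_factory=dict)


def signature_from_types(fields):
    """Keep only the storage types, like a signature built from data types."""
    return [f.storage_type for f in fields]


def signature_from_fields(fields):
    """Keep the storage type together with the extension name, if any."""
    return [(f.storage_type, f.metadata.get(EXTENSION_NAME_KEY)) for f in fields]


uuid_field = ToyField(
    "uuid", "fixed_size_binary[16]", {EXTENSION_NAME_KEY: "arrow.uuid"}
)

print(signature_from_types([uuid_field]))   # extension name is lost
print(signature_from_fields([uuid_field]))  # extension name is preserved
```

If this model matches what happens in the binding, a fix would need the UDF signature to carry fields (or at least their metadata) rather than only data types.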
kylebarron