Description
Is your feature request related to a problem or challenge?
The freshly released Pandas 3.0 has a new string dtype that defaults to LargeUtf8 when converted to Arrow. There may be a few other places this touches with respect to LargeUtf8 support; however, the one that caused a failing test for us was support for LargeUtf8 when writing Parquet with partitions (apache/sedona-db#538). The error is: "it is not yet supported to write to hive partitions with datatype LargeUtf8".
Describe the solution you'd like
I think it would be fairly easy to add a branch here to support it. I'm happy to do this.
datafusion/datafusion/datasource/src/write/demux.rs
Lines 394 to 405 in 9f27e93
```rust
DataType::Utf8 => {
    let array = as_string_array(col_array)?;
    for i in 0..rb.num_rows() {
        partition_values.push(Cow::from(array.value(i)));
    }
}
DataType::Utf8View => {
    let array = as_string_view_array(col_array)?;
    for i in 0..rb.num_rows() {
        partition_values.push(Cow::from(array.value(i)));
    }
}
```
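A minimal sketch of the additional match arm, mirroring the existing Utf8 branch above. The cast helper used here is arrow's `as_largestring_array` (it returns a `&LargeStringArray` and panics on a type mismatch, unlike the fallible helpers in the surrounding arms), so the exact helper and error handling would need to be adapted to whatever the rest of demux.rs uses:

```rust
// Hypothetical extra arm for the dtype match in demux.rs; follows the same
// pattern as the Utf8 and Utf8View arms above.
DataType::LargeUtf8 => {
    // arrow's cast helper for LargeStringArray; swap in a fallible cast
    // helper here if the project has one for LargeUtf8.
    let array = arrow::array::cast::as_largestring_array(col_array);
    for i in 0..rb.num_rows() {
        partition_values.push(Cow::from(array.value(i)));
    }
}
```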
Describe alternatives you've considered
Pandas should probably also consider sticking with Utf8 as the default conversion to Arrow (there are likely other places/libraries that don't fully support LargeUtf8 yet).
Additional context
No response