
Support writing a table with partition column(s) of type LargeUtf8 #19939

@paleolimbot

Description


Is your feature request related to a problem or challenge?

The freshly released Pandas 3.0 has a new string dtype that defaults to LargeUtf8 when converted to Arrow. This may surface a few gaps in LargeUtf8 support; the one that caused a failing test for us was writing Parquet with partition column(s) of type LargeUtf8 ( apache/sedona-db#538 ). The error is: `it is not yet supported to write to hive partitions with datatype LargeUtf8`.

Describe the solution you'd like

I think it would be fairly easy to add a branch here to support it. I'm happy to do this.

```rust
DataType::Utf8 => {
    let array = as_string_array(col_array)?;
    for i in 0..rb.num_rows() {
        partition_values.push(Cow::from(array.value(i)));
    }
}
DataType::Utf8View => {
    let array = as_string_view_array(col_array)?;
    for i in 0..rb.num_rows() {
        partition_values.push(Cow::from(array.value(i)));
    }
}
```
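The new arm would mirror the existing ones. A minimal sketch, assuming an `as_large_string_array` helper exists as the LargeStringArray counterpart of the `as_string_array` cast used above (exact helper name unverified against the current codebase):

```rust
// Hypothetical LargeUtf8 branch, mirroring the Utf8 and Utf8View arms above.
DataType::LargeUtf8 => {
    // Assumed: as_large_string_array downcasts to a LargeStringArray,
    // analogous to as_string_array for StringArray.
    let array = as_large_string_array(col_array)?;
    for i in 0..rb.num_rows() {
        // LargeStringArray::value also yields &str, so the same
        // Cow::from conversion applies unchanged.
        partition_values.push(Cow::from(array.value(i)));
    }
}
```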

Describe alternatives you've considered

Pandas should probably also consider sticking with Utf8 as the default conversion to Arrow (there are likely other places/libraries that don't fully support LargeUtf8 yet).

Additional context

No response

Labels: enhancement