[FEA] Produce and Consume ArrowDeviceArray struct from cudf::table / cudf::column

**Is your feature request related to a problem? Please describe.**
I would like to generate Arrow IPC payloads from a `cudf::table` without copying the data off of the GPU device. Currently the `to_arrow` and `from_arrow` functions explicitly perform copies to and from the GPU device. There is not currently any efficient way to generate Arrow IPC payloads from libcudf without copying all of the data off of the device.

**Describe the solution you'd like**
In addition to the existing `to_arrow` and `from_arrow` functions, we could have a `to_arrow_device_arr` function that populates an `ArrowDeviceArray` struct from a `cudf::table` or `cudf::column`. We'd also create a `from_arrow_device_arr` function that could construct a `cudf::table` / `cudf::column` from an `ArrowDeviceArray` that describes Arrow data which is already on the device. Once you have the `ArrowDeviceArray` struct, the Arrow C++ library itself can be used to generate the IPC payloads without needing to copy the data off the device. This would also increase the interoperability options that libcudf has, as anything which produces or consumes `ArrowDeviceArray` structs could hand data off to libcudf and vice versa.

**Describe alternatives you've considered**
An alternative would be to implement Arrow IPC creating inside of the libcudf library, but I saw that this was explicitly removed from libcudf due to the requirement of linking against `libarrow_cuda.so`. (https://github.com/rapidsai/cudf/issues/10994). Implementing conversions to and from `ArrowDeviceArray` wouldn't require linking against `libarrow_cuda.so` at all and would provide an easy way to allow any consumers to create Arrow IPC payloads, or whatever else they want to do with the resulting Arrow data. Such as leveraging CUDA IPC with the data.

**Additional context**
When designing the `ArrowDeviceArray` struct, I created https://github.com/zeroshade/arrow-non-cpu as a POC which used Python numba to generate and operate on some GPU data before handing it off to libcudf, and then getting it back without copying off the device. Using `ArrowDeviceArray` as the way it handed the data off.

More recently I've been working on creating a protocol for sending Arrow IPC data that is located on GPUs across high-performance transports like UCX. To this end, I created a POC using libcudf to pass the data. As a result I have a partial implementation of the `to_arrow_device_arr` which can be found [here](https://github.com/zeroshade/cudf-flight-ucx/blob/main/to_arrow.cc). There's likely better ways than what I'm doing in there, but at least for my POC it was working.

The contribution guidelines say I should file this issue first for discussion rather than just submitting a PR, so that's where I'm at. I plan on trying to create a full implementation that I can contribute but wanted to have this discussion and get feedback here first. 

Thanks for hearing me out everyone!


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[FEA] Produce and Consume ArrowDeviceArray struct from cudf::table / cudf::column #14926

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[FEA] Produce and Consume ArrowDeviceArray struct from cudf::table / cudf::column #14926

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions