Skip to content

[FEA] Produce and Consume ArrowDeviceArray struct from cudf::table / cudf::column #14926

Closed
@zeroshade

Description

@zeroshade

Is your feature request related to a problem? Please describe.
I would like to generate Arrow IPC payloads from a cudf::table without copying the data off of the GPU device. Currently the to_arrow and from_arrow functions explicitly perform copies to and from the GPU device. There is not currently any efficient way to generate Arrow IPC payloads from libcudf without copying all of the data off of the device.

Describe the solution you'd like
In addition to the existing to_arrow and from_arrow functions, we could have a to_arrow_device_arr function that populates an ArrowDeviceArray struct from a cudf::table or cudf::column. We'd also create a from_arrow_device_arr function that could construct a cudf::table / cudf::column from an ArrowDeviceArray that describes Arrow data which is already on the device. Once you have the ArrowDeviceArray struct, the Arrow C++ library itself can be used to generate the IPC payloads without needing to copy the data off the device. This would also increase the interoperability options that libcudf has, as anything which produces or consumes ArrowDeviceArray structs could hand data off to libcudf and vice versa.

Describe alternatives you've considered
An alternative would be to implement Arrow IPC creating inside of the libcudf library, but I saw that this was explicitly removed from libcudf due to the requirement of linking against libarrow_cuda.so. (#10994). Implementing conversions to and from ArrowDeviceArray wouldn't require linking against libarrow_cuda.so at all and would provide an easy way to allow any consumers to create Arrow IPC payloads, or whatever else they want to do with the resulting Arrow data. Such as leveraging CUDA IPC with the data.

Additional context
When designing the ArrowDeviceArray struct, I created https://github.com/zeroshade/arrow-non-cpu as a POC which used Python numba to generate and operate on some GPU data before handing it off to libcudf, and then getting it back without copying off the device. Using ArrowDeviceArray as the way it handed the data off.

More recently I've been working on creating a protocol for sending Arrow IPC data that is located on GPUs across high-performance transports like UCX. To this end, I created a POC using libcudf to pass the data. As a result I have a partial implementation of the to_arrow_device_arr which can be found here. There's likely better ways than what I'm doing in there, but at least for my POC it was working.

The contribution guidelines say I should file this issue first for discussion rather than just submitting a PR, so that's where I'm at. I plan on trying to create a full implementation that I can contribute but wanted to have this discussion and get feedback here first.

Thanks for hearing me out everyone!

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions