Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Shared-memory from_arrow no longer zero-copy #8639

Closed
bryevdv opened this issue Jul 1, 2021 · 9 comments
Closed

[BUG] Shared-memory from_arrow no longer zero-copy #8639

bryevdv opened this issue Jul 1, 2021 · 9 comments
Labels
bug Something isn't working Python Affects Python cuDF API.

Comments

@bryevdv
Copy link

bryevdv commented Jul 1, 2021

cc @trxcllnt @galipremsagar

Looking into implementing zero-copy from_arrow for node-rapids for Arrow arrays stored in IPC format in shared memory, we noticed that Python cudf from_arrow delegates to libcudf.interop.from_arrow which will always result in a copy. Evidently previously it was possible to avoid a copy when and arrow table had been created via GpuArrowReader from GPU memory.

@bryevdv bryevdv added Needs Triage Need team to review and classify bug Something isn't working labels Jul 1, 2021
@beckernick
Copy link
Member

beckernick commented Jul 2, 2021

@bryevdv it looks like we've been delegating to from_arrow for quite a while based on the git blame history (two separate hyperlinks). Is this a recent change in behavior?

@beckernick beckernick added Python Affects Python cuDF API. question Further information is requested and removed Needs Triage Need team to review and classify labels Jul 2, 2021
@trxcllnt
Copy link
Contributor

trxcllnt commented Jul 2, 2021

@beckernick no, the change happened when we switched Python to using libcudf.from_arrow. Previously all the from_arrow logic was handled in Python, and we zero-copy moved the Arrow buffers into cuDF buffers via buffers_from_pyarrow.

@beckernick beckernick removed the question Further information is requested label Aug 25, 2021
@github-actions
Copy link

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

@bryevdv
Copy link
Author

bryevdv commented Nov 23, 2021

Still needs to be addressed, AFAIK.

@shwina
Copy link
Contributor

shwina commented Nov 23, 2021

Is it not possible for C++ to also zero copy similarly where applicable? The reason we moved to the C++ impl was to avoid duplicates logic between C++ and Python.

@trxcllnt
Copy link
Contributor

@shwina yeah it would be possible, but the current implementation returns a unique_ptr<cudf::table>, meaning it must copy the input data so the return can be a valid owning type. For true zero-copy, we'd need a version that returns a cudf::table_view (from which a cudf::table could be constructed if a copy was desired).

@shwina
Copy link
Contributor

shwina commented Nov 30, 2021

cc @jrhemstad for advice on the API here, but maybe we could return something like a pair<table_view, memory_owner>? In the IPC case, the memory_owner would own nothing?

@vyasr
Copy link
Contributor

vyasr commented May 11, 2024

I think the new solutions to this issue will come from work on #14926.

@vyasr
Copy link
Contributor

vyasr commented Aug 15, 2024

With the closing of #14926 this is resolved. We now use the C Data Interface for all Arrow interop, so no extraneous copies will be made.

@vyasr vyasr closed this as completed Aug 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working Python Affects Python cuDF API.
Projects
None yet
Development

No branches or pull requests

5 participants