-
Notifications
You must be signed in to change notification settings - Fork 1k
Add a public API for copying a table_view to device array #18450
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a public API for copying a table_view to device array #18450
Conversation
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
|
/ok to test |
bdice
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initial feedback attached. Thanks for your work on this!
Benchmark Resultstable_to_array[0] NVIDIA RTX 5880 Ada Generation
[1] NVIDIA RTX 5880 Ada Generation
|
|
What is the common functionality between this and |
bdice
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments -- overall this is close!
I looked over the splitting/copying APIs. This is similar to |
vuule
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
few places where the code could be simplified, otherwise looks great!
vuule
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you for addressing the comments!
few more things and it should be good to go.
cpp/include/cudf/reshape.hpp
Outdated
| */ | ||
| void table_to_array(table_view const& input, | ||
| device_span<cuda::std::byte> output, | ||
| cudf::data_type output_dtype, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we avoid passing this and derive this type from the column types, since we throw on mismatch anyway?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done b71ec3a
vuule
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🔥
|
Thanks all! |
|
/merge |
7e555e0
into
rapidsai:branch-25.06
Contributes to #16483 by adding fast paths to `DataFrame.to_cupy` (which is called when `DataFrame.values` is called). The PR follows up #18450 to add cython bindings for `cudf::table_to_array` to pylibcudf and plumbs those changes through cudf classic. I benchmarked the fast (True) and slow (False) when the dataframe has 1, 6, 20, and 100 columns. The fast paths use `cudf::table_to_array` if the number of columns is greater than 1 and `cp.asarray` directly if the dataframe has only one column. The slow path uses a [raw python loop + assignment](https://github.com/rapidsai/cudf/blob/35d58394e7fb5a090ff3cda351403ec092476af5/python/cudf/cudf/core/frame.py#L520) to create the cupy array.  I used the median because the CUDA overhead of calling `cudf::table_to_array` is large (so there are outliers in the times). Here is a profile of calling `to_cupy` twice for both the slow and fast paths.  In the first calls, the fast path takes 7.3 ms vs 4.8 ms for the slow path. The first call to `cudf::table_to_array` is the bottleneck. But if you compare the second calls, the fast path is much faster (79 us vs 2.3ms) Authors: - Matthew Murray (https://github.com/Matt711) Approvers: - Bradley Dice (https://github.com/bdice) - Matthew Roeschke (https://github.com/mroeschke) URL: #18801
Description
Contributes to #16483.
This PR adds a new libcudf API:
cudf::table_to_array, which copies data from a table_view into a preallocated column-major device array usingcub::DeviceMemcpy::Batched.The primary use case for this API is to accelerate the conversion of a cudf.DataFrame to a CuPy array when users access
DataFrame.valuesin Python.In a follow-up PR, I'll integrate this API into the cudf Python layer.