Add fast paths for DataFrame.to_cupy
#18801
Conversation
@Matt711 Can you do some benchmarks on large-ish square tables? Like 1000x1000 or 10,000x10,000.
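A benchmark like the one requested could be timed as below. This is a hypothetical sketch (not code from the PR): the description notes that the median was used because first-call CUDA overhead produces outliers, and this helper reflects that choice.

```python
import statistics
import time

def bench_median(fn, repeats=7):
    # Median of several runs discounts the one-time CUDA/kernel-launch
    # overhead that makes the first call an outlier.
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()  # e.g. lambda: df.to_cupy() on a 1000x1000 frame
        times.append(time.perf_counter() - start)
    return statistics.median(times)
```

In practice `fn` would be a closure over a cudf DataFrame of the desired shape; here any callable works.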
bdice left a comment:
A couple small fixes, then LGTM.
/merge
return _index_from_data(data, self.name)

@_performance_tracking
def to_pylibcudf(self, copy=False) -> tuple[plc.Column, dict]:
The docs build failed, saying Index.to_pylibcudf was missing. I went ahead and added both to_plc and from_plc in this PR. They are almost identical to Series.from_plc/to_plc. In a follow-up PR, I can centralize these methods, since both Index and Series are SingleColumnFrames.
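The proposed centralization could look like the following. This is a hypothetical, simplified illustration (plain Python objects, invented method bodies; not the actual cudf/pylibcudf API): the nearly identical conversion methods are hoisted into the shared SingleColumnFrame base so Index and Series inherit one implementation.

```python
class SingleColumnFrame:
    """Simplified stand-in for cudf's shared single-column base class."""

    def __init__(self, column, name=None):
        self._column = column
        self.name = name

    def to_plc(self, copy=False):
        # Stand-in for converting the underlying column to a
        # pylibcudf.Column plus a metadata dict.
        col = list(self._column) if copy else self._column
        return col, {"name": self.name}

    @classmethod
    def from_plc(cls, column, metadata):
        # Stand-in for reconstructing the frame from a column + metadata.
        return cls(column, metadata.get("name"))

class Index(SingleColumnFrame): ...
class Series(SingleColumnFrame): ...
```

With this layout, fixing or extending the conversion logic happens in one place rather than twice.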

Description
Contributes to #16483 by adding fast paths to DataFrame.to_cupy (which is called when DataFrame.values is accessed). The PR follows up #18450, adding Cython bindings for cudf::table_to_array to pylibcudf and plumbing those changes through cudf classic. I benchmarked the fast (True) and slow (False) paths when the dataframe has 1, 6, 20, and 100 columns. The fast paths use cudf::table_to_array if the number of columns is greater than 1, and cp.asarray directly if the dataframe has only one column. The slow path uses a raw Python loop + assignment to create the cupy array.

I used the median because the CUDA overhead of calling cudf::table_to_array is large (so there are outliers in the times). Here is a profile of calling to_cupy twice for both the slow and fast paths. On the first calls, the fast path takes 7.3 ms vs 4.8 ms for the slow path; the first call to cudf::table_to_array is the bottleneck. But comparing the second calls, the fast path is much faster (79 µs vs 2.3 ms).
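The two conversion strategies described above can be sketched in plain Python. This is a hypothetical illustration, not the actual cudf implementation: plain lists stand in for GPU columns, the bulk interleave stands in for cudf::table_to_array, and the single-column wrap stands in for cp.asarray.

```python
def to_array_slow(columns):
    # Slow path: allocate the output, then loop over columns in Python,
    # assigning each column into its slot (one copy per column).
    nrows = len(columns[0])
    out = [[None] * len(columns) for _ in range(nrows)]
    for j, col in enumerate(columns):
        for i, value in enumerate(col):
            out[i][j] = value
    return out

def to_array_fast(columns):
    if len(columns) == 1:
        # One column: wrap it directly (the cp.asarray case).
        return [[v] for v in columns[0]]
    # Multiple columns: one bulk interleave over all columns
    # (the cudf::table_to_array case).
    return [list(row) for row in zip(*columns)]
```

Both produce the same row-major result; the point of the fast path is that, in the real implementation, the per-column Python loop is replaced by a single device-side operation.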