Add fast paths for DataFrame.to_cupy
#18801
Conversation
@Matt711 Can you do some benchmarks on large-ish square tables? Like 1000x1000 or 10,000x10,000.
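A benchmark like the one requested could be timed as below. This is a hypothetical sketch (not code from the PR): the description notes that the median was used because first-call CUDA overhead produces outliers, and this helper reflects that choice.

```python
import statistics
import time

def bench_median(fn, repeats=7):
    # Median of several runs discounts the one-time CUDA/kernel-launch
    # overhead that makes the first call an outlier.
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()  # e.g. lambda: df.to_cupy() on a 1000x1000 frame
        times.append(time.perf_counter() - start)
    return statistics.median(times)
```

In practice `fn` would be a closure over a cudf DataFrame of the desired shape; here any callable works.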
bdice left a comment:
A couple small fixes, then LGTM.
/merge
return _index_from_data(data, self.name)

@_performance_tracking
def to_pylibcudf(self, copy=False) -> tuple[plc.Column, dict]:
The docs build failed, saying Index.to_pylibcudf was missing. I went ahead and added both to_plc and from_plc in this PR. They are almost identical to Series.from_plc/to_plc. In a follow-up PR, I can centralize these methods, since both Index and Series are SingleColumnFrames.
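The proposed centralization could look like the following. This is a hypothetical, simplified illustration (plain Python objects, invented method bodies; not the actual cudf/pylibcudf API): the nearly identical conversion methods are hoisted into the shared SingleColumnFrame base so Index and Series inherit one implementation.

```python
class SingleColumnFrame:
    """Simplified stand-in for cudf's shared single-column base class."""

    def __init__(self, column, name=None):
        self._column = column
        self.name = name

    def to_plc(self, copy=False):
        # Stand-in for converting the underlying column to a
        # pylibcudf.Column plus a metadata dict.
        col = list(self._column) if copy else self._column
        return col, {"name": self.name}

    @classmethod
    def from_plc(cls, column, metadata):
        # Stand-in for reconstructing the frame from a column + metadata.
        return cls(column, metadata.get("name"))

class Index(SingleColumnFrame): ...
class Series(SingleColumnFrame): ...
```

With this layout, fixing or extending the conversion logic happens in one place rather than twice.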

Description
Contributes to #16483 by adding fast paths to DataFrame.to_cupy (which is called when DataFrame.values is accessed). The PR follows up #18450, adding Cython bindings for cudf::table_to_array to pylibcudf and plumbing those changes through cudf classic. I benchmarked the fast (True) and slow (False) paths when the dataframe has 1, 6, 20, and 100 columns. The fast paths use cudf::table_to_array if the number of columns is greater than 1, and cp.asarray directly if the dataframe has only one column. The slow path uses a raw Python loop + assignment to create the cupy array.

I used the median because the CUDA overhead of calling cudf::table_to_array is large (so there are outliers in the times). Here is a profile of calling to_cupy twice for both the slow and fast paths. On the first calls, the fast path takes 7.3 ms vs 4.8 ms for the slow path; the first call to cudf::table_to_array is the bottleneck. But comparing the second calls, the fast path is much faster (79 µs vs 2.3 ms).
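The two conversion strategies described above can be sketched in plain Python. This is a hypothetical illustration, not the actual cudf implementation: plain lists stand in for GPU columns, the bulk interleave stands in for cudf::table_to_array, and the single-column wrap stands in for cp.asarray.

```python
def to_array_slow(columns):
    # Slow path: allocate the output, then loop over columns in Python,
    # assigning each column into its slot (one copy per column).
    nrows = len(columns[0])
    out = [[None] * len(columns) for _ in range(nrows)]
    for j, col in enumerate(columns):
        for i, value in enumerate(col):
            out[i][j] = value
    return out

def to_array_fast(columns):
    if len(columns) == 1:
        # One column: wrap it directly (the cp.asarray case).
        return [[v] for v in columns[0]]
    # Multiple columns: one bulk interleave over all columns
    # (the cudf::table_to_array case).
    return [list(row) for row in zip(*columns)]
```

Both produce the same row-major result; the point of the fast path is that, in the real implementation, the per-column Python loop is replaced by a single device-side operation.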