[BUG] cudf pivot extremely slow compared to pandas #17515
Labels: bug, Needs Triage, Python
Describe the bug
With a semi-large dataframe, the pivot method is super slow, especially compared to running it in pandas.
Steps/Code to reproduce bug
The structure of the table looks as follows
Executing the following code takes ~3 min 45 s on a single A6000:
df_cudf_pivot = df.pivot(index='user', columns='movie', values='rating').fillna(-1)
Doing the same in pandas takes ~15 seconds:
df_cudf_pivot = df.to_pandas().pivot(index='user', columns='movie', values='rating').fillna(-1)
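Since the original data is not attached, a self-contained reproduction could look like the sketch below. It is an illustration under stated assumptions, not the reporter's script: the synthetic table, its sizes, and the helper names `make_ratings` and `timed_pivot` are hypothetical, and only the `user`, `movie`, and `rating` column names come from the calls above. The cuDF part runs only if `cudf` can be imported on the machine.

```python
import time

import numpy as np
import pandas as pd


def make_ratings(n_users=2_000, n_movies=500, n_rows=100_000, seed=0):
    """Build a synthetic long-format ratings table (hypothetical stand-in data)."""
    rng = np.random.default_rng(seed)
    # Sample distinct (user, movie) cells so pivot() sees no duplicate entries.
    cells = rng.choice(n_users * n_movies, size=n_rows, replace=False)
    return pd.DataFrame({
        "user": cells // n_movies,
        "movie": cells % n_movies,
        "rating": rng.integers(1, 6, size=n_rows).astype("float64"),
    })


def timed_pivot(df):
    """Run the pivot from the report and return (result, elapsed seconds)."""
    start = time.perf_counter()
    wide = df.pivot(index="user", columns="movie", values="rating").fillna(-1)
    return wide, time.perf_counter() - start


ratings = make_ratings()
wide_pd, t_pd = timed_pivot(ratings)
print(f"pandas: {t_pd:.3f} s, result shape {wide_pd.shape}")

try:
    import cudf  # optional: needs a working RAPIDS install and a supported GPU
except Exception:  # cudf missing, or no usable GPU on this machine
    cudf = None

if cudf is not None:
    wide_gpu, t_gpu = timed_pivot(cudf.from_pandas(ratings))
    print(f"cudf:   {t_gpu:.3f} s, result shape {wide_gpu.shape}")
```

Raising `n_users`, `n_movies`, and `n_rows` toward the size of the real table should show whether the gap reported above appears on other hardware; the default sizes here are deliberately small so the script finishes quickly.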
Expected behavior
I would expect the cudf version to be significantly faster than pandas.
Environment overview (please complete the following information)
docker pull & docker run commands used
Environment details