Describe the bug
Low performance with cuDF Pandas and XGBoost using this dataset and notebook.
Performance is slower than CPU equivalents. Tested on Colab with L4 and local WSL with 4090.
Steps/Code to reproduce bug
Open the notebook and run through the cells. Observe the slow performance compared to CPU Pandas and XGBoost.
Expected behavior
Performance is expected to be significantly faster than CPU with cuDF Pandas and XGBoost.
Environment overview (please complete the following information)
Colab L4 instance
Native RAPIDS installed on Colab
Additional context
Colab L4:
Loading time - 46 seconds
Preprocessing time - 476 seconds
Training time - 240 seconds
Colab CPU:
Loading time - 23 seconds
Preprocessing time - 47 seconds
Training time - 252 seconds
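To put the numbers above side by side, here is a small sketch that computes the GPU/CPU ratio per stage. It only uses the timings reported in this issue; the helper name `slowdown` is made up for illustration. It suggests the regression is concentrated in preprocessing (roughly 10x slower), while training is about on par.

```python
# Stage timings in seconds, copied from the figures reported in this issue.
gpu = {"loading": 46, "preprocessing": 476, "training": 240}  # Colab L4
cpu = {"loading": 23, "preprocessing": 47, "training": 252}   # Colab CPU


def slowdown(gpu_times, cpu_times):
    """Return the GPU/CPU time ratio per stage (above 1 means the GPU run was slower)."""
    return {stage: gpu_times[stage] / cpu_times[stage] for stage in gpu_times}


ratios = slowdown(gpu, cpu)
for stage, ratio in ratios.items():
    print(f"{stage}: {ratio:.2f}x")
```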
It seems like the slowdowns are due to from_pandas spending lots of time in _has_any_nan.
cudf/python/cudf/cudf/core/column/column.py
Lines 1478 to 1483 in 4fe338c
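For anyone who wants to check where the time goes on their own machine, here is a minimal sketch of one way to get a per-function breakdown with the standard-library `cProfile`. This is not necessarily how the profile above was obtained. `slow_cell` is a hypothetical stand-in for a notebook cell and `profile_call` is a helper name invented here. cudf.pandas also documents a `%%cudf.pandas.profile` cell magic that may be more convenient in a notebook.

```python
import cProfile
import io
import pstats


def slow_cell():
    # Hypothetical stand-in workload; replace the body with the notebook cell to inspect.
    values = [str(i % 1000) for i in range(200_000)]
    return sorted(set(values))


def profile_call(func, *args, top=15):
    """Run func under cProfile and return (result, text report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


result, report = profile_call(slow_cell)
print(report)
```

Functions such as `from_pandas` or `_has_any_nan` would be expected to show up near the top of the report if they dominate the run, assuming the cell is executed with cudf.pandas enabled.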
It seems like maybe this is happening in the cells that call replace?
# Apply the consolidation
df['Company'] = df['Company'][df['Company'].isin(name_mapping.keys())].replace(name_mapping).astype('category')

This cell takes ~75 seconds in DataFrame.__getitem__. I think this is related to the _has_any_nan call?
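In case it helps with narrowing this down, here is a rough sketch of a standalone reproducer that splits the statement into its select/replace part and its assignment part and times each. The data and `name_mapping` below are synthetic placeholders, not the notebook's actual values, and the timings will differ from the real dataset. Run as-is it uses plain pandas. Running the same script with `python -m cudf.pandas` (or after `%load_ext cudf.pandas` in a notebook) should exercise the accelerated path.

```python
import time

import numpy as np
import pandas as pd

# Synthetic stand-in data; the real notebook uses a much larger dataset.
rng = np.random.default_rng(0)
names = np.array(["ACME INC", "Acme Inc.", "Globex LLC", "Globex", "Initech"])
df = pd.DataFrame({"Company": rng.choice(names, size=200_000)})

# Hypothetical mapping from raw names to consolidated names.
name_mapping = {"ACME INC": "Acme", "Acme Inc.": "Acme", "Globex LLC": "Globex"}

# Step 1: select the rows covered by the mapping, replace, and convert to category.
start = time.perf_counter()
mask = df["Company"].isin(name_mapping.keys())
consolidated = df["Company"][mask].replace(name_mapping).astype("category")
select_replace_s = time.perf_counter() - start

# Step 2: assign back. Assignment aligns on the index, so rows that were
# filtered out in step 1 become NaN, as in the original one-line statement.
start = time.perf_counter()
df["Company"] = consolidated
assign_s = time.perf_counter() - start

print(f"select + replace: {select_replace_s:.3f} s, assignment: {assign_s:.3f} s")
```

Timing the two steps separately should make it easier to tell whether the cost sits on the read side or the write side of the statement.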
I am not able to dig any further on this at the moment but perhaps @galipremsagar or @mroeschke would have insight.
I found the bug, working on a fix.
Dataframe.__setitem__