Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix type of empty Index and raise warning in Series constructor #14116

Merged
merged 5 commits into from
Sep 20, 2023

Conversation

galipremsagar
Copy link
Contributor

Description

Fixes: #14091
This PR fixes empty inputs dtype in Index to default to str instead of float64. Another change is there is a deprecation warning for Series constructor to match pandas.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@galipremsagar galipremsagar added Python Affects Python cuDF API. 4 - Needs cuDF (Python) Reviewer improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Sep 18, 2023
@galipremsagar galipremsagar self-assigned this Sep 18, 2023
@galipremsagar galipremsagar requested a review from a team as a code owner September 18, 2023 02:05
python/cudf/cudf/tests/test_dataframe.py Outdated Show resolved Hide resolved
@@ -2840,7 +2841,7 @@ def test_series_all_null(num_elements, null_type):
@pytest.mark.parametrize("num_elements", [0, 2, 10, 100])
def test_series_all_valid_nan(num_elements):
data = [np.nan] * num_elements
sr = cudf.Series(data, nan_as_null=False)
sr = _create_cudf_series(data, nan_as_null=False)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this change needed?

For num_elements == 0, the null count should still be zero even if it interprets the dtype as string/object (so the test is still valid, though it might warn?).

For num_elements > 0, the data should be non-empty and thus be correctly interpreted as a floating type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For num_elements == 0, the null count should still be zero even if it interprets the dtype as string/object (so the test is still valid, though it might warn?).

Yup, this is the reason for this change.

@@ -4073,28 +4074,28 @@ def test_empty_dataframe_describe():


def test_as_column_types():
col = column.as_column(cudf.Series([]))
col = column.as_column(_create_cudf_series([]))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here and below -- I'm not sure if I understand why these helper functions are needed. It seems like the test construction might need subtle changes but I don't think this casting helper function is the right answer. It's not announcing what it is actually doing -- the differences between cudf.Series and _create_cudf_series should be in the name, if this is needed (like _create_cudf_series_float64_default).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, reverted the change here and passed explicit dtypes. Renamed the helper to _create_cudf_series_float64_default and _create_pandas_series_float64_default. These will be removed in pandas-2.0 upgrade.

Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving -- but let's make sure to remove these testing helper functions early in the pandas 2.0 migration.

@galipremsagar
Copy link
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit f7ca051 into rapidsai:branch-23.10 Sep 20, 2023
54 checks passed
@galipremsagar galipremsagar added 5 - Ready to Merge Testing and reviews complete, ready to merge and removed 4 - Needs cuDF (Python) Reviewer labels Sep 20, 2023
@galipremsagar galipremsagar mentioned this pull request Sep 26, 2023
3 tasks
rapids-bot bot pushed a commit that referenced this pull request Sep 26, 2023
Calling `cudf.Index([])` results in `str` dtype `Index`. This PR fixes an issue with a pytorch related pytest by explicitly passing a `float64` dtype.

xref: #14116

Authors:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
  - https://github.com/brandon-b-miller

URL: #14198
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - Ready to Merge Testing and reviews complete, ready to merge improvement Improvement / enhancement to an existing function non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

[BUG] Default type of empty list for Index should be object dtype
2 participants