Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Aggregation serialization in pylibcudf #17469

Open
wants to merge 5 commits into
base: branch-25.02
Choose a base branch
from

Conversation

pentschev
Copy link
Member

Description

Support Aggregation serialization in pylibcudf. This is required to provide distributed support for cudf-polars.

Unfortunately, serialization of the Aggregation class requires access to implementation details in libcudf, so those needed to be exposed to Cython. Fortunately, all the attributes required are already public so there are no changes required in libcudf.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@pentschev pentschev added feature request New feature or request 3 - Ready for Review Ready for review by team non-breaking Non-breaking change pylibcudf Issues specific to the pylibcudf package labels Nov 28, 2024
@pentschev pentschev requested a review from a team as a code owner November 28, 2024 21:14
@github-actions github-actions bot added the Python Affects Python cuDF API. label Nov 28, 2024
@pentschev pentschev changed the title Pylibcudf aggregation serialization Support Aggregation serialization in pylibcudf Nov 28, 2024
@pentschev pentschev changed the title Support Aggregation serialization in pylibcudf Support Aggregation serialization in pylibcudf Nov 28, 2024
@pentschev
Copy link
Member Author

pentschev commented Nov 29, 2024

The CI errors from custreamz look related to this PR but I don't think they are in fact. I can locally reproduce them only with upstream Dask/Distributed, but not with dask=2024.11.2 distributed=2024.11.2 dask-expr=1.1.19, it is also reproducible with branch-25.02 without changes from this PR.

cdef correlation_aggregation *correlation_cast
cdef covariance_aggregation *covariance_cast

if self.kind() is Kind.SUM:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we replace this cascade with a singledispatch helper function? It's a lot of clauses to fail through in the worst case.

@@ -170,3 +174,66 @@ cdef extern from "cudf/aggregation.hpp" namespace "cudf" nogil:
null_policy null_handling,
null_order null_precedence,
rank_percentage percentage) except +libcudf_exception_handler

cdef extern from "cudf/detail/aggregation/aggregation.hpp" \
namespace "cudf::detail" nogil:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm we don't expose pretty much any cudf detail APIs to pylibcudf, and I don't want to start here. Can we open an issue about these? If these are attributes that are absolutely necessary to reconstruct the serialized types, then we should discuss exposing them publicly in libcudf.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comment Vyas, I've opened #17630. Please let me know if there's anything else I can do.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team feature request New feature or request non-breaking Non-breaking change pylibcudf Issues specific to the pylibcudf package Python Affects Python cuDF API.
Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

2 participants