[FEA] Serialization of libcudf classes and exposing implementation details #17630
Comments
@pentschev - Is the challenge specific to to Perhaps we just need to serialize the "name" of these requests when we serialize an |
Yes.
No, that will not suffice. Besides their names, they have different underlying attributes that need to be serialized as well; see here the attributes I had to serialize for each type.
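To make the point about attributes concrete, here is a minimal sketch in plain Python. It does not use pylibcudf or libcudf: `AggSpec`, `make_spec`, and the `ddof` option are hypothetical stand-ins chosen only for illustration. It shows that two requests sharing a name collide when only the name is serialized, and stay distinct when the attributes travel with it.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class AggSpec:
    """Hypothetical, serializable description of an aggregation request."""

    name: str
    options: tuple = ()  # sorted (key, value) pairs, e.g. (("ddof", 1),)

    def to_json(self, include_options=True):
        # Name-only serialization is what include_options=False simulates.
        payload = {"name": self.name}
        if include_options:
            payload["options"] = dict(self.options)
        return json.dumps(payload, sort_keys=True)

    @classmethod
    def from_json(cls, text):
        payload = json.loads(text)
        options = tuple(sorted(payload.get("options", {}).items()))
        return cls(payload["name"], options)


def make_spec(name, **options):
    """Build a spec with options stored in a canonical (sorted) order."""
    return AggSpec(name, tuple(sorted(options.items())))


sample = make_spec("std", ddof=1)
population = make_spec("std", ddof=0)

# With the name alone, the two requests are indistinguishable on the wire.
name_only_collides = sample.to_json(False) == population.to_json(False)

# With name and attributes, the round trip preserves the exact request.
round_trip = AggSpec.from_json(population.to_json())
```

Here `name_only_collides` evaluates to true, while `round_trip` compares equal to `population` and not to `sample`.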
I think we may be talking about different things. I'm saying that a new cudf-polars |
Thanks @rjzamora, it turns out you are (most likely) right. I was able to do that now and the aggregation tests pass:
However, it is unclear to me why it worked this time. I was already attempting to follow the same path in this commit from November; either I overlooked some detail, or something else changed in cudf-polars that ultimately helped once I merged the latest changes and made a few more of my own. With that said, I think it's possible we will indeed not need to serialize aggregations, at least not the ones that are currently supported. I also have to check all the other tests to see whether any aggregation operation appears in tests other than
@nirandaperera FYI
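One way to read the approach discussed above is that the higher-level request object carries everything needed to rebuild the low-level aggregation, so the low-level object itself never has to cross the wire. The sketch below is an assumption-laden illustration in plain Python, not the cudf-polars implementation: `NativeHandle`, `AggRequest`, `serialize`, and `deserialize` are hypothetical names, and the handle is only a stand-in for an opaque native object.

```python
import json


class NativeHandle:
    """Hypothetical stand-in for an opaque native object that is never serialized."""

    def __init__(self, name, options):
        self.description = (name, tuple(sorted(options.items())))


class AggRequest:
    """Hypothetical high-level request that owns a native handle."""

    def __init__(self, name, **options):
        self.name = name
        self.options = options
        # Built from name and options on construction, so it can always be rebuilt.
        self.handle = NativeHandle(name, options)

    def serialize(self):
        # Only the constructor arguments are sent; the handle stays behind.
        return json.dumps({"name": self.name, "options": self.options}).encode()

    @classmethod
    def deserialize(cls, data):
        payload = json.loads(data.decode())
        # Re-running the constructor recreates the handle on the receiving side.
        return cls(payload["name"], **payload["options"])


original = AggRequest("std", ddof=0)
received = AggRequest.deserialize(original.serialize())
```

After the round trip, `received` holds a freshly built handle that describes the same request as the one in `original`, without any internals of the handle having been exposed or serialized.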
Is your feature request related to a problem? Please describe.
For multi-GPU Polars, we will need to serialize certain data in Python so it can be passed between Dask workers, for example `aggregation`s. In #17469 I've proposed a way to do that; however, that proposal requires certain implementation details from `aggregation.hpp`, more specifically classes derived from `aggregation`, such as `std_var_aggregation`. @vyasr has pointed out that those details are not exposed to pylibcudf and that it would be best to keep it that way.
Describe the solution you'd like
The solution proposed in #17469 seems to be the lowest-hanging fruit but, as described above, it may not be considered optimal for several reasons.
Describe alternatives you've considered
Exposing attributes of the classes publicly may be an alternative, but that would introduce a different set of potential issues.
I'm not familiar with most of the design and options available in libcudf, so it's likely core developers will see other potentially better alternatives.