Prevent pylibcudf serialization in cudf-polars#17449
Prevent pylibcudf serialization in cudf-polars#17449pentschev wants to merge 14 commits intorapidsai:branch-25.04from
pylibcudf serialization in cudf-polars#17449Conversation
| class Operator(IntEnum): | ||
| """Internal and picklable representation of pylibcudf's `BinaryOperator`.""" | ||
|
|
||
| ADD = auto() | ||
| ATAN2 = auto() | ||
| BITWISE_AND = auto() | ||
| BITWISE_OR = auto() | ||
| BITWISE_XOR = auto() | ||
| DIV = auto() | ||
| EQUAL = auto() | ||
| FLOOR_DIV = auto() | ||
| GENERIC_BINARY = auto() | ||
| GREATER = auto() | ||
| GREATER_EQUAL = auto() | ||
| INT_POW = auto() | ||
| INVALID_BINARY = auto() | ||
| LESS = auto() | ||
| LESS_EQUAL = auto() | ||
| LOGICAL_AND = auto() | ||
| LOGICAL_OR = auto() | ||
| LOG_BASE = auto() | ||
| MOD = auto() | ||
| MUL = auto() | ||
| NOT_EQUAL = auto() | ||
| NULL_EQUALS = auto() | ||
| NULL_LOGICAL_AND = auto() | ||
| NULL_LOGICAL_OR = auto() | ||
| NULL_MAX = auto() | ||
| NULL_MIN = auto() | ||
| NULL_NOT_EQUALS = auto() | ||
| PMOD = auto() | ||
| POW = auto() | ||
| PYMOD = auto() | ||
| SHIFT_LEFT = auto() | ||
| SHIFT_RIGHT = auto() | ||
| SHIFT_RIGHT_UNSIGNED = auto() | ||
| SUB = auto() | ||
| TRUE_DIV = auto() |
There was a problem hiding this comment.
What?
import pylibcudf as plc
import pickle
assert pickle.loads(pickle.dumps(plc.binaryop.BinaryOperator.ADD)) == plc.binaryop.BinaryOperator.ADD
Works fine.
| try: | ||
| self._regex_program = plc.strings.regex_program.RegexProgram.create( | ||
| plc.strings.regex_program.RegexProgram.create( | ||
| pattern, | ||
| flags=plc.strings.regex_flags.RegexFlags.DEFAULT, | ||
| ) |
There was a problem hiding this comment.
I kind of hate this, because we have to do this twice now, once for control-flow here, and then once to actually use the program.
|
EDIT: Commented in wrong PR, this was meant for #17469 . |
…ibcudf-serialization
This reverts commit 3e14ec9.
… into prevent-pylibcudf-serialization
…ibcudf-serialization
…ibcudf-serialization
| # TODO: uniquify | ||
| requests = [] | ||
| replacements: list[expr.Expr] = [] | ||
| agg_infos = [req.collect_agg(depth=0) for req in agg_requests] |
There was a problem hiding this comment.
This is definitely a good way to avoid AggInfo serialization. The disadvantage is that we don't raise an error from collect_agg when the GroupBy object is created at translation time.
| options, | ||
| self.agg_infos, |
There was a problem hiding this comment.
I also found that options can be an unserializable Polars objects (though the only thing we actually use in do_evaluate is options.slice).
There was a problem hiding this comment.
Right, do you see any tests we could rely immediately upon for this case?
| f"Unreachable, supported agg {name=} has no implementation" | ||
| ) # pragma: no cover | ||
| self.op = op | ||
| self.request = None |
There was a problem hiding this comment.
I think we need to somehow validate that the name is supported within __init__ so that we catch a problem at translation time.
There was a problem hiding this comment.
This should already happen in https://github.com/rapidsai/cudf/pull/17449/files/54a9cd6b8199bc3c0b89dcfaa2bb41e87c48547e#diff-38ad8c29ff55c4194a29a45f2a003e8219f7064d0ba9d552f49a866009eaa920L45-L48, or am I missing something else?
| "linear": plc.types.Interpolation.LINEAR, | ||
| } | ||
|
|
||
| def _fill_request(self): |
There was a problem hiding this comment.
Maybe we can just define request as a property (or cached property), and move this same logic there?
@property
def request(self):
...
``|
After discussing with @wence- , he suggested the appropriate way to deal with |
| self.dynamic = polars_groupby_options.dynamic | ||
| self.rolling = polars_groupby_options.rolling |
There was a problem hiding this comment.
Eventually these will also need translated, but for now we can dodge it because they are always None.
There was a problem hiding this comment.
Indeed, I couldn't find any examples in our tests to aid that, so for now I left it as above to make things simpler.
|
@pentschev - I believe #17998 will supersede this PR. Let me know if I'm missing something, or if this can be closed. |
|
@rjzamora agreed. Closing. |
Description
Reorganize cudf-polars implementation to prevent the need to serialize
pylibcudfobjects.Checklist