-
Notifications
You must be signed in to change notification settings - Fork 921
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prevent pylibcudf
serialization in cudf-polars
#17449
base: branch-25.02
Are you sure you want to change the base?
Prevent pylibcudf
serialization in cudf-polars
#17449
Conversation
class Operator(IntEnum): | ||
"""Internal and picklable representation of pylibcudf's `BinaryOperator`.""" | ||
|
||
ADD = auto() | ||
ATAN2 = auto() | ||
BITWISE_AND = auto() | ||
BITWISE_OR = auto() | ||
BITWISE_XOR = auto() | ||
DIV = auto() | ||
EQUAL = auto() | ||
FLOOR_DIV = auto() | ||
GENERIC_BINARY = auto() | ||
GREATER = auto() | ||
GREATER_EQUAL = auto() | ||
INT_POW = auto() | ||
INVALID_BINARY = auto() | ||
LESS = auto() | ||
LESS_EQUAL = auto() | ||
LOGICAL_AND = auto() | ||
LOGICAL_OR = auto() | ||
LOG_BASE = auto() | ||
MOD = auto() | ||
MUL = auto() | ||
NOT_EQUAL = auto() | ||
NULL_EQUALS = auto() | ||
NULL_LOGICAL_AND = auto() | ||
NULL_LOGICAL_OR = auto() | ||
NULL_MAX = auto() | ||
NULL_MIN = auto() | ||
NULL_NOT_EQUALS = auto() | ||
PMOD = auto() | ||
POW = auto() | ||
PYMOD = auto() | ||
SHIFT_LEFT = auto() | ||
SHIFT_RIGHT = auto() | ||
SHIFT_RIGHT_UNSIGNED = auto() | ||
SUB = auto() | ||
TRUE_DIV = auto() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What?
import pylibcudf as plc
import pickle
assert pickle.loads(pickle.dumps(plc.binaryop.BinaryOperator.ADD)) == plc.binaryop.BinaryOperator.ADD
Works fine.
self._regex_program = plc.strings.regex_program.RegexProgram.create( | ||
plc.strings.regex_program.RegexProgram.create( | ||
pattern, | ||
flags=plc.strings.regex_flags.RegexFlags.DEFAULT, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kind of hate this, because we have to do this twice now, once for control-flow here, and then once to actually use the program.
EDIT: Commented in wrong PR, this was meant for #17469 . |
…ibcudf-serialization
This reverts commit 3e14ec9.
… into prevent-pylibcudf-serialization
…ibcudf-serialization
…ibcudf-serialization
@@ -947,6 +944,7 @@ def do_evaluate( | |||
# TODO: uniquify | |||
requests = [] | |||
replacements: list[expr.Expr] = [] | |||
agg_infos = [req.collect_agg(depth=0) for req in agg_requests] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is definitely a good way to avoid AggInfo
serialization. The disadvantage is that we don't raise an error from collect_agg
when the GroupBy
object is created at translation time.
options, | ||
self.agg_infos, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also found that options
can be an unserializable Polars objects (though the only thing we actually use in do_evaluate
is options.slice
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, do you see any tests we could rely immediately upon for this case?
f"Unreachable, supported agg {name=} has no implementation" | ||
) # pragma: no cover | ||
self.op = op | ||
self.request = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to somehow validate that the name
is supported within __init__
so that we catch a problem at translation time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should already happen in https://github.com/rapidsai/cudf/pull/17449/files/54a9cd6b8199bc3c0b89dcfaa2bb41e87c48547e#diff-38ad8c29ff55c4194a29a45f2a003e8219f7064d0ba9d552f49a866009eaa920L45-L48, or am I missing something else?
@@ -124,6 +77,46 @@ def __init__( | |||
"linear": plc.types.Interpolation.LINEAR, | |||
} | |||
|
|||
def _fill_request(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can just define request
as a property (or cached property), and move this same logic there?
@property
def request(self):
...
``
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. Done in faf42d0 .
After discussing with @wence- , he suggested the appropriate way to deal with
|
self.dynamic = polars_groupby_options.dynamic | ||
self.rolling = polars_groupby_options.rolling |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eventually these will also need translated, but for now we can dodge it because they are always None
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, I couldn't find any examples in our tests to aid that, so for now I left it as above to make things simpler.
Description
Reorganize cudf-polars implementation to prevent the need to serialize
pylibcudf
objects.Checklist