-
Notifications
You must be signed in to change notification settings - Fork 236
wip: func kwargs #4743
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
wip: func kwargs #4743
Conversation
def __call__(self, f: UserDefinedPyFuncLike) -> Callable[[UserDefinedPyFuncLike], Expression]: | ||
def wrapper(*args, **kwargs): | ||
print("args:", args) # (col("image"),) | ||
print("kwargs:", kwargs) # {'degrees': 90} | ||
bound_args = self._bind_args(f, *args, **kwargs) | ||
def batch_func(*series_args: Series) -> list[Any]: | ||
print('\n batch func args', series_args) | ||
print('\n batch func kwargs', kwargs) | ||
output = [] | ||
for values in series_args: | ||
output.append(f(*values, **kwargs)) | ||
return output | ||
expressions = list(bound_args.expressions().values()) | ||
inferred_return_dtype = self.get_return_type_for_func(f, args, kwargs) | ||
|
||
# Grab a name for the UDF. It **should** be unique. | ||
module_name = getattr(f, "__module__", "") | ||
qual_name = getattr(f, "__qualname__") | ||
|
||
if module_name: | ||
name = f"{module_name}.{qual_name}" | ||
else: | ||
name = qual_name | ||
|
||
return Expression.udf( | ||
name=name, | ||
inner=UninitializedUdf(batch_func), | ||
bound_args=bound_args, | ||
expressions=expressions, | ||
return_dtype=inferred_return_dtype, | ||
init_args=None, | ||
resource_request=None, | ||
batch_size=None, | ||
concurrency=None, | ||
) | ||
return wrapper |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't seem to get the kwargs working in here. I think i'm missing something really obvious, but if i do inner=UninitializedUdf(batch_func),
, it passes in the kwargs but not the args
@daft.func(return_dtype=dt.image())
def rotate(image, degrees):
return image
df.select(rotate(col("image"), degrees=90)).show()
batch func args ()
batch func kwargs {'degrees': 90}
but if i switch that to the other call style i see for UninitializedUdf
inner=UninitializedUdf(lambda: batch_func),
then i get everything dumped into the args instead
batch func args (╭──────────────╮
│ image │
│ --- │
│ Image[MIXED] │
╞══════════════╡
│ <Image> │
╰──────────────╯
, 90)
tagging @kevinzwang @colin-ho @jaychia as you all have the most context on this part of the codebase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, there's not something obvious that I'm seeing, but let me know if you'd like me to dig some more.
The whole arg binding stuff that we currently do is honestly very jank, maybe we should just require that all arguments in a batch UDF be Series. If you want to have scalar arguments, you can write a higher order function or get the first value from a series.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, there's not something obvious that I'm seeing, but let me know if you'd like me to dig some more.
yeah that would be much appreciated! I spent a few hours debugging without much progress.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want to have scalar arguments, you can write a higher order function or get the first value from a series.
Some of this might have to be discussed on a higher level in terms of API design and what we want to allow users to do. Pickling at some level feels unavoidable to me if we want to allow users to pass in arbitrary Python arguments.
higher order function
gets jank, I'd rather us just pickle inputs.get the first value from a series
I'm guessing this will lead to some pretty confusing behavior from a user's POV in terms of ser/de back and forth from Python to Daft series types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah you're right. I'll take a look at how to fix this
Changes Made
Related Issues
Checklist
docs/mkdocs.yml
navigation