Skip to content

wip: func kwargs #4743

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from
Draft

wip: func kwargs #4743

wants to merge 1 commit into from

Conversation

universalmind303
Copy link
Contributor

Changes Made

Related Issues

Checklist

  • Documented in API Docs (if applicable)
  • Documented in User Guide (if applicable)
  • If adding a new documentation page, doc is added to docs/mkdocs.yml navigation
  • Documentation builds and is formatted properly (tag @/ccmao1130 for docs review)

Comment on lines +641 to +676
def __call__(self, f: UserDefinedPyFuncLike) -> Callable[[UserDefinedPyFuncLike], Expression]:
def wrapper(*args, **kwargs):
print("args:", args) # (col("image"),)
print("kwargs:", kwargs) # {'degrees': 90}
bound_args = self._bind_args(f, *args, **kwargs)
def batch_func(*series_args: Series) -> list[Any]:
print('\n batch func args', series_args)
print('\n batch func kwargs', kwargs)
output = []
for values in series_args:
output.append(f(*values, **kwargs))
return output
expressions = list(bound_args.expressions().values())
inferred_return_dtype = self.get_return_type_for_func(f, args, kwargs)

# Grab a name for the UDF. It **should** be unique.
module_name = getattr(f, "__module__", "")
qual_name = getattr(f, "__qualname__")

if module_name:
name = f"{module_name}.{qual_name}"
else:
name = qual_name

return Expression.udf(
name=name,
inner=UninitializedUdf(batch_func),
bound_args=bound_args,
expressions=expressions,
return_dtype=inferred_return_dtype,
init_args=None,
resource_request=None,
batch_size=None,
concurrency=None,
)
return wrapper
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't seem to get the kwargs working in here. I think i'm missing something really obvious, but if i do inner=UninitializedUdf(batch_func),, it passes in the kwargs but not the args

@daft.func(return_dtype=dt.image())
def rotate(image, degrees):
    return image


df.select(rotate(col("image"), degrees=90)).show()
batch func args ()
batch func kwargs {'degrees': 90}

but if i switch that to the other call style i see for UninitializedUdf
inner=UninitializedUdf(lambda: batch_func),

then i get everything dumped into the args instead

 batch func args (╭──────────────╮
│ image        │
│ ---          │
│ Image[MIXED] │
╞══════════════╡
│ <Image>      │
╰──────────────╯
, 90)

tagging @kevinzwang @colin-ho @jaychia as you all have the most context on this part of the codebase.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, there's not something obvious that I'm seeing, but let me know if you'd like me to dig some more.

The whole arg binding stuff that we currently do is honestly very jank, maybe we should just require that all arguments in a batch UDF be Series. If you want to have scalar arguments, you can write a higher order function or get the first value from a series.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, there's not something obvious that I'm seeing, but let me know if you'd like me to dig some more.

yeah that would be much appreciated! I spent a few hours debugging without much progress.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to have scalar arguments, you can write a higher order function or get the first value from a series.

Some of this might have to be discussed on a higher level in terms of API design and what we want to allow users to do. Pickling at some level feels unavoidable to me if we want to allow users to pass in arbitrary Python arguments.

  • higher order function gets jank, I'd rather us just pickle inputs.
  • get the first value from a series I'm guessing this will lead to some pretty confusing behavior from a user's POV in terms of ser/de back and forth from Python to Daft series types.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah you're right. I'll take a look at how to fix this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants