New UDFs Roadmap #4820
@kevinzwang I wonder if this would be better as a discussion? Colin did something similar for the flotilla roadmap, but made it a discussion instead.
What is the advantage of making it a discussion? I forget who said this, but I recall that we wanted to use issues for asks that would need changes in Daft, and discussions otherwise.
Some Q/A on scalar function behavior:
Q: What happens if the function closes over outside state?
Q: How are Series handled?
Q: Do we support optional/defaults/variadics/kwargs?
Q: What if I don't give a type hint e.g.
Q: Can I call it like
Q: Can I create a scalar class-based UDF?
Q: How is a
Hi @kevinzwang @rchowell, one small question about how we handle data of logical datatypes in scalar UDFs, e.g. Image, Tensor. Will we create Image, Tensor Python objects for each logical type, just like
@kevinzwang, I'm wondering: can UDF actors be scheduled across nodes on Ray instead of being limited to the local node? Of course, local node invocation should be prioritized. When some UDF actors turn out to have no data, the nodes on which UDF actors are scheduled, within the concurrency count, could be adjusted dynamically. I'm not sure I've expressed the problem clearly here.
@kevinzwang Another issue is that I can't find a design for UDAFs in the current UDF design plan. Could you please describe how the N->1 scenario will be handled?
@kevinzwang Hi, when can the
@kevinzwang hi,
Background
As Daft is used for more and more multimodal/AI workflows, we are seeing recurring patterns in how UDFs are used, and we'd like to redesign our UDFs to work better with these patterns. This issue tracks the progress of this redesign.
The major differences between the designs of the existing (legacy) and new UDFs:
concurrency parameter. The new UDFs will not be stateful. Instead, to do stateful things, we will introduce the concept of "resources". Resources are not covered in this roadmap but will be a separate issue.
In addition, the scope of this work also includes some new ways to use UDFs, such as multi-column outputs, generator UDFs, async UDFs, and ergonomics around conversions between Python and Daft types.
Examples
Simple scalar UDF
Generator UDF
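As a rough illustration of the generator idea in plain Python (this is not Daft's API), the sketch below assumes that a generator UDF may yield zero or more values per input row and that each yielded value becomes its own output row. The helper `apply_generator` and the sample data are hypothetical names introduced only for this example.

```python
from typing import Callable, Dict, Iterator, List


def split_words(text: str) -> Iterator[str]:
    """Row-wise generator: yields one value per word in the input string."""
    for word in text.split():
        yield word


def apply_generator(
    rows: List[Dict],
    column: str,
    fn: Callable[[str], Iterator[str]],
    out: str,
) -> List[Dict]:
    """Hypothetical driver: emit one output row per yielded value.

    The other columns of the input row are repeated for each output row
    (an assumption about the intended semantics, not confirmed behavior).
    """
    result = []
    for row in rows:
        for value in fn(row[column]):
            result.append({**row, out: value})
    return result


rows = [{"id": 1, "text": "hello world"}, {"id": 2, "text": "daft"}]
exploded = apply_generator(rows, "text", split_words, "word")
```

Here the first input row contributes two output rows and the second contributes one, so a single input can expand into any number of outputs.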
Async UDF
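To make the async case concrete, here is a minimal plain-Python sketch (again, not Daft's API) of what running a row-wise `async` function over a column could look like: calls are awaited concurrently, with a cap on how many are in flight. The names `fetch_length`, `apply_async`, and `max_concurrency` are assumptions made for this illustration.

```python
import asyncio
from typing import Awaitable, Callable, List


async def fetch_length(url: str) -> int:
    """Stand-in for an I/O-bound call such as an HTTP request."""
    await asyncio.sleep(0.01)  # simulated network latency
    return len(url)


async def apply_async(
    values: List[str],
    fn: Callable[[str], Awaitable[int]],
    max_concurrency: int = 2,
) -> List[int]:
    """Hypothetical driver: run fn on every value with bounded concurrency."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(value: str) -> int:
        async with semaphore:
            return await fn(value)

    # gather preserves input order in its results.
    return await asyncio.gather(*(bounded(v) for v in values))


urls = ["https://a.example", "https://bb.example", "https://ccc.example"]
lengths = asyncio.run(apply_async(urls, fetch_length))
```

Because the rows overlap while waiting on I/O, total latency is closer to that of the slowest calls than to the sum of all calls.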
Batch UDF
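The difference between a scalar and a batch function can be sketched in plain Python (not Daft's API), assuming a batch UDF receives a whole chunk of values at once and returns a chunk of the same length. Lists stand in for columnar data here, and `run_in_batches` and `batch_size` are hypothetical names.

```python
from typing import Callable, List


def add_one(x: int) -> int:
    """Scalar form: called once per value."""
    return x + 1


def add_one_batch(xs: List[int]) -> List[int]:
    """Batch form: called once per chunk, which allows vectorized work."""
    return [x + 1 for x in xs]


def run_in_batches(
    values: List[int],
    batch_fn: Callable[[List[int]], List[int]],
    batch_size: int,
) -> List[int]:
    """Hypothetical driver: split the column into chunks and call batch_fn on each."""
    out: List[int] = []
    for start in range(0, len(values), batch_size):
        chunk = values[start:start + batch_size]
        result = batch_fn(chunk)
        # A batch function is expected to return one output per input.
        assert len(result) == len(chunk)
        out.extend(result)
    return out


column = [1, 2, 3, 4, 5]
scalar_result = [add_one(x) for x in column]
batch_result = run_in_batches(column, add_one_batch, batch_size=2)
```

Both paths produce the same values; the batch form simply amortizes per-call overhead across a chunk.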
Type checking
Roadmap
These tasks do not necessarily need to be done in order