Hi guys, I've run into a problem with init args when using a class UDF. Let's assume I have an Operator that accepts some init args. The usual usage looks like this:
import daft

# Operator is the poster's own base class (not shown in the post).
@daft.udf(return_dtype=daft.DataType.string())
class MyOperator(Operator):
    def __init__(self, text: str) -> None:
        self.text = text

    def __call__(self, data):
        return [x + self.text for x in data]

# Bind the init args first, then apply the resulting UDF to a column.
MyOperator_CustomInitArgs = MyOperator.with_init_args(text="test")
df = daft.from_pydict({"foo": ["hello", "hello", "hello"]})
df = df.with_column("bar_custom", MyOperator_CustomInitArgs(df["foo"]))
It's a bit inconvenient that I first have to set the init args to get a re-configured UDF before I can use it, so I tried to create a wrapper function that handles this, similar to llm_generate. The details are as follows:
Option A:
from typing import Any

import daft
from daft.udf import UDF  # import path assumed; adjust for your Daft version

class MyOperator(Operator):
    def __init__(self, text: str) -> None:
        self.text = text

    def __call__(self, data):
        return [x + self.text for x in data]

    @classmethod
    def __return_column_type__(cls):
        return daft.DataType.string()

def my_udf(
    operator: type[Operator],
    construct_args: dict[str, Any] | None = None,
    num_cpus: float | None = None,
    num_gpus: float | None = None,
    memory_bytes: int | None = None,
    batch_size: int | None = None,
    concurrency: int | None = None,
) -> UDF:
    class Wrapper:
        def __init__(self) -> None:
            init_args = construct_args or {}
            # add some kv to init_args
            self.delegate = operator(**init_args)

        def __call__(self, *args: Any, **kwargs: Any):
            # Unpack so the delegate receives the original arguments.
            return self.delegate(*args, **kwargs)

    return daft.udf(
        return_dtype=operator.__return_column_type__(),
        num_cpus=num_cpus,
        num_gpus=num_gpus,
        memory_bytes=memory_bytes,
        batch_size=batch_size,
        concurrency=concurrency,
    )(Wrapper)

df = daft.from_pydict({"foo": ["hello", "hello", "hello"]})
df = df.with_column("bar_custom", my_udf(MyOperator, construct_args={"text": "test"})(df["foo"]))
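A note on argument forwarding in a wrapper like the one above: calling the delegate with args and kwargs without unpacking them hands it a tuple and a dict as two positional arguments, rather than the original call arguments. The sketch below is Daft-free and only illustrates that difference; Echo and Forwarder are hypothetical stand-ins, not Daft classes.

```python
from typing import Any


class Echo:
    """Hypothetical delegate that simply reports what it received."""

    def __call__(self, *args: Any, **kwargs: Any) -> tuple:
        return args, kwargs


class Forwarder:
    """Hypothetical wrapper showing two ways of forwarding a call."""

    def __init__(self, delegate: Any) -> None:
        self.delegate = delegate

    def call_packed(self, *args: Any, **kwargs: Any) -> Any:
        # Passes the tuple and the dict themselves as two positional arguments.
        return self.delegate(args, kwargs)

    def call_unpacked(self, *args: Any, **kwargs: Any) -> Any:
        # Passes the original positional and keyword arguments through.
        return self.delegate(*args, **kwargs)


fwd = Forwarder(Echo())
packed = fwd.call_packed("foo", flag=True)
unpacked = fwd.call_unpacked("foo", flag=True)
```

With the packed form, an operator whose __call__ expects a single data argument would receive two unexpected positional arguments instead.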
or Option B: similar to llm_generate.
What puzzles me is whether there is any performance difference between these two options.
Option A uses a Wrapper class that captures external variables in a closure. The Wrapper class is treated as a UserDefinedPyFuncLike, wrapped again, and then passed to Rust via PyO3 as a RuntimePyObject.
Since Option A uses a closure, some problems might occur, e.g. memory leaks, serialization issues, or concurrency problems. I'm not sure whether these would occur in Daft's case, or whether Option B is the better solution?
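To make the closure concerns concrete, here is a minimal Daft-free sketch of two properties of a class defined inside a factory function. It assumes nothing about Daft internals; make_wrapper and Suffix are hypothetical names used only for illustration. It shows that (1) copying the init args at factory time isolates the wrapper from later mutation of the caller's dict, and (2) the standard-library pickle cannot serialize a locally defined class, so such a class only survives shipping to workers if the framework uses a by-value serializer (cloudpickle is a common choice; whether Daft does so is not confirmed here).

```python
from __future__ import annotations

import pickle
from typing import Any


def make_wrapper(operator: type, construct_args: dict[str, Any] | None = None) -> type:
    # Copy at factory time so later mutation of the caller's dict is not observed.
    frozen_args = dict(construct_args or {})

    class Wrapper:
        def __init__(self) -> None:
            # The delegate is built lazily, when the wrapper is instantiated.
            self.delegate = operator(**frozen_args)

        def __call__(self, *args: Any, **kwargs: Any) -> Any:
            return self.delegate(*args, **kwargs)

    return Wrapper


class Suffix:
    """Stand-in for an operator; not a Daft class."""

    def __init__(self, text: str) -> None:
        self.text = text

    def __call__(self, data: list[str]) -> list[str]:
        return [x + self.text for x in data]


def stdlib_pickle_fails(obj: Any) -> bool:
    # Local classes are pickled by reference, which fails for stdlib pickle.
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError):
        return True
    return False


args = {"text": "_a"}
WrapperA = make_wrapper(Suffix, args)
args["text"] = "_changed"  # not seen by WrapperA because of the copy
WrapperB = make_wrapper(Suffix, {"text": "_b"})

result_a = WrapperA()(["x"])
result_b = WrapperB()(["x"])
local_class_unpicklable = stdlib_pickle_fails(WrapperA)
```

Each factory call produces its own class with its own captured arguments, so the two wrappers do not share state. The captured objects stay alive as long as the returned class is referenced, which is ordinary closure lifetime rather than a leak.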