-
Notifications
You must be signed in to change notification settings - Fork 42
[FSTORE-1411] On-Demand Transformation Functions #1371
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
kennethmhc
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comments about the user facing API.
| # Returns | ||
| `Callable`: Pandas UDF in the spark engine otherwise returns a python function for the UDF. | ||
| """ | ||
| if self.udf_type is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is it needed?
|
|
||
|
|
||
| def udf(return_type: Union[List[type], type]) -> "HopsworksUdf": | ||
| class UDFType(Enum): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually transformation type. The UDF is actually the same and can be used for both ODT and MDT. I understand this type is needed for the udf to return different column names, but imo this is better to be done in transformation_function class and also the validation of the transformation type. But it is not critical, can be refactored later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right. I will move it out to transformation function class.
| request_parameters: Optional[ | ||
| Union[List[Dict[str, Any]], Dict[str, Any]] | ||
| ] = None, | ||
| external: Optional[bool] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is it needed? this function won't fetch features from online store right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes you are right. There is no real need for a Vector server to be initlized. I think should probably refactor the code to move the functionality out of VectorServer
| def transform( | ||
| self, | ||
| feature_vector: Union[List[Any], List[List[Any]], pd.DataFrame, pl.DataFrame], | ||
| external: Optional[bool] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is it needed? this function won't fetch features from online store right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes you are right it won't fetch features from the online store, but MDT's might require statistics of features. The statistics used while transforming depends on the training_dataset_version set in init_serving so if init_serving had not been done we don't know the statistics to be used for applying the transformation function. So I basically copied the code done in get_feature_vector for initializing vector server.
I could be wrong here but I think there should be some check to see if init_serving or init_batch_scoring has been performed.
…d and adapting changes for save functions feature group
kennethmhc
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM (the rest of the comment will be addressed in a separate PR)
This PR adds functionality for On-demand transformation function as per the design log https://hopsworks.atlassian.net/wiki/spaces/FST/pages/461733903/On-Demand+Transformation+Functions+-+User+API+and+Implementation
Changes Done:
TransformationFunctionsAttachedfrom the python clientcompute_on_demand_featurestransform.JIRA Issue: https://hopsworks.atlassian.net/browse/FSTORE-1411
Priority for Review: -
Related PRs:
https://github.com/logicalclocks/hopsworks-ee/pull/1862
logicalclocks/hopsworks-chef#1148
https://github.com/logicalclocks/loadtest/pull/394
How Has This Been Tested?
Checklist For The Assigned Reviewer: