[FSTORE-1186] Polars Integration into Hopsworks - Writing and Reading DataFrames #1221
davitbzh merged 28 commits into logicalclocks:master from
Conversation
don't you need any mention of polars in feature_store.py? also you need to provide a tutorial for polars in https://github.com/logicalclocks/hopsworks-tutorials
primary_keys=False,
event_time=False,
inference_helper_columns=False,
dataframe_type: Optional[str] = "default",
This is a user-facing function; is it possible to avoid `dataframe_type = "default"`, or to explain clearly what "default" means?
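One plausible reading of "default" is that it resolves to the engine's native return type. The sketch below illustrates that idea only; `resolve_dataframe_type` is an illustrative name, not the hsfs API, and the engine-to-type mapping is an assumption:

```python
# Illustrative sketch only: `resolve_dataframe_type` is not the hsfs API.
# Assumption: "default" resolves to the engine's native return type.
def resolve_dataframe_type(dataframe_type: str, engine: str) -> str:
    if dataframe_type != "default":
        return dataframe_type
    # "default" -> Spark dataframe on the Spark engine, Pandas otherwise.
    return "spark" if engine == "spark" else "pandas"

print(resolve_dataframe_type("default", "python"))  # pandas
print(resolve_dataframe_type("polars", "spark"))    # polars
```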
python/hsfs/feature_view.py
Outdated
for extra information. If inference helper columns were not defined in the feature view
`inference_helper_columns=True` will not have any effect. Defaults to `False`, no helper columns.
dataframe_type: str, optional. Possible values are `"default"`, `"spark"`,
`"pandas"`, `"polars"`, `"numpy"` or `"python"`, defaults to `"default"`.
see above, not sure what "default" means
python/hsfs/feature_view.py
Outdated
then `training_helper_columns=True` will not have any effect. Defaults to `False`, no training helper
columns.
dataframe_type: str, optional. Possible values are `"default"`, `"spark"`,
`"pandas"`, `"polars"`, `"numpy"` or `"python"`, defaults to `"default"`.
primary_keys=False,
event_time=False,
training_helper_columns=False,
dataframe_type: Optional[str] = "default",
python/hsfs/feature_view.py
Outdated
then `training_helper_columns=True` will not have any effect. Defaults to `False`, no training helper
columns.
dataframe_type: str, optional. Possible values are `"default"`, `"spark"`,
`"pandas"`, `"polars"`, `"numpy"` or `"python"`, defaults to `"default"`.
…created during materialization and also reading from online feature store
…type is not spark and dataframe obtained is spark in _return_dataframe_type
…hanges in spark engine
…make sure type check works over different polars versions
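A version-robust check like the one this commit describes could be sketched as below. `is_polars_dataframe` is an illustrative helper name, not the hsfs code; the assumption is that `pl.DataFrame` remains the stable public alias even when the internal module path (e.g. `pl.dataframe.frame.DataFrame`) moves between Polars versions:

```python
# Sketch of a type check that survives Polars moving DataFrame between
# internal modules across versions. Illustrative name, not the hsfs helper.
def is_polars_dataframe(obj) -> bool:
    try:
        import polars as pl
    except ImportError:
        return False  # polars not installed: obj cannot be a polars frame
    # pl.DataFrame is the stable public alias across polars versions.
    return isinstance(obj, pl.DataFrame)
```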
python/hsfs/engine/python.py
Outdated
dataframe, pl.dataframe.frame.DataFrame
):
warnings.warn(
"Great Expectations does not support Polars dataframes directly using Great Expectations with Polars datarames can be slow."
- "Great Expectations does not support Polars dataframes directly using Great Expectations with Polars datarames can be slow."
+ "Currently Great Expectations does not support Polars dataframes. This operation will convert to a Pandas dataframe, which can be slow."
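The conversion path the warning describes could look like this sketch. `to_pandas_for_ge` is an illustrative helper, not the hsfs function; `DataFrame.to_pandas()` is the real Polars conversion method (it requires pandas and pyarrow at runtime):

```python
import warnings

# Illustrative helper (not the hsfs code): convert a Polars frame to Pandas
# before handing it to Great Expectations, warning about the cost.
def to_pandas_for_ge(dataframe):
    try:
        import polars as pl
    except ImportError:
        return dataframe  # no polars available, nothing to convert
    if isinstance(dataframe, pl.DataFrame):
        warnings.warn(
            "Currently Great Expectations does not support Polars dataframes. "
            "This operation will convert to a Pandas dataframe, which can be slow."
        )
        return dataframe.to_pandas()  # real Polars API; needs pandas/pyarrow
    return dataframe  # non-Polars input passes through unchanged
```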
# Returns
`pd.DataFrame` or `List[dict]`. Defaults to `pd.DataFrame`.
`pd.DataFrame`, `polars.DataFrame` or `List[dict]`. Defaults to `pd.DataFrame`.
It says `Defaults to pd.DataFrame.` but the function says `dataframe_type: Optional[str] = "default"`. As above, set it correctly and explain what "default" is.
In the get_inference_helpers function, the default value set for return_type earlier was "pandas"; that is why it is mentioned as pd.DataFrame. I corrected the return_type comment to say `Defaults to "pandas"`.
for JDBC or database based connectors such as Snowflake, JDBC or Redshift.
dataframe_type: str, optional. Possible values are `"default"`, `"spark"`,
`"pandas"`, `"polars"`, `"numpy"` or `"python"`, defaults to `"default"`.
add explanation of "default" here as well
path: Path within the bucket to be read.
dataframe_type: str, optional. Possible values are `"default"`, `"spark"`,
`"pandas"`, `"polars"`, `"numpy"` or `"python"`, defaults to `"default"`.
add explanation of "default" here as well
path: Not relevant for JDBC based connectors such as Redshift.
dataframe_type: str, optional. Possible values are `"default"`, `"spark"`,
`"pandas"`, `"numpy"` or `"python"`, defaults to `"default"`.
add explanation of "default" here as well
If no path is specified default container path will be used from connector.
dataframe_type: str, optional. Possible values are `"default"`, `"spark"`,
`"pandas"`, `"numpy"` or `"python"`, defaults to `"default"`.
add explanation of "default" here as well
path: Not relevant for Snowflake connectors.
dataframe_type: str, optional. Possible values are `"default"`, `"spark"`,
`"pandas"`, `"numpy"` or `"python"`, defaults to `"default"`.
add explanation of "default" here as well
path: Not relevant for JDBC based connectors.
dataframe_type: str, optional. Possible values are `"default"`, `"spark"`,
`"pandas"`, `"numpy"` or `"python"`, defaults to `"default"`.
add explanation of "default" here as well
python/hsfs/storage_connector.py
Outdated
options: Spark options. Defaults to `None`.
path: GCS path. Defaults to `None`.
dataframe_type: str, optional. Possible values are `"default"`, `"spark"`,
`"pandas"`, `"numpy"` or `"python"`, defaults to `"default"`.
add explanation of "default" here as well
python/hsfs/storage_connector.py
Outdated
options: Spark options. Defaults to `None`.
path: BigQuery table path. Defaults to `None`.
dataframe_type: str, optional. Possible values are `"default"`, `"spark"`,
`"pandas"`, `"numpy"` or `"python"`, defaults to `"default"`.
add explanation of "default" here as well
This PR adds support for Polars, specifically writing Polars dataframes to a feature store and reading them back.
Changes:
LoadTest - https://github.com/logicalclocks/loadtest/pull/286
JIRA Issue: https://hopsworks.atlassian.net/browse/FSTORE-1186
How Has This Been Tested?
Checklist For The Assigned Reviewer: