Open
Labels: bug (Incorrect behavior inside of ibis)
Description
What happened?
I'm using `ibis-framework` with the `pyspark` backend to apply a long sequence of data transformations through a pipeline that relies on writing and reading files to/from disk. I noticed two types of unexpected messages that are not directly related to my dataframe operations:
- Type 1: Deprecated type definition in UDFs (even when not using Ibis UDFs)
...python3.10/site-packages/pyspark/sql/pandas/functions.py:407: UserWarning: In Python 3.6+ and Spark 3.0+, it is preferred to specify type hints for pandas UDF instead of specifying pandas UDF type which will be deprecated in the future releases. See SPARK-28264 for more details.
- Type 2: Repeated function registration. These messages bypass Python logging: they are written straight to the console, like a `print()`, and don't respect my current `logging` configuration (they probably come from Spark itself):
  25/07/23 19:40:45 WARN SimpleFunctionRegistry: The function unwrap_json_str replaced a previously registered function.
  25/07/23 19:40:45 WARN SimpleFunctionRegistry: The function unwrap_json_int replaced a previously registered function.
  25/07/23 19:40:45 WARN SimpleFunctionRegistry: The function unwrap_json_bool replaced a previously registered function.
  25/07/23 19:40:45 WARN SimpleFunctionRegistry: The function unwrap_json_float replaced a previously registered function.
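As a stopgap for the Type 1 message, the warning can be filtered on the Python side. The following is only a minimal sketch, assuming the warning is raised through Python's `warnings` module with the text quoted above; the helper name `ignore_pandas_udf_type_warning` is hypothetical and not part of ibis or PySpark.

```python
import re
import warnings

# Leading text of the UserWarning quoted in this issue.
PANDAS_UDF_WARNING = (
    "In Python 3.6+ and Spark 3.0+, it is preferred to specify "
    "type hints for pandas UDF"
)


def ignore_pandas_udf_type_warning() -> None:
    """Install a filter that hides only the pandas UDF type warning.

    Hypothetical helper: the `message` argument is a regex matched
    against the start of the warning text, so other UserWarnings
    are still shown.
    """
    warnings.filterwarnings(
        "ignore",
        message=re.escape(PANDAS_UDF_WARNING),
        category=UserWarning,
    )
```

Calling `ignore_pandas_udf_type_warning()` once before running the pipeline should be enough. This does not help with the Type 2 messages: those look like JVM-side log lines, so Python's `warnings` and `logging` never see them. Raising the Spark log threshold with `spark.sparkContext.setLogLevel("ERROR")` would likely hide them, at the cost of hiding every other Spark `WARN` line too.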
It's a minor issue, more annoying than functionally harmful. The main problem is how much these messages clutter the console output for applications that rely heavily on ibis transformations.
Examples:
- In a notebook
- When using ibis to run a chained application (through N different nodes/steps), where many nodes apply ibis transformations and read/write files
Two questions:
- About the first type:
  - Is the use of the pandas UDF type by design?
  - Can it be fixed, given that it will be deprecated/removed?
- About the second type:
  - Is it possible to register the unwrap operations only when they are not registered yet? It's fine if this message shows once.
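To illustrate the second question, here is a rough sketch of a "register only if missing" guard. It is not how ibis registers its `unwrap_json_*` functions (I haven't checked that code path); `register_once` is a hypothetical helper that takes the existence check and the registration call as plain callables, shown here against an in-memory dict.

```python
from typing import Any, Callable, Dict


def register_once(
    exists: Callable[[str], bool],
    register: Callable[[str, Callable[..., Any]], Any],
    name: str,
    func: Callable[..., Any],
) -> bool:
    """Register `func` under `name` only if `name` is not registered yet.

    Returns True when a registration actually happened, False when it
    was skipped because the name already existed.
    """
    if exists(name):
        return False
    register(name, func)
    return True


# In-memory stand-in for a function registry, for illustration only.
registry: Dict[str, Callable[..., Any]] = {}

first = register_once(registry.__contains__, registry.__setitem__, "unwrap_json_str", str)
second = register_once(registry.__contains__, registry.__setitem__, "unwrap_json_str", repr)
# `first` is True (registered), `second` is False (skipped, original kept).
```

With a real session, the same guard could presumably be wired up as `register_once(spark.catalog.functionExists, spark.udf.register, name, func)`. That assumes `functionExists` (available from PySpark 3.3) also reports functions registered for the current session, which I have not verified for this case.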
Thank you for this amazing project :)
What version of ibis are you using?
10.6.0
What backend(s) are you using, if any?
PySpark
Relevant log output