Annoying messages when using pyspark backend #11485

@filipeo2-mck

What happened?

I'm using ibis-framework with the PySpark backend to apply a long sequence of data transformations in a pipeline that relies on writing files to disk and reading them back. I noticed two types of unexpected messages, apparently triggered by Ibis, that are not directly related to my dataframe operations:

  • Type 1: Deprecated type definition in UDFs (even when not using Ibis UDFs)
    ...python3.10/site-packages/pyspark/sql/pandas/functions.py:407: UserWarning: In Python 3.6+ and Spark 3.0+, it is preferred to specify type hints for pandas UDF instead of specifying pandas UDF type which will be deprecated in the future releases. See SPARK-28264 for more details.
  • Type 2: Repeated function registration. These messages bypass Python logging: they are written straight to the console, like a print(), and ignore my current logging configuration, so they probably come from Spark itself:
    25/07/23 19:40:45 WARN SimpleFunctionRegistry: The function unwrap_json_str replaced a previously registered function.
    25/07/23 19:40:45 WARN SimpleFunctionRegistry: The function unwrap_json_int replaced a previously registered function.
    25/07/23 19:40:45 WARN SimpleFunctionRegistry: The function unwrap_json_bool replaced a previously registered function.
    25/07/23 19:40:45 WARN SimpleFunctionRegistry: The function unwrap_json_float replaced a previously registered function.
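
As a stopgap, both kinds of message can probably be silenced from user code. The following is a minimal sketch, not a fix in Ibis itself. The helper names `silence_pandas_udf_type_warning` and `quiet_spark_logs` are hypothetical, and `session` is assumed to be the `pyspark.sql.SparkSession` that the Ibis connection uses. The first helper filters the Type 1 warning by its message text; the second raises Spark's log threshold through `SparkContext.setLogLevel`.

```python
import warnings


def silence_pandas_udf_type_warning() -> None:
    """Ignore the PySpark UserWarning about pandas UDF types (Type 1).

    The filter matches on the warning text quoted above, so other
    UserWarnings are still shown.
    """
    warnings.filterwarnings(
        "ignore",
        message=r".*preferred to specify type hints for pandas UDF",
        category=UserWarning,
    )


def quiet_spark_logs(session, level: str = "ERROR") -> None:
    """Raise the Spark log threshold so WARN lines (Type 2) are dropped.

    Assumption: `session` is a pyspark.sql.SparkSession. This hides every
    Spark WARN message, not only the SimpleFunctionRegistry ones.
    """
    session.sparkContext.setLogLevel(level)
```

Both helpers would be called once, before the first Ibis expression is executed. Note the trade-off in the second one: setting the level to `ERROR` is a blunt tool, since it also hides Spark warnings that may be worth seeing.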

It's a minor issue, more annoying than functionally harmful. The main problem is how much these messages clutter the console output of applications that rely heavily on Ibis transformations.
Examples:

  • In a notebook

    Notebook output Image
  • When using ibis to run a chained application (through N different nodes/steps), where many nodes apply ibis transformations and read/write files

    Application output Image

Two questions:

  • About the first type:
    • Is the use of the pandas UDF type by design?
    • Can it be fixed, given that it will be deprecated/removed?
  • About the second type:
    • Is it possible to register the unwrap functions only when they are not registered yet? Seeing this message once would be fine.
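
To illustrate the second question, here is a rough sketch of what "register only when not registered yet" could look like at the PySpark level. It is not how Ibis currently registers these functions. The helper name `register_udf_once` is hypothetical, and `session` is assumed to be a `pyspark.sql.SparkSession` on a Spark version that provides `Catalog.functionExists` (3.3 or later).

```python
def register_udf_once(session, name, func, return_type=None):
    """Register `func` under `name` unless a function with that name exists.

    Returns True if the function was registered, False if it was skipped.
    Assumption: `session` is a pyspark.sql.SparkSession.
    """
    # Skipping the call avoids the "replaced a previously registered
    # function" WARN line that Spark emits on re-registration.
    if session.catalog.functionExists(name):
        return False
    session.udf.register(name, func, return_type)
    return True
```

With a guard like this, the first call registers the function and later calls with the same name are skipped, so the registry warning should not appear. The cost is one extra catalog lookup per function, and a stale definition would not be replaced if the implementation changed within the same session.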

Thank you for this amazing project :)

What version of ibis are you using?

10.6.0

What backend(s) are you using, if any?

PySpark

Labels: bug (Incorrect behavior inside of ibis)

Status: backlog