[SPARK-51206][PYTHON][CONNECT] Move Arrow conversion helpers out of Spark Connect #49941

Open. Wants to merge 2 commits into base: master


@wengh wengh commented Feb 13, 2025

What changes were proposed in this pull request?

Refactor pyspark.sql.connect.conversion to move LocalDataToArrowConversion and ArrowTableToRowsConversion into pyspark.sql.conversion.

The reason is that pyspark.sql.connect.conversion checks for Spark Connect dependencies such as grpcio and pandas, but LocalDataToArrowConversion and ArrowTableToRowsConversion don't need these dependencies.

pyspark.sql.connect.conversion still re-exports the two classes for backward compatibility.
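As a rough illustration of the pattern described above (not the actual PySpark code), the sketch below uses in-memory stand-in modules. The names `demo_conversion`, `demo_connect_conversion`, `check_dependencies`, and `load_compat_module` are hypothetical; they only mimic a module that checks optional dependencies before re-exporting a class defined elsewhere.

```python
import importlib.util
import sys
import types


def check_dependencies(required):
    """Raise ImportError if any named top-level package cannot be found."""
    missing = [name for name in required if importlib.util.find_spec(name) is None]
    if missing:
        raise ImportError(f"missing optional dependencies: {', '.join(missing)}")


class LocalDataToArrowConversion:
    """Placeholder only; the real class lives in PySpark."""


# Hypothetical stand-in for the dependency-free module (the new location).
core = types.ModuleType("demo_conversion")
core.LocalDataToArrowConversion = LocalDataToArrowConversion
sys.modules[core.__name__] = core


def load_compat_module(required):
    """Build a stand-in for the old module: check dependencies, then re-export."""
    check_dependencies(required)  # the old location keeps its dependency check
    compat = types.ModuleType("demo_connect_conversion")
    # Re-export: the old import path resolves to the very same class object.
    from demo_conversion import LocalDataToArrowConversion as reexported
    compat.LocalDataToArrowConversion = reexported
    sys.modules[compat.__name__] = compat
    return compat
```

Under these assumptions, code that imports the class from the old stand-in module gets the same object as code that imports it from the new one, while only the old path is gated on the optional dependencies.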

Why are the changes needed?

Python Data Sources should work without Spark Connect dependencies, but they currently import LocalDataToArrowConversion and ArrowTableToRowsConversion from pyspark.sql.connect.conversion, which pulls in dependencies they do not need. This change moves the two classes to pyspark.sql.conversion so that Python Data Sources can run without Spark Connect dependencies.

Does this PR introduce any user-facing change?

Yes. Python Data Sources no longer require Spark Connect dependencies such as grpcio and pandas.

How was this patch tested?

Existing tests should catch any regressions introduced by the move.

Manually tested to ensure that Python Data Sources can run without grpcio and pandas.
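One possible way to reproduce such a check without uninstalling packages is to hide them from the import system. This is a minimal sketch, not the procedure used for this PR; `hide_modules` and `imports_without` are hypothetical helpers. It relies on the standard behavior that a `None` entry in `sys.modules` makes the corresponding import raise `ImportError`. Note that grpcio is imported under the name `grpc`.

```python
import contextlib
import importlib
import sys


@contextlib.contextmanager
def hide_modules(*names):
    """Temporarily make `import name` fail for each given top-level package."""
    prefixes = tuple(name + "." for name in names)
    saved = {}
    # Set aside the packages and their submodules if already imported.
    for key in list(sys.modules):
        if key in names or key.startswith(prefixes):
            saved[key] = sys.modules.pop(key)
    for name in names:
        sys.modules[name] = None  # a None entry makes the import raise ImportError
    try:
        yield
    finally:
        for name in names:
            sys.modules.pop(name, None)
        sys.modules.update(saved)  # restore whatever was loaded before


def imports_without(module_name, hidden):
    """Return True if module_name imports while the hidden packages are unavailable.

    Assumes module_name has not been imported yet in this process; a cached
    module would be returned without re-running its own imports.
    """
    with hide_modules(*hidden):
        try:
            importlib.import_module(module_name)
        except ImportError:
            return False
    return True
```

For example, in a fresh interpreter with PySpark installed, `imports_without("pyspark.sql.conversion", ("grpc", "pandas"))` would exercise the import path this PR is concerned with. Running the check in a clean virtual environment without those packages remains the more faithful test.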

Was this patch authored or co-authored using generative AI tooling?

No

@allisonwang-db

cc @HyukjinKwon

@wengh wengh requested a review from allisonwang-db February 14, 2025 23:04