[SPARK-51206][PYTHON][CONNECT] Move Arrow conversion helpers out of Spark Connect #49941
+564
−526
What changes were proposed in this pull request?
Refactor pyspark.sql.connect.conversion to move LocalDataToArrowConversion and ArrowTableToRowsConversion into pyspark.sql.conversion.
The reason is that pyspark.sql.connect.conversion checks for Spark Connect dependencies such as grpcio and pandas, but LocalDataToArrowConversion and ArrowTableToRowsConversion don't need these dependencies. pyspark.sql.connect.conversion still re-exports the two classes for backward compatibility.
Why are the changes needed?
Python Data Sources should work without Spark Connect dependencies, but they currently import LocalDataToArrowConversion and ArrowTableToRowsConversion from pyspark.sql.connect.conversion, which makes them require unnecessary dependencies. This change moves these two classes to pyspark.sql.conversion so that Python Data Sources run without Spark Connect dependencies.
Does this PR introduce any user-facing change?
Yes. The requirements for using Python Data Sources are relaxed: Spark Connect dependencies such as grpcio and pandas are no longer needed.
How was this patch tested?
Existing tests should ensure that the refactoring doesn't break anything.
Manually tested that Python Data Sources can run without grpcio and pandas installed.
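The PR does not say how the manual check was performed. As a rough sketch of one way such a check could be scripted, the hypothetical helper below (imports_without is not part of PySpark) starts a fresh interpreter, marks the given modules as unavailable by setting their sys.modules entries to None, and reports whether the target module still imports. Note that the grpcio package is imported under the name grpc.

```python
import subprocess
import sys


def imports_without(module: str, blocked: list[str]) -> bool:
    """Return True if `module` imports in a fresh interpreter
    where every name in `blocked` is made unimportable.

    Hypothetical helper for illustration; not part of PySpark.
    """
    script = (
        "import importlib, sys\n"
        # A None entry in sys.modules makes `import name` raise ImportError.
        f"for name in {blocked!r}:\n"
        "    sys.modules[name] = None\n"
        f"importlib.import_module({module!r})\n"
    )
    result = subprocess.run([sys.executable, "-c", script], capture_output=True)
    return result.returncode == 0


# Assumed usage, requires a PySpark build that contains this change:
# imports_without("pyspark.sql.conversion", ["grpc", "pandas"])
```

Running the check in a subprocess avoids false positives from modules that the current interpreter has already imported and cached.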
Was this patch authored or co-authored using generative AI tooling?
No