[SPARK-53882][CONNECT][DOCS] Add documentation comparing behavioral differences between Spark Connect and Spark Classic #52585
What changes were proposed in this pull request?
Spark Connect is a decoupled client-server architecture that allows remote connectivity to Spark clusters using the DataFrame API and unresolved logical plans as the protocol, which is well documented in https://spark.apache.org/docs/latest/spark-connect-overview.html.
However, there is a lack of guidance to help users understand the behavioral differences between Spark Classic and Spark Connect and to avoid unexpected behavior.
This PR adds a document detailing the behavioral differences between Spark Connect and Spark Classic, in particular lazy schema analysis and name resolution, and their implications.
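A minimal sketch of the kind of difference the new document covers, written in plain Python so it needs no Spark installation. The classes `EagerFrame` and `LazyFrame` are hypothetical stand-ins, not Spark APIs: the eager variant models Spark Classic, where an invalid column name fails as soon as the plan is built, while the lazy variant models Spark Connect, where the plan is only analyzed on the server, so the same error surfaces later, at execution (or schema-access) time.

```python
class AnalysisError(Exception):
    """Stand-in for Spark's AnalysisException."""


class EagerFrame:
    """Spark Classic style: column names resolve when the plan is built."""

    def __init__(self, schema):
        self.schema = schema

    def select(self, col):
        if col not in self.schema:  # fails immediately, client-side
            raise AnalysisError(f"column not found: {col}")
        return EagerFrame([col])


class LazyFrame:
    """Spark Connect style: the client only records the plan; validation
    is deferred until the plan is sent for execution."""

    def __init__(self, schema, plan=()):
        self._schema = schema
        self._plan = plan

    def select(self, col):
        # No validation here: just extend the unresolved plan.
        return LazyFrame(self._schema, self._plan + (col,))

    def collect(self):
        # Analysis happens only now, mimicking the server-side round trip.
        for col in self._plan:
            if col not in self._schema:
                raise AnalysisError(f"column not found: {col}")
        return list(self._plan)


schema = ["id", "name"]

try:
    EagerFrame(schema).select("age")  # Classic: raises at plan-building time
except AnalysisError as e:
    print("classic:", e)

df = LazyFrame(schema).select("age")  # Connect: no error yet
try:
    df.collect()  # error surfaces only when the plan is executed
except AnalysisError as e:
    print("connect:", e)
```

The practical consequence for users migrating to Spark Connect is that code catching analysis errors around DataFrame transformations may need to move that handling to the point where an action or schema access occurs.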
Why are the changes needed?
This doc helps users migrating from Spark Classic to Spark Connect understand the behavioral differences.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
N/A.
Was this patch authored or co-authored using generative AI tooling?
No.