@yhuang-db yhuang-db commented Oct 6, 2025

What changes were proposed in this pull request?

This PR adds a doCanonicalize method to DataSourceV2ScanRelation. The implementation mirrors the one in BatchScanExec, since the two are the leaf nodes of the DSv2 logical plan and physical plan, respectively.

Why are the changes needed?

Query optimization rules such as MergeScalarSubqueries check whether two plans are identical by comparing their canonicalized forms. For a DSv2 physical plan, canonicalization descends through the child hierarchy to BatchScanExec, which has a doCanonicalize function. For a DSv2 logical plan, canonicalization descends to DataSourceV2ScanRelation, which does not have one. As a result, two logical plans that are semantically identical are not recognized as identical.
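To illustrate the problem described above, here is a minimal toy model in Python. It is not Spark code: the Attr and ScanRelation classes, the scan helper, and the ID counter are hypothetical stand-ins, and the normalization shown (replacing each attribute's unique ID with its position in the output) is only an assumption about the general shape of what a canonicalization step does.

```python
from dataclasses import dataclass, replace
from itertools import count
from typing import Tuple

# Hypothetical stand-in for a global generator of unique expression IDs.
_next_id = count()


@dataclass(frozen=True)
class Attr:
    """Toy output attribute: a column name plus a unique ID."""
    name: str
    expr_id: int


@dataclass(frozen=True)
class ScanRelation:
    """Toy leaf node of a logical plan (not Spark's DataSourceV2ScanRelation)."""
    table: str
    output: Tuple[Attr, ...]

    def canonicalized(self) -> "ScanRelation":
        # Replace each attribute's unique ID with its ordinal position,
        # so that two scans of the same table and columns compare equal.
        normalized = tuple(Attr(a.name, i) for i, a in enumerate(self.output))
        return replace(self, output=normalized)


def scan(table: str, *cols: str) -> ScanRelation:
    """Build a scan node, giving every output attribute a fresh ID."""
    return ScanRelation(table, tuple(Attr(c, next(_next_id)) for c in cols))


a = scan("t", "id", "value")
b = scan("t", "id", "value")

print(a == b)                                   # False: attribute IDs differ
print(a.canonicalized() == b.canonicalized())   # True: IDs are normalized
```

Without the canonicalized method, the two scans can only be compared with their fresh IDs intact, so a rule that relies on equality of canonical forms would treat them as different plans.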

Does this PR introduce any user-facing change?

No

How was this patch tested?

todo

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the SQL label Oct 6, 2025
@peter-toth

@yhuang-db, can you please check the test failures? Some of them seem related to your change.

@yhuang-db yhuang-db changed the title from "[SPARK-53809][SQL]Add canonicalization for DSv2 scan" to "[SPARK-53809][SQL]Add canonicalization for DataSourceV2ScanRelation" Oct 10, 2025