Skip to content

[SPARK-52649][SQL] Trim aliases before matching Sort/Having/Filter expressions in buildAggExprList #51339

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

mihailotim-db
Copy link
Contributor

@mihailotim-db mihailotim-db commented Jul 1, 2025

What changes were proposed in this pull request?

Trim aliases before matching Sort/Having/Filter expressions with semantically equal expression from the Aggregate below in buildAggExprList

Why are the changes needed?

For a query like:

SELECT course, year, GROUPING(course) FROM courseSales GROUP BY CUBE(course, year) ORDER BY GROUPING(course)

Plan after ResolveReferences and before ResolveAggregateFunctions looks like:

!Sort [cast((shiftright(tempresolvedcolumn(spark_grouping_id#18L, spark_grouping_id, false), 1) & 1) as tinyint) AS grouping(course)#22 ASC NULLS FIRST], true                                                                                                                                                         
 +- Aggregate [course#19, year#20, spark_grouping_id#18L], [course#19, year#20, cast((shiftright(spark_grouping_id#18L, 1) & 1) as tinyint) AS grouping(course)#21 AS grouping(course)#15]                                                                                                                             
....

Because aggregate list has Alias(Alias(cast((shiftright(spark_grouping_id#18L, 1) & 1) as tinyint)) expression from SortOrder won't get matched as semantically equal and it will result in adding an unnecessary Project. By stripping inner aliases from aggregate list (that are going to get removed anyways in CleanupAliases) we can match SortOrder expression and resolve it as grouping(course)#15

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing tests

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the SQL label Jul 1, 2025
@mihailotim-db mihailotim-db force-pushed the mihailotim-db/fix_inner_aliases_semi_structured branch 2 times, most recently from bcd4f44 to f5504ef Compare July 2, 2025 00:43
@mihailotim-db mihailotim-db changed the title fix [SPARK-52649][SQL] Trim aliases before matching Sort/Having/Filter expressions in buildAggExprList Jul 2, 2025
@mihailotim-db mihailotim-db force-pushed the mihailotim-db/fix_inner_aliases_semi_structured branch 4 times, most recently from 3b2f3b5 to b791c47 Compare July 2, 2025 05:34
@mihailotim-db mihailotim-db force-pushed the mihailotim-db/fix_inner_aliases_semi_structured branch from b791c47 to 5ac3f1d Compare July 2, 2025 06:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants