[SPARK-52007] [SQL] Expression IDs shouldn't be present in grouping expressions when using grouping sets #50791

mihailoale-db · 2025-05-05T16:13:53Z

What changes were proposed in this pull request?

In this PR, I propose using toPrettySQL instead of .toString when constructing grouping expressions in the ResolveGroupingAnalytics rule, so that the generated grouping expression names no longer contain expression IDs.
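Here is a toy Python model that makes the difference concrete. It is not Spark code. The classes Session, Attr and Add and the functions to_string, to_pretty_sql and grouping_name are hypothetical stand-ins for Catalyst expressions, Expression.toString and toPrettySQL, and the strings only approximate the shape of the real ones. Each Session hands out fresh IDs, as a new cluster start would.

```python
import itertools


class Session:
    """Hypothetical stand-in for a cluster start: hands out fresh expression IDs."""

    def __init__(self, start_id):
        self._ids = itertools.count(start_id)

    def attr(self, name):
        return Attr(name, next(self._ids))


class Attr:
    """Toy attribute reference carrying a name and a session-specific ID."""

    def __init__(self, name, expr_id):
        self.name, self.expr_id = name, expr_id

    def to_string(self):
        # Shape similar to Catalyst's toString for attributes: name#id
        return f"{self.name}#{self.expr_id}"

    def to_pretty_sql(self):
        # Pretty SQL form: the bare name, with no expression ID
        return self.name


class Add:
    """Toy binary addition over two attributes."""

    def __init__(self, left, right):
        self.left, self.right = left, right

    def to_string(self):
        return f"({self.left.to_string()} + {self.right.to_string()})"

    def to_pretty_sql(self):
        return f"({self.left.to_pretty_sql()} + {self.right.to_pretty_sql()})"


def grouping_name(expr, pretty):
    # Old behavior: pretty=False (ID-bearing); new behavior: pretty=True
    return expr.to_pretty_sql() if pretty else expr.to_string()


s1, s2 = Session(10), Session(42)
e1 = Add(s1.attr("col1"), s1.attr("col2"))
e2 = Add(s2.attr("col1"), s2.attr("col2"))
print(grouping_name(e1, pretty=False))  # (col1#10 + col2#11)
print(grouping_name(e2, pretty=False))  # (col1#42 + col2#43)
print(grouping_name(e1, pretty=True))   # (col1 + col2)
```

The ID-bearing name changes from one session to the next. The pretty SQL name stays the same, and that stability is the property this change relies on.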

Why are the changes needed?

Currently, the following query succeeds (#x and #y stand for expression IDs, which are regenerated on every cluster start):

select * from values(1,2) group by grouping sets (col1,col2,col1+col2) order by `(col1#x + col2#y)`

After the next cluster restart, however, the expression IDs are regenerated and the same query fails. This behavior is nondeterministic, so we need to disallow it.
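The failure mode above can be sketched with another toy Python model. It assumes, hypothetically, that the ORDER BY column is resolved by exact name match against the grouping output names. grouping_output_names and resolve_order_by are illustrative helpers, not Spark APIs, and the generated names only approximate the real ones.

```python
import itertools


def grouping_output_names(first_id):
    # Hypothetical: output names built the old way, with expression IDs
    # that start at a session-specific value
    ids = itertools.count(first_id)
    col1, col2 = f"col1#{next(ids)}", f"col2#{next(ids)}"
    return ["col1", "col2", f"({col1} + {col2})"]


def resolve_order_by(name, output_names):
    # Stand-in for the analyzer: succeed only on an exact name match
    if name not in output_names:
        raise LookupError(f"cannot resolve column {name!r}")
    return name


# Name copied from a plan produced in the first session
query_ref = "(col1#10 + col2#11)"

print(resolve_order_by(query_ref, grouping_output_names(10)))  # resolves
try:
    # After a "restart", the IDs start elsewhere and the name no longer matches
    resolve_order_by(query_ref, grouping_output_names(42))
except LookupError as err:
    print(err)
```

The query text is unchanged between the two calls; only the session's IDs differ. That is why the outcome depends on the cluster start rather than on the query.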

Does this PR introduce any user-facing change?

Yes. Some queries (and DataFrame programs) that currently succeed will now fail, but they would already fail after any cluster restart (as explained above).

How was this patch tested?

Added tests.

Was this patch authored or co-authored using generative AI tooling?

No.

mihailoale-db · 2025-05-05T22:03:28Z

@cloud-fan could you PTAL when you have time? (The Docker test failure doesn't seem related.) Thanks

cloud-fan · 2025-05-06T12:07:04Z

yea the docker test is unrelated, thanks, merging to master!

github-actions bot added the SQL label May 5, 2025

initial commit (5a26cdb)

mihailoale-db force-pushed the fixgroupingsetsschema branch from 5181779 to 5a26cdb Compare May 5, 2025 17:41

cloud-fan approved these changes May 6, 2025


cloud-fan closed this in 0eab1c0 May 6, 2025
