[SPARK-51046][SQL][TEST] Reduce numCols in withFilter() to prevent SubExprEliminationBenchmark from failing due to a Codegen error #49938
base: master
Benchmark jdk17: https://github.com/wayneguow/spark/actions/runs/13310975547
subExprElimination true, codegen: false     2053   2079    33   0.0   20526629.8   3.5X
subExprElimination false, codegen: true     2474   2512    44   0.0   24744107.3   1.0X
subExprElimination false, codegen: false    2231   2246    20   0.0   22306061.2   1.1X
subExprElimination true, codegen: true      2408   2509   100   0.0   24084091.2   1.0X
What confuses me is that when subExprElimination is set to true and codegen is set to true, there is an obvious performance regression compared to before?
Yes, it's true. The performance regression was the actual reason we didn't make a decision. Here, FYI.
cc @dongjoon-hyun, could you take a look when you have some time? Maybe we can have some discussion.
cc @panbingkun, @cloud-fan, @LuciferYang It seems that we need to make a decision. Are we good with this codegen perf regression of
BTW, thank you for your active contributions, @wayneguow!
Happy to contribute! 😀 Keep going!
@panbingkun Can #49573 be completed before the 4.0 release? If so, we can wait until the optimization is finished before refreshing this result. Additionally, if #49573 is completed, will we still need to change
What changes were proposed in this pull request?
This PR aims to reduce numCols in withFilter() to prevent SubExprEliminationBenchmark from failing due to a Codegen error.
I did some debug investigation and found that in the current master branch, the codegen code has a huge processNext method, but in branch-3.5, it is split into many small methods. Because the logic of codegen is different, the previous benchmark cannot run normally.
master branch:
3.5 branch:
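The contrast described above, one huge processNext method versus many small methods, can be illustrated with a toy generator. This is only a sketch under stated assumptions and is not Spark's codegen or its actual output: the function names, the shape of the per-column predicate, and the sizes used (500 columns, chunks of 50) are all hypothetical and chosen for illustration.

```python
from typing import List


def predicate(idx: int) -> str:
    # Hypothetical per-column check; the real generated expressions differ.
    return f"(row.getLong({idx}) >= 100000L)"


def generate_single_method(num_cols: int) -> List[str]:
    """Emit one method whose body ORs every column predicate inline."""
    body = " ||\n      ".join(predicate(i) for i in range(num_cols))
    return [f"boolean processNext(Row row) {{\n    return {body};\n}}"]


def generate_split_methods(num_cols: int, chunk_size: int) -> List[str]:
    """Emit helper methods of at most chunk_size predicates plus a small dispatcher."""
    helpers = []
    calls = []
    for n, start in enumerate(range(0, num_cols, chunk_size)):
        chunk = range(start, min(start + chunk_size, num_cols))
        body = " ||\n      ".join(predicate(i) for i in chunk)
        helpers.append(f"boolean filter_{n}(Row row) {{\n    return {body};\n}}")
        calls.append(f"filter_{n}(row)")
    dispatcher = (
        "boolean processNext(Row row) {\n    return "
        + " ||\n      ".join(calls)
        + ";\n}"
    )
    return helpers + [dispatcher]


def largest_method_chars(methods: List[str]) -> int:
    """Length in characters of the largest emitted method."""
    return max(len(m) for m in methods)


if __name__ == "__main__":
    single = generate_single_method(500)
    split = generate_split_methods(500, 50)
    print("methods (single):", len(single), "largest:", largest_method_chars(single))
    print("methods (split): ", len(split), "largest:", largest_method_chars(split))
```

In this toy, the single-method variant grows linearly with the number of columns, while the split variant keeps every method bounded by the chunk size. Lowering the column count, as this PR does with numCols, shrinks the single-method variant in the same way.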
Why are the changes needed?
If we run SubExprEliminationBenchmark:
It fails at CodeGenerator, details:
The root cause is:
Does this PR introduce any user-facing change?
No, this only fixes a benchmark test failure.
How was this patch tested?
Ran the benchmark tests manually.
Was this patch authored or co-authored using generative AI tooling?
No.