-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Improve push down limit (logical optimizer rule) #15744
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
The failing tests are related to topk (in the user_defined_plan.rs). Because the PR removes the limit during the first round, so I have a question, what's the difference between the |
d8bdbec
to
a8b64b8
Compare
IIUC, the topk in https://github.com/apache/datafusion/blob/main/datafusion/core/tests/user_defined/user_defined_plan.rs is only used for test. |
Yes, now DataFusion don't have a |
@@ -137,6 +142,9 @@ impl OptimizerRule for PushDownLimit { | |||
} | |||
} else { | |||
sort.fetch = new_fetch; | |||
if skip == 0 && original_sort_fetch.is_none() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be great to add a comment to explain why && original_sort_fetch.is_none()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your comment has triggered my deeper thinking, now I think we don't need the condition check
a8b64b8
to
80722ff
Compare
80722ff
to
d831173
Compare
@@ -102,362 +96,10 @@ use datafusion_physical_plan::execution_plan::{Boundedness, EmissionType}; | |||
use async_trait::async_trait; | |||
use futures::{Stream, StreamExt}; | |||
|
|||
/// Execute the specified sql and return the resulting record batches |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that we'll move the code about "how to write the user defined plan" to doc, so I moved the useless tests in the PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the issue: #15774
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we please remove the tests in some other PR so it is clear what behavior the code is changing, if any? I found it hard to find the actual code / behavior change in this PR with several different behaviors in there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea
d831173
to
30c78ff
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @xudong963
// ------ The implementation of the TopK code follows ----- | ||
|
||
#[derive(Debug)] | ||
#[derive(Debug, Default)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are removing all the tests that refer to this structure, I think we should remove the rest of the code too rather than making it as "allow unused"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sense, let's wait for the PR : #15832,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sense, let's wait for the PR : #15832,
let we finish it up as soon as possible , I think I was missing somethings and cant able to understand to them properly , it would be great help if you collaborate upon it @xudong963 . you can add your suggestions upon it adding to the PR
@@ -102,362 +96,10 @@ use datafusion_physical_plan::execution_plan::{Boundedness, EmissionType}; | |||
use async_trait::async_trait; | |||
use futures::{Stream, StreamExt}; | |||
|
|||
/// Execute the specified sql and return the resulting record batches |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we please remove the tests in some other PR so it is clear what behavior the code is changing, if any? I found it hard to find the actual code / behavior change in this PR with several different behaviors in there
Which issue does this PR close?
Rationale for this change
If skip is zero, we can directly remove the limit, the current behavior is to remove the limit at the second round optimization.
What changes are included in this PR?
Are these changes tested?
Yes
Are there any user-facing changes?