Improve push down limit (logical optimizer rule) #15744

Open: wants to merge 4 commits into base: main

Conversation

xudong963
Member

Which issue does this PR close?

  • Closes #.

Rationale for this change

If skip is zero, we can remove the limit directly. The current behavior is to remove it only in the second round of optimization.
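
A minimal sketch of the idea, using a toy plan type rather than DataFusion's LogicalPlan. The `Plan` enum and this `push_down_limit` function are hypothetical stand-ins, and the way the fetch values are combined (the sort keeps the smaller of its own fetch and skip + fetch) is an assumption made for illustration, not a copy of the rule's code:

```rust
// Toy logical plan; hypothetical, NOT DataFusion's `LogicalPlan`.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Scan,
    Sort { fetch: Option<usize>, input: Box<Plan> },
    Limit { skip: usize, fetch: Option<usize>, input: Box<Plan> },
}

/// One pass of a simplified limit push-down over `Limit -> Sort`.
fn push_down_limit(plan: Plan) -> Plan {
    match plan {
        Plan::Limit { skip, fetch: Some(fetch), input } => match *input {
            Plan::Sort { fetch: sort_fetch, input: sort_input } => {
                // The sort must keep enough rows to cover the skipped rows
                // plus the returned rows (assumed combination rule).
                let needed = skip + fetch;
                let new_fetch = Some(sort_fetch.map_or(needed, |f| f.min(needed)));
                let sort = Plan::Sort { fetch: new_fetch, input: sort_input };
                if skip == 0 {
                    // The sort now emits at most `fetch` rows, so the limit
                    // above it is redundant and can be dropped in this pass.
                    sort
                } else {
                    // With a non-zero skip the limit still has to drop rows.
                    Plan::Limit { skip, fetch: Some(fetch), input: Box::new(sort) }
                }
            }
            // Anything other than a sort below the limit is left untouched.
            other => Plan::Limit { skip, fetch: Some(fetch), input: Box::new(other) },
        },
        other => other,
    }
}
```

In this sketch, a limit with skip 0 and fetch 10 over an unbounded sort becomes a single sort with fetch 10 after one pass, while a limit with a non-zero skip stays in place above the sort.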

What changes are included in this PR?

Are these changes tested?

Yes

Are there any user-facing changes?

github-actions bot added the optimizer (Optimizer rules) and core (Core DataFusion crate) labels on Apr 17, 2025
@xudong963
Member Author

xudong963 commented Apr 17, 2025

    user_defined::user_defined_plan::topk_invariants
    user_defined::user_defined_plan::topk_invariants_after_invalid_mutation
    user_defined::user_defined_plan::topk_plan

The failing tests are related to topk (in user_defined_plan.rs).

Because the PR removes the limit during the first round, TopKOptimizerRule never gets a chance to replace limit + sort with Topk.
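
The failure mode can be sketched with toy types. Everything below is hypothetical: `Plan`, `push_down_limit` and `topk_rule` are simplified stand-ins for the real plan nodes and rules, and the sketch assumes the push-down rule runs before the user-defined rule:

```rust
// Toy plan nodes; hypothetical stand-ins, not DataFusion types.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Scan,
    Sort { fetch: Option<usize>, input: Box<Plan> },
    Limit { skip: usize, fetch: usize, input: Box<Plan> },
    TopK { k: usize, input: Box<Plan> },
}

/// Stand-in for the push-down rule: folds a zero-skip limit into the sort.
fn push_down_limit(plan: Plan) -> Plan {
    match plan {
        Plan::Limit { skip: 0, fetch, input } => match *input {
            Plan::Sort { input, .. } => Plan::Sort { fetch: Some(fetch), input },
            other => Plan::Limit { skip: 0, fetch, input: Box::new(other) },
        },
        other => other,
    }
}

/// Stand-in for a user-defined rule that only recognizes `Limit -> Sort`.
fn topk_rule(plan: Plan) -> Plan {
    match plan {
        Plan::Limit { skip: 0, fetch, input } => match *input {
            Plan::Sort { input, .. } => Plan::TopK { k: fetch, input },
            other => Plan::Limit { skip: 0, fetch, input: Box::new(other) },
        },
        // A bare `Sort { fetch: Some(_) }` does not match the pattern,
        // so the plan passes through unchanged.
        other => other,
    }
}
```

Applied directly to a `Limit -> Sort` plan, `topk_rule` produces a `TopK` node. Applied after `push_down_limit` has already removed the limit, it sees only a sort with a fetch and leaves it alone, which mirrors why the topk tests no longer observe the rewrite.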

I have a question: what's the difference between Sort(Topk) and Topk?

xudong963 force-pushed the improve_push_down_limit branch from d8bdbec to a8b64b8 on April 17, 2025 06:24
xudong963 changed the title from "Improve push down limit" to "Improve push down limit (logical optimizer rule)" on Apr 17, 2025
@xudong963
Member Author

Topk

IIUC, the topk in https://github.com/apache/datafusion/blob/main/datafusion/core/tests/user_defined/user_defined_plan.rs is only used for testing.

@2010YOUY01
Contributor

> Topk
>
> IIUC, the topk in https://github.com/apache/datafusion/blob/main/datafusion/core/tests/user_defined/user_defined_plan.rs is only used for testing.

Yes. DataFusion currently doesn't have a TopK execution plan; instead it uses an inner struct inside SortExec for top-k queries, and I think it's represented as Sort(topk) in EXPLAIN output.

@@ -137,6 +142,9 @@ impl OptimizerRule for PushDownLimit {
    }
} else {
    sort.fetch = new_fetch;
    if skip == 0 && original_sort_fetch.is_none() {
Contributor

It would be great to add a comment explaining why the && original_sort_fetch.is_none() check is needed.

Member Author

Your comment made me think about this more deeply; I now think we don't need the condition check.

xudong963 force-pushed the improve_push_down_limit branch from 80722ff to d831173 on April 30, 2025 14:07
@@ -102,362 +96,10 @@ use datafusion_physical_plan::execution_plan::{Boundedness, EmissionType};
use async_trait::async_trait;
use futures::{Stream, StreamExt};

/// Execute the specified sql and return the resulting record batches
Member Author

Given that we'll move the code about "how to write a user-defined plan" to the docs, I removed the now-unneeded tests in this PR.

Member Author

See the issue: #15774

Contributor

Can we please remove the tests in some other PR so it is clear what behavior the code is changing, if any? I found it hard to find the actual code / behavior change in this PR with several different behaviors in there

Member Author

Good idea

xudong963 force-pushed the improve_push_down_limit branch from d831173 to 30c78ff on April 30, 2025 14:30
Contributor

@alamb alamb left a comment

Thanks @xudong963

// ------ The implementation of the TopK code follows -----

-#[derive(Debug)]
+#[derive(Debug, Default)]
Contributor

If we are removing all the tests that refer to this structure, I think we should remove the rest of the code too rather than marking it as "allow unused".

Member Author

Makes sense, let's wait for PR #15832.

Contributor

> Makes sense, let's wait for PR #15832.

Let's finish it up as soon as possible. I think I was missing some things and wasn't able to understand them properly. It would be a great help if you could collaborate on it, @xudong963. You can add your suggestions to the PR.

Labels: core (Core DataFusion crate), optimizer (Optimizer rules), sqllogictest (SQL Logic Tests (.slt))

4 participants