-
-
Notifications
You must be signed in to change notification settings - Fork 240
Allow pushing filters past Project nodes #3400
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
146b50e to
0ca2ef9
Compare
zachmu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like you've covered most of your bases here. I can't think of any additional edge cases other than the ones you've ruled out.
Some more tests are always a good idea though. I'm curious specifically about the behavior of pushing a filter composed of expressions on multiple tables being broken up and correctly pushed to their respective tables under a join or a subquery. I don't know if that transformation is even possible at the moment but tests verifying the existing behavior would be good. Also possible those are in the existing corpus, but good to verify.
| // newFilterSet returns a new filter set that will track available filters with the filters and aliases given. Aliases | ||
| // are necessary to normalize expressions from indexes when in the presence of aliases. | ||
| func newFilterSet(filter sql.Expression, filtersByTable filtersByTable, tableAliases TableAliases) *filterSet { | ||
| func newFilterSet(filter sql.Expression, filtersByTable filtersByTable, tableAliases TableAliases, projectionExpressions map[sql.ColumnId]sql.Expression) *filterSet { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably time to put params on their own lines
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be wise to add a couple small unit tests of this new beahvior
|
|
||
| // GetProjectionExpressions walks a tree to identify all projections and returns a map from the column ids of those | ||
| // projections to the underlying expression. | ||
| func GetProjectionExpressions(n sql.Node) map[sql.ColumnId]sql.Expression { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Necessary to export this?
| // have been pushed down and need to be removed from the parent filter | ||
| filtersByTable := getFiltersByTable(n, scope) | ||
| filters := newFilterSet(n.Expression, filtersByTable, tableAliases) | ||
| projectionExpressions := GetProjectionExpressions(n) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These three funcs are all used in just these two places in this file, which suggests there's a missing method here to get the ultimate result (the filter set).
Maybe these three lines are newFilterSet and the existing newFilterSet is inlined?
The analyzer tries to improve query plans by moving filter expressions to be directly above the table that they reference. Previously, this had a limitation in that it would treat references to aliases in Project nodes as opaque: if the alias expression references a table, the analysis wouldn't consider the filter to reference that table.
As a result, it wasn't possible to push a filter into multiple subqueries unless a previous optimization eliminated the Project node.
This PR enhances the analysis with the following steps:
Reasoning about the safety is tricky here. We should replace a GetField with the underlying expression if and only if we're actually moving the Filter beneath the Project.
The main concerns would be:
In practice I don't think the first concern can happen because it would require that the filter is getting pushed to some nameable-but-not-opaque node between the Projects, which I don't think exists.
The second concern requires that the project aliases an expression that doesn't reference any tables but can't be replaced with its underlying expression in a filter, and I don't think that's possible either. There may be some construct with a nondeterministic function that causes this?