Skip to content

Conversation

@nicktobey
Copy link
Contributor

@nicktobey nicktobey commented Jan 28, 2026

The analyzer tries to improve query plans by moving filter expressions to be directly above the table that they reference. Previously, this had a limitation in that it would treat references to aliases in Project nodes as opaque: if the alias expression references a table, the analysis wouldn't consider the filter to reference that table.

As a result, it wasn't possible to push a filter into multiple subqueries unless a previous optimization eliminated the Project node.

This PR enhances the analysis with the following steps:

  • Every alias on a Project node has a unique column id, so we walk the plan tree to build a map from Project column ids to the underlying expressions.
  • When computing which filter expressions can be pushed to which tables, we normalize the filter expressions by replacing GetFields on a Project with the underlying expression.
  • When pushing a Filter above a table or into a subquery, we also replace GetFields on a Project with the underlying expression.

Reasoning about the safety is tricky here. We should replace a GetField with the underlying expression if and only if we're actually moving the Filter beneath the Project.

The main concerns would be:

  • If a join plan has two Project nodes without an opaque node (like a SubqueryAlias) between them, then a Filter might only get pushed beneath one Project, but references to both Projects in the filter expression could get unwrapped.
  • If a join plan has a Project node in one child, and a Table in the other, then a filter expression could get pushed to be above the Table even if it references the Project

In practice I don't think the first concern can happen because it would require that the filter is getting pushed to some nameable-but-not-opaque node between the Projects, which I don't think exists.

The second concern requires that the project aliases an expression that doesn't reference any tables but can't be replaced with its underlying expression in a filter, and I don't think that's possible either. There may be some construct with a nondeterministic function that causes this?

@nicktobey nicktobey force-pushed the nicktobey/index_scan branch from 146b50e to 0ca2ef9 Compare January 28, 2026 08:59
Copy link
Member

@zachmu zachmu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like you've covered most of your bases here. I can't think of any additional edge cases other than the ones you've ruled out.

Some more tests are always a good idea though. I'm curious specifically about the behavior of pushing a filter composed of expressions on multiple tables being broken up and correctly pushed to their respective tables under a join or a subquery. I don't know if that transformation is even possible at the moment but tests verifying the existing behavior would be good. Also possible those are in the existing corpus, but good to verify.

// newFilterSet returns a new filter set that will track available filters with the filters and aliases given. Aliases
// are necessary to normalize expressions from indexes when in the presence of aliases.
func newFilterSet(filter sql.Expression, filtersByTable filtersByTable, tableAliases TableAliases) *filterSet {
func newFilterSet(filter sql.Expression, filtersByTable filtersByTable, tableAliases TableAliases, projectionExpressions map[sql.ColumnId]sql.Expression) *filterSet {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably time to put params on their own lines

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be wise to add a couple small unit tests of this new beahvior


// GetProjectionExpressions walks a tree to identify all projections and returns a map from the column ids of those
// projections to the underlying expression.
func GetProjectionExpressions(n sql.Node) map[sql.ColumnId]sql.Expression {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Necessary to export this?

// have been pushed down and need to be removed from the parent filter
filtersByTable := getFiltersByTable(n, scope)
filters := newFilterSet(n.Expression, filtersByTable, tableAliases)
projectionExpressions := GetProjectionExpressions(n)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These three funcs are all used in just these two places in this file, which suggests there's a missing method here to get the ultimate result (the filter set).

Maybe these three lines are newFilterSet and the existing newFilterSet is inlined?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants