Description
This issue covers two related filter push-down improvements.
Pass previously pushed filters to supports_filters_pushdown
Currently, the optimization does not pass filters that were pushed in a previous run (TableScan::filters) to TableProvider::supports_filters_pushdown(...).
If the optimizer runs multiple times, it may try to push filters into the table provider multiple times. In our DataFusion-based project, supports_filters_pushdown(...) has context-dependent behavior: the provider supports any single filter like column = value, but not multiple such filters at the same time.
Consider the following optimizer pipeline pattern:
- Try to push a = 1, b = 1. supports_filters_pushdown returns [Exact, Inexact].
  OK: the optimizer records that a = 1 is pushed and creates a filter node for b = 1.
- ...
- Another optimization iteration. Try to push b = 1. supports_filters_pushdown returns [Exact]. Of course, the table provider can't remember all previously pushed filters, so it has no choice but to answer Exact.
Now, the optimizer thinks the conjunction a = 1 AND b = 1 is supported exactly, but it is not.
To prevent this problem, I suggest passing filters that were already pushed into the scan earlier to supports_filters_pushdown(...).
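The scenario above can be reproduced with a minimal, self-contained sketch. All types here are simplified stand-ins invented for illustration: `PushDown` loosely mirrors DataFusion's `TableProviderFilterPushDown`, while `EqFilter`, `SingleFilterProvider`, and the method `supports_with_pushed` are hypothetical and are not part of the DataFusion API. The sketch assumes a provider that can evaluate exactly one equality filter.

```rust
// Simplified stand-ins for illustration only; not the real DataFusion API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PushDown {
    Exact,
    Inexact,
}

/// A filter of the form `column = value`.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq, Eq)]
struct EqFilter {
    column: &'static str,
    value: i64,
}

/// Hypothetical provider that can apply exactly one equality filter.
struct SingleFilterProvider;

impl SingleFilterProvider {
    /// Models the current behavior: only the new candidates are visible,
    /// so the provider cannot know that a filter was pushed earlier.
    fn supports_filters_pushdown(&self, candidates: &[EqFilter]) -> Vec<PushDown> {
        self.supports_with_pushed(&[], candidates)
    }

    /// Models the proposal: filters already pushed into the scan are
    /// passed alongside the new candidates.
    fn supports_with_pushed(
        &self,
        pushed: &[EqFilter],
        candidates: &[EqFilter],
    ) -> Vec<PushDown> {
        // One exact "slot" in total; already pushed filters consume it.
        let mut exact_slots_left = 1usize.saturating_sub(pushed.len());
        candidates
            .iter()
            .map(|_| {
                if exact_slots_left > 0 {
                    exact_slots_left -= 1;
                    PushDown::Exact
                } else {
                    PushDown::Inexact
                }
            })
            .collect()
    }
}
```

With this model, asking about `b = 1` alone yields `[Exact]`, which is the incorrect outcome described above, whereas asking about `b = 1` with `a = 1` listed as already pushed yields `[Inexact]`.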
Do not assume that the filter support decision is stable
Consider the following scenario:
- supports_filters_pushdown returns Exact on some filter, e.g. "a = 1", where column "a" is not required by the query projection.
- "a" is removed from the table provider projection by the "optimize projection" rule.
- supports_filters_pushdown changes its decision and returns Inexact on this filter the next time. For example, the input filters have changed and it prefers to use a new one.
- "a" is not returned to the table provider projection, which leads to a filter that references a column that is not part of the schema.
I suggest extending the logic with the following actions:
- Collect columns that are not used in the current table provider projection but are required by filter expressions. Call this set additional_projection.
- If additional_projection is empty, leave everything as is.
- Otherwise, extend the table provider projection and wrap the plan with an additional projection node to preserve the schema used prior to this rule.
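The three actions above could look roughly like the following sketch. It works on plain column indices instead of DataFusion plan nodes; `ProjectionFix` and `extend_projection` are hypothetical names introduced here, and the choice to append the extra columns after the existing ones is an assumption made so that the restoring projection is simply the leading positions.

```rust
use std::collections::BTreeSet;

/// Outcome of reconciling a scan projection with the columns its filters need.
#[derive(Debug, PartialEq, Eq)]
struct ProjectionFix {
    /// Projection to hand to the table provider (original columns first).
    scan_projection: Vec<usize>,
    /// When set, positions of the scan output that an outer projection
    /// node should keep in order to restore the previous schema.
    restore: Option<Vec<usize>>,
}

/// `projection` holds the column indices currently requested from the provider;
/// `filter_columns` holds the column indices referenced by filter expressions.
fn extend_projection(projection: &[usize], filter_columns: &BTreeSet<usize>) -> ProjectionFix {
    // Step 1: columns required by filters but missing from the projection.
    let additional_projection: Vec<usize> = filter_columns
        .iter()
        .copied()
        .filter(|c| !projection.contains(c))
        .collect();

    // Step 2: nothing is missing, so leave everything as is.
    if additional_projection.is_empty() {
        return ProjectionFix {
            scan_projection: projection.to_vec(),
            restore: None,
        };
    }

    // Step 3: extend the scan projection and remember which leading
    // positions reproduce the schema that existed before the change.
    let mut scan_projection = projection.to_vec();
    scan_projection.extend(additional_projection);
    let restore: Vec<usize> = (0..projection.len()).collect();
    ProjectionFix {
        scan_projection,
        restore: Some(restore),
    }
}
```

For a scan projecting columns `[1, 2]` with a filter on column `0`, this returns a scan projection of `[1, 2, 0]` and a restoring projection over positions `[0, 1]`, so consumers of the plan still see the original two columns.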