
[EPIC] Simplify datetime predicates using "preimages" #19946

@alamb


Is your feature request related to a problem or challenge?

It is much easier for optimizers to reason about predicates of the form <col> op <constant>. They often cannot optimize nearly as well when the column is wrapped in a scalar function.

This applies to DataFusion's PruningPredicate as well.
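The following sketch shows why a bare column helps statistics-based pruning such as PruningPredicate. A file or row group can be skipped when its [min, max] range for k cannot overlap the range a predicate like k >= 2024-01-01 AND k < 2025-01-01 accepts. This is not DataFusion's API: MinMax and may_match_range are hypothetical names, and dates are assumed to be stored as days since 1970-01-01 (the encoding Arrow's Date32 uses).

```rust
/// Hypothetical min/max statistics for column `k` in one file or row group.
/// Dates are stored as days since 1970-01-01 (the same encoding as Arrow's Date32).
#[derive(Debug, Clone, Copy)]
struct MinMax {
    min: i32,
    max: i32,
}

/// Returns false when no row in the container can satisfy `lo <= k AND k < hi`,
/// meaning the container can be skipped without being read.
fn may_match_range(stats: MinMax, lo: i32, hi: i32) -> bool {
    // Values lie in [min, max]; that interval misses [lo, hi) only if it
    // ends before `lo` or starts at or after `hi`.
    !(stats.max < lo || stats.min >= hi)
}

fn main() {
    // 2024-01-01 and 2025-01-01 as days since the epoch.
    let (lo, hi) = (19_723, 20_089);
    let containers = [
        MinMax { min: 18_000, max: 19_000 }, // entirely before 2024: skipped
        MinMax { min: 19_700, max: 19_800 }, // straddles 2024-01-01: must be read
        MinMax { min: 20_089, max: 20_500 }, // starts on 2025-01-01: skipped
    ];
    for s in containers {
        println!("{:?} -> may match: {}", s, may_match_range(s, lo, hi));
    }
}
```

With the predicate still written as EXTRACT(YEAR FROM k) = 2024, a pruner would have to know how the function maps the column's min/max values before it could apply the same reasoning; a range on the bare column can be checked against the statistics directly.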

For example, the predicate selecting a particular year

WHERE EXTRACT (YEAR FROM k) = 2024

can be rewritten as

k >= 2024-01-01 AND k < 2025-01-01.

The bare column k is then easier to push down and is subject to range analysis, etc.
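A minimal sketch of this rewrite, assuming k is a date column (no time zone) stored as days since 1970-01-01. The preimage of a year y under EXTRACT(YEAR FROM k) is the half-open interval [y-01-01, (y+1)-01-01). The Predicate enum, year_preimage, and rewrite are hypothetical illustrations, not DataFusion's expression types; days_from_civil is Howard Hinnant's well-known civil-to-days algorithm.

```rust
/// Days since 1970-01-01 for a proleptic Gregorian date
/// (Howard Hinnant's `days_from_civil` algorithm).
fn days_from_civil(year: i64, month: i64, day: i64) -> i64 {
    // Treat the year as starting in March, so Feb 29 falls at the end.
    let y = if month <= 2 { year - 1 } else { year };
    let era = (if y >= 0 { y } else { y - 399 }) / 400;
    let yoe = y - era * 400; // year of era, [0, 399]
    let mp = if month > 2 { month - 3 } else { month + 9 }; // March = 0
    let doy = (153 * mp + 2) / 5 + day - 1; // day of year, [0, 365]
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; // day of era, [0, 146096]
    era * 146_097 + doe - 719_468
}

/// Hypothetical simplified predicate on a date column `k`.
#[derive(Debug, PartialEq)]
enum Predicate {
    /// EXTRACT(YEAR FROM k) = year
    ExtractYearEq { year: i64 },
    /// lo <= k AND k < hi (days since epoch)
    Range { lo: i64, hi: i64 },
}

/// Preimage of `year` under EXTRACT(YEAR FROM k): [year-01-01, (year+1)-01-01).
fn year_preimage(year: i64) -> (i64, i64) {
    (days_from_civil(year, 1, 1), days_from_civil(year + 1, 1, 1))
}

/// Replace a year-equality predicate with a range on the bare column.
fn rewrite(p: Predicate) -> Predicate {
    match p {
        Predicate::ExtractYearEq { year } => {
            let (lo, hi) = year_preimage(year);
            Predicate::Range { lo, hi }
        }
        other => other,
    }
}

fn main() {
    let rewritten = rewrite(Predicate::ExtractYearEq { year: 2024 });
    println!("{:?}", rewritten);
}
```

Using a half-open upper bound (k < next year's January 1st) avoids having to compute the last day of the year, which also keeps the same shape if k were a timestamp rather than a date.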

The ClickHouse paper (https://www.vldb.org/pvldb/vol17/p3731-schulze.pdf) calls this rewrite a "preimage", after the mathematical term (I think toYear(k) is the equivalent of EXTRACT(YEAR FROM k)):

Second, some functions can compute the preimage of a given function result. This is used to replace comparisons of constants with function calls on the key columns by comparing the key column value with the preimage. For example, toYear(k) = 2024 can be replaced by k >= 2024-01-01 && k < 2025-01-01.
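The same preimage also covers comparison operators other than =, provided the function is non-decreasing (as year extraction is). If [lo, hi) is the preimage of c under f, then f(k) < c becomes k < lo, f(k) <= c becomes k < hi, f(k) > c becomes k >= hi, f(k) >= c becomes k >= lo, and f(k) != c becomes k < lo OR k >= hi. The sketch below encodes that table; Op, ColPredicate, and rewrite_with_preimage are hypothetical names, and main brute-force checks the table using a toy non-decreasing function f(k) = floor(k / 10) in place of EXTRACT(YEAR ...). NULL handling is left out.

```rust
#[derive(Debug, Clone, Copy)]
enum Op { Eq, NotEq, Lt, LtEq, Gt, GtEq }

/// Hypothetical predicate on the bare column `k`.
#[derive(Debug, PartialEq)]
enum ColPredicate {
    Between { lo: i64, hi: i64 }, // lo <= k AND k < hi
    Outside { lo: i64, hi: i64 }, // k < lo OR k >= hi
    Lt(i64),                      // k < x
    GtEq(i64),                    // k >= x
}

impl ColPredicate {
    fn eval(&self, k: i64) -> bool {
        match *self {
            ColPredicate::Between { lo, hi } => lo <= k && k < hi,
            ColPredicate::Outside { lo, hi } => k < lo || k >= hi,
            ColPredicate::Lt(x) => k < x,
            ColPredicate::GtEq(x) => k >= x,
        }
    }
}

/// Rewrite `f(k) op c`, given the preimage [lo, hi) of `c` under a
/// non-decreasing function `f`.
fn rewrite_with_preimage(op: Op, lo: i64, hi: i64) -> ColPredicate {
    match op {
        Op::Eq => ColPredicate::Between { lo, hi },
        Op::NotEq => ColPredicate::Outside { lo, hi },
        Op::Lt => ColPredicate::Lt(lo),
        Op::LtEq => ColPredicate::Lt(hi),
        Op::Gt => ColPredicate::GtEq(hi),
        Op::GtEq => ColPredicate::GtEq(lo),
    }
}

/// Evaluate `a op b` directly, used to check the rewrite.
fn compare(op: Op, a: i64, b: i64) -> bool {
    match op {
        Op::Eq => a == b,
        Op::NotEq => a != b,
        Op::Lt => a < b,
        Op::LtEq => a <= b,
        Op::Gt => a > b,
        Op::GtEq => a >= b,
    }
}

fn main() {
    // Toy non-decreasing function; the preimage of c is [10c, 10c + 10).
    let f = |k: i64| k.div_euclid(10);
    let c = 2;
    let (lo, hi) = (10 * c, 10 * c + 10);
    for op in [Op::Eq, Op::NotEq, Op::Lt, Op::LtEq, Op::Gt, Op::GtEq] {
        let rewritten = rewrite_with_preimage(op, lo, hi);
        // The rewritten predicate must agree with the original on every k.
        for k in -50..50 {
            assert_eq!(compare(op, f(k), c), rewritten.eval(k), "{op:?} at k = {k}");
        }
        println!("f(k) {op:?} {c}  =>  {rewritten:?}");
    }
}
```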

This ticket tracks adding such optimizations to DataFusion.

Describe the solution you'd like

Describe alternatives you've considered

No response

Additional context

No response

Labels: EPIC (a larger project, actively underway, with sub tasks), enhancement (new feature or request)
