[Feature]: dynamic filtering on ZORDER-ed column should affect file pruning #4052

@ahirner

Is your feature request related to a problem?

We have a table partitioned on one column and ZORDER-ed by a target column. AFAICS, delta-rs #4323f9e5 is wired up correctly with datafusion version 51.

With an equality expression, results are excellent:

> explain analyze select id,type,source from r where scope_id = 973;
...
DeltaScan, metrics=[files_pruned=694, files_scanned=1]
...
1 row(s) fetched.
Elapsed 0.561 seconds.

With a dynamic filter as the source of the predicate, no files get pruned, even though the exact range of the dynamic predicate shows up in the plan:

...
DeltaScan, metrics=[files_pruned=0, files_scanned=695]
...
DataSourceExec ... projection=[scope_id, source, type, id], file_type=parquet, predicate=true AND DynamicFilter [ scope_id@0 >= 973 AND scope_id@0 <= 973 ], pruning_predicate=scope_id_null_count@1 != row_count@2 AND scope_id_max@0 >= 973 AND scope_id_null_count@1 != row_count@2 AND scope_id_min@3 <= 973
...
1 row(s) fetched.
Elapsed 151.741 seconds.
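
To make the expectation concrete, here is a minimal sketch of how the pruning_predicate from the plan above would behave if it were evaluated against per-file min/max statistics. This is not delta-rs or datafusion code: the `FileStats` class, the `may_contain` and `prune` helpers, and the file names and statistics are all hypothetical and only illustrate the logic.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    """Hypothetical per-file statistics for the scope_id column."""
    path: str
    scope_id_min: int
    scope_id_max: int
    scope_id_null_count: int
    row_count: int


def may_contain(stats: FileStats, lower: int, upper: int) -> bool:
    """Mirror the pruning_predicate for lower <= scope_id <= upper.

    A file is kept only if it has at least one non-null value and its
    [min, max] range overlaps the requested range.
    """
    has_non_null = stats.scope_id_null_count != stats.row_count
    return (
        has_non_null
        and stats.scope_id_max >= lower
        and stats.scope_id_min <= upper
    )


def prune(files: list[FileStats], lower: int, upper: int):
    """Return the files that must be scanned and the number pruned."""
    kept = [f for f in files if may_contain(f, lower, upper)]
    return kept, len(files) - len(kept)


# Made-up statistics; a ZORDER-ed column tends to give narrow,
# mostly disjoint min/max ranges per file, which is what makes pruning pay off.
files = [
    FileStats("part-0.parquet", 0, 499, 0, 1000),
    FileStats("part-1.parquet", 500, 999, 0, 1000),
    FileStats("part-2.parquet", 1000, 1499, 0, 1000),
    FileStats("part-3.parquet", 0, 0, 1000, 1000),  # all values null
]

# Same bounds as the dynamic filter: scope_id >= 973 AND scope_id <= 973
kept, pruned = prune(files, 973, 973)
print([f.path for f in kept], pruned)
```

With these made-up statistics only one file survives, which is the behaviour I would hope to see reflected in files_pruned once the dynamic filter bounds are known.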

Describe the solution you'd like

I'm not sure whether this is fixable with datafusion-only changes. Let me know, so I can also file or search for the underlying issue.
Eventually, I would like to see delta's ZORDER utilized by dynamic filters.

Describe alternatives you've considered

Given the trend towards more dynamic filtering capabilities, I only see workarounds instead of true alternatives.
Amazing library btw thx!

Priority

Medium - Would be helpful

Additional context

Attached is the raw explain analyze output.

delta_rs.issue.delt.scan_dynamic_file_pruning.txt

Contribution

  • I'm willing to submit a pull request for this feature
  • I can help with testing this feature
  • I can help with documentation for this feature
