@sopel39 Adding to the above: we can also store null_counts. See the more detailed discussion here.
Null counts stored in the partition stats can be scaled at run time (otherwise, on-the-fly collection can be used).
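As a rough illustration of what "scaled at run time" could mean, here is a minimal sketch. It assumes (this is not stated in the discussion) that scaling is proportional: the null fraction recorded for a partition is applied to the planner's row estimate. The function name `scale_null_count` and its parameters are hypothetical and not part of Iceberg or Trino.

```python
def scale_null_count(stored_null_count: int,
                     stored_record_count: int,
                     estimated_rows: float) -> float:
    """Estimate the number of nulls among `estimated_rows` rows.

    Assumption: nulls are spread uniformly, so the null fraction recorded
    in the partition stats also holds for the rows the query will read.
    """
    if stored_record_count <= 0:
        # No recorded rows means there is no fraction to scale from.
        return 0.0
    null_fraction = stored_null_count / stored_record_count
    return null_fraction * estimated_rows
```

For example, a partition recorded with 250 nulls in 1000 rows would yield an estimate of 50 nulls for a 200-row scan under this assumption.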
Proposed Change
At the moment, partition statistics (https://iceberg.apache.org/spec/#partition-statistics) don't contain min/max stats per column. Because of that, engines (e.g. https://github.com/trinodb/trino/blob/master/plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/TableStatisticsReader.java#L158) need to read manifest files to compute min/max stats per column. Keeping min/max stats at the partition level would save the time spent enumerating manifest files during planning. This is especially important for highly concurrent query workloads and for large-scale tables.
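To make the idea concrete, here is a minimal sketch of the aggregation an engine currently has to do over manifest entries, and which partition-level stats would precompute. The types `FileStats` and `PartitionBounds` and the function `aggregate_partition_bounds` are hypothetical stand-ins for illustration; they are not Iceberg or Trino APIs, and bounds are shown as already-decoded comparable values keyed by column field ID.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, Tuple


@dataclass
class FileStats:
    """Hypothetical per-data-file stats, as would be read from a manifest."""
    partition: Tuple[Any, ...]
    lower_bounds: Dict[int, Any]  # column field ID -> min value in the file
    upper_bounds: Dict[int, Any]  # column field ID -> max value in the file


@dataclass
class PartitionBounds:
    """Hypothetical per-partition min/max, keyed by column field ID."""
    lower: Dict[int, Any] = field(default_factory=dict)
    upper: Dict[int, Any] = field(default_factory=dict)


def aggregate_partition_bounds(
    files: Iterable[FileStats],
) -> Dict[Tuple[Any, ...], PartitionBounds]:
    """Fold per-file bounds into one min/max pair per partition and column.

    Simplification: a column with no bounds in some file is ignored for
    that file; a real implementation would likely treat the partition
    bound as unknown in that case.
    """
    result: Dict[Tuple[Any, ...], PartitionBounds] = {}
    for f in files:
        bounds = result.setdefault(f.partition, PartitionBounds())
        for col, value in f.lower_bounds.items():
            current = bounds.lower.get(col)
            bounds.lower[col] = value if current is None else min(current, value)
        for col, value in f.upper_bounds.items():
            current = bounds.upper.get(col)
            bounds.upper[col] = value if current is None else max(current, value)
    return result
```

With the result stored per partition, planning would need one lookup per partition instead of a pass over every data file entry in the manifests.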
Proposal document
No response
Specifications