
[Parquet][C++] Logical types with sort order UNKNOWN are missing null_count statistics #46205

Open
@paleolimbot


Describe the enhancement requested

Once variant and geometry support are added, the C++ Parquet implementation will have several logical types whose sort order is UNKNOWN. The current statistics implementation does not compute a null count, or write it to the column metadata, when the sort order is unknown. As a result, this statistic will be missing for geometry, geography, and variant columns. Geometry needs it in particular: the null count is required to push down a query rectangle effectively, because otherwise there is no mechanism to detect row groups that are entirely null.

I'm not sure what the best way to implement this is. For geometry specifically, we could track the null count in GeoStatistics, but that wouldn't help with variant. I'm also not sure whether the null count and other statistics should be written at the page level for these types.

Noted by @wgtmac in #45459

Component(s)

Parquet, C++
