@harshmotw-db harshmotw-db commented Jan 22, 2026

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Description

This PR adds support for preserving variant stats from JSON AddFile records during checkpointing. Because variant stats are stored in Z85 encoding in the JSON stats, they must be decoded after the from_json expression has parsed them. This PR implements the DecodeNestedZ85EncodedVariant expression, which decodes variant stats from their Z85 representation.
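For reviewers unfamiliar with Z85, here is a minimal sketch of the decoding step on its own, written in Python for brevity. It assumes standard Z85 as defined in ZeroMQ RFC 32 (5 characters decode to 4 big-endian bytes) and an input whose length is a multiple of 5. The name `z85_decode` is hypothetical; this is not the code in DecodeNestedZ85EncodedVariant, and any padding or framing Delta applies to variant stats is not covered here.

```python
# Standard Z85 alphabet from ZeroMQ RFC 32 (85 printable characters).
Z85_ALPHABET = (
    "0123456789abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    ".-:+=^!/*?&<>()[]{}@%$#"
)
Z85_INDEX = {ch: i for i, ch in enumerate(Z85_ALPHABET)}


def z85_decode(text: str) -> bytes:
    """Decode a Z85 string into bytes (hypothetical helper, not Delta's API).

    Assumption: the input length is a multiple of 5, as plain Z85 requires.
    """
    if len(text) % 5 != 0:
        raise ValueError("Z85 input length must be a multiple of 5")

    out = bytearray()
    for start in range(0, len(text), 5):
        value = 0
        for ch in text[start:start + 5]:
            if ch not in Z85_INDEX:
                raise ValueError(f"invalid Z85 character: {ch!r}")
            # Accumulate the 5 characters as one base-85 number.
            value = value * 85 + Z85_INDEX[ch]
        if value > 0xFFFFFFFF:
            raise ValueError("Z85 group does not fit in 32 bits")
        # Each group of 5 characters yields 4 bytes, most significant first.
        out += value.to_bytes(4, "big")
    return bytes(out)


if __name__ == "__main__":
    # Test vector from the Z85 specification.
    print(z85_decode("HelloWorld").hex())
```

The decoded bytes would then be interpreted as the variant's binary value; that step depends on how the stats are laid out and is outside this sketch.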

Stats stored as structs in previous checkpoints need their own custom logic. Currently, however, stats_struct is not preserved for any data type, let alone variant. A separate PR fixes this for other data types. Once that PR is merged, I will open a follow-up PR to preserve variant stats from previous checkpoints.

TODO: Hide behind shims for Spark 4.0 compatibility

How was this patch tested?

Using golden files. We copied golden tables whose AddFiles contain variant stats into this project and verified that checkpointing these tables preserves the variant stats.

Does this PR introduce any user-facing changes?

Yes

@harshmotw-db harshmotw-db force-pushed the harshmotw-db/variant_stats branch from 8497570 to 9114952 Compare January 23, 2026 22:24
@harshmotw-db harshmotw-db changed the title from "[DRAFT] Variant Stats" to "[SPARK][VARIANT] Preserve Variant stats from JSON addFiles during checkpointing" Jan 24, 2026
@harshmotw-db harshmotw-db marked this pull request as ready for review January 24, 2026 02:11