[bug] partial OVERWRITE operation writes the wrong snapshot summary metrics #1845


Open
1 of 3 tasks
kevinjqliu opened this issue Mar 25, 2025 · 0 comments

Comments

@kevinjqliu
Contributor

Apache Iceberg version

main (development)

Please describe the bug 🐞

A snapshot OVERWRITE operation can compute the wrong summary fields when only part of the table is overwritten.

update_snapshot_summaries assumes that every OVERWRITE operation is a full table overwrite:

truncate_full_table=self._operation == Operation.OVERWRITE,

if truncate_full_table and summary.operation == Operation.OVERWRITE and previous_summary is not None:
    summary = _truncate_table_summary(summary, previous_summary)

This is likely an oversight from when we implemented partial writes.
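To illustrate the effect, here is a toy model of the two ways the record counts could be derived. It is only a sketch, not the pyiceberg implementation: both function names are hypothetical, and the values are plain ints for readability (real snapshot summaries store strings). It assumes a table with 200 records where a partial overwrite removes one 100-record file and writes back 90 records.

```python
from typing import Dict


def summarize_as_full_overwrite(previous: Dict[str, int], added_records: int) -> Dict[str, int]:
    """Hypothetical helper: treat the overwrite as replacing the whole table."""
    # Full-table assumption: every record in the previous snapshot is
    # counted as deleted, and the running total restarts from zero.
    return {
        "added-records": added_records,
        "deleted-records": previous["total-records"],
        "total-records": added_records,
    }


def summarize_as_partial_overwrite(
    previous: Dict[str, int], added_records: int, removed_records: int
) -> Dict[str, int]:
    """Hypothetical helper: only count the records that were actually removed."""
    return {
        "added-records": added_records,
        "deleted-records": removed_records,
        "total-records": previous["total-records"] - removed_records + added_records,
    }


previous = {"total-records": 200}

# What the full-table assumption yields for a partial overwrite.
truncated = summarize_as_full_overwrite(previous, added_records=90)
# What the summary should look like if only 100 records were removed.
expected = summarize_as_partial_overwrite(previous, added_records=90, removed_records=100)

print(truncated)  # {'added-records': 90, 'deleted-records': 200, 'total-records': 90}
print(expected)   # {'added-records': 90, 'deleted-records': 100, 'total-records': 190}
```

Under these assumptions the truncated summary over-reports deleted-records and under-reports total-records by the 100 records that were never touched.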

Thankfully, the table/transaction's overwrite function is currently implemented as a delete+append.

The only place where the OVERWRITE operation is used is during partial deletes:

with self.update_snapshot(snapshot_properties=snapshot_properties).overwrite() as overwrite_snapshot:

Original thread: apache/iceberg-go#356 (comment) (thanks @arnaudbriche and @zeroshade)

Partial overwrite reproduced in #1840

Willingness to contribute

  • I can contribute a fix for this bug independently
  • I would be willing to contribute a fix for this bug with guidance from the Iceberg community
  • I cannot contribute a fix for this bug at this time