Status: Open
Labels: bug (Something isn't working)
Description
Apache Iceberg version
1.8.1
Query engine
Flink
Please describe the bug 🐞
In https://github.com/apache/iceberg/pull/10523/files, we changed the cleanup logic to stop fetching the latest snapshot from the metastore and instead maintain an in-memory snapshot instance for cleanup operations.
Specifically, this is what we saw happen:
- Initial Commit Attempt: Flink attempts to commit snapshot `<snapshot_id>` to the metastore. The commit succeeds on the metastore side, but Flink receives a transient network error and incorrectly marks the commit as failed.
- Retry with Stale Metadata: `RetryingMetaStoreClient` retries the commit, but because the table has already been modified, the metastore returns a `The table has been modified` error. This triggers a `CommitFailedException` (see https://github.com/apache/iceberg/blob/1.8.x/hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java#L277-L278).
- SnapshotProducer Retry: `SnapshotProducer` catches this exception and retries the operation. It reuses the same snapshot ID but generates a new manifest list file, `snap-<snapshot_id>-2-<uuid>.avro` (note the incremented attempt number), which differs from the already-committed manifest list `snap-<snapshot_id>-1-<uuid>.avro`.
- No-Op Detection: Since there are no actual changes between these two attempts (same snapshot content), Iceberg detects this as a no-op and skips the commit (https://github.com/apache/iceberg/blob/1.8.x/core/src/main/java/org/apache/iceberg/SnapshotProducer.java#L448-L453).
- Incorrect Cleanup: The cleanup logic then runs, but it incorrectly assumes that `snap-<snapshot_id>-2-<uuid>.avro` is the committed manifest list (since it is the most recent attempt). It therefore deletes `snap-<snapshot_id>-1-<uuid>.avro` as an "uncommitted" file, corrupting the active snapshot.
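The sequence above can be reproduced with a small, self-contained simulation. This is a hypothetical sketch, not Iceberg code: `Metastore`, `simulate`, `cleanupUsingLastAttempt`, and `cleanupUsingRefreshedMetadata` are illustrative names, and the no-op check is reduced to comparing snapshot IDs. It shows how cleanup that trusts the in-memory "latest attempt" deletes the manifest list the metastore actually committed, while a cleanup that first looks up the committed manifest list (an assumed alternative, not necessarily the fix Iceberg will adopt) keeps it.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class StaleCleanupSketch {

    /** Hypothetical stand-in for the metastore: tracks the manifest list of the current snapshot. */
    static final class Metastore {
        long currentSnapshotId = -1;
        String currentManifestList = null;
    }

    /** Naming pattern from the report: snap-<snapshot_id>-<attempt>-<uuid>.avro */
    static String manifestListPath(long snapshotId, int attempt, String uuid) {
        return "snap-" + snapshotId + "-" + attempt + "-" + uuid + ".avro";
    }

    /** Buggy cleanup (hypothetical): assumes the most recent attempt was committed, deletes the rest. */
    static Set<String> cleanupUsingLastAttempt(List<String> written) {
        String assumedCommitted = written.get(written.size() - 1);
        Set<String> deleted = new HashSet<>();
        for (String path : written) {
            if (!path.equals(assumedCommitted)) {
                deleted.add(path);
            }
        }
        return deleted;
    }

    /** Assumed safer cleanup: keeps whatever manifest list the metastore actually references. */
    static Set<String> cleanupUsingRefreshedMetadata(List<String> written, Metastore metastore) {
        Set<String> deleted = new HashSet<>();
        for (String path : written) {
            if (!path.equals(metastore.currentManifestList)) {
                deleted.add(path);
            }
        }
        return deleted;
    }

    /** Replays the reported scenario and returns every manifest list written across attempts. */
    static List<String> simulate(Metastore metastore, long snapshotId) {
        List<String> written = new ArrayList<>();

        // Attempt 1: the commit lands in the metastore, but the client sees a network error.
        String first = manifestListPath(snapshotId, 1, "uuid-a");
        written.add(first);
        metastore.currentSnapshotId = snapshotId;
        metastore.currentManifestList = first;

        // The client-level retry hits "The table has been modified" -> CommitFailedException,
        // so the producer retries with the same snapshot ID and writes a new manifest list.
        String second = manifestListPath(snapshotId, 2, "uuid-b");
        written.add(second);

        // No-op detection: the table already contains this snapshot, so nothing new is committed.
        if (metastore.currentSnapshotId == snapshotId) {
            return written;
        }
        metastore.currentManifestList = second;
        return written;
    }

    public static void main(String[] args) {
        Metastore metastore = new Metastore();
        List<String> written = simulate(metastore, 42L);
        System.out.println("committed:            " + metastore.currentManifestList);
        System.out.println("last-attempt cleanup: " + cleanupUsingLastAttempt(written));
        System.out.println("refreshed cleanup:    " + cleanupUsingRefreshedMetadata(written, metastore));
    }
}
```

Running `main` shows the last-attempt cleanup selecting `snap-42-1-uuid-a.avro`, the manifest list the metastore still points to, for deletion. The refreshed variant deletes only the orphaned `snap-42-2-uuid-b.avro`, at the cost of an extra metadata read before cleanup.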
Willingness to contribute
- I can contribute a fix for this bug independently
- I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- I cannot contribute a fix for this bug at this time