Describe the bug
When constructing a LogSegment, we scan the _delta_log directory for checkpoints and commits. The most recent checkpoint is collected from the log, but the kernel does not verify that every part of a multi-part checkpoint is present. Thus, the checkpoint returned may be incomplete.
To Reproduce
This behaviour can be seen in the LogSegment test build_snapshot_with_missing_checkpoint_part_no_hint.
Expected behavior
build_snapshot_with_missing_checkpoint_part_no_hint should pass and return the most recent complete checkpoint at version 3.
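The completeness check described above could be sketched roughly as follows. This is not the kernel's implementation: `CheckpointFile`, `parse_checkpoint`, and `latest_complete_checkpoint` are hypothetical names made up for illustration, and the sketch only handles the classic single-file (`<version>.checkpoint.parquet`) and multi-part (`<version>.checkpoint.<part>.<num_parts>.parquet`) file names. A multi-part checkpoint counts as complete only when all `num_parts` parts have been seen.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical parsed form of a checkpoint file name (illustration only).
#[derive(Debug, PartialEq)]
struct CheckpointFile {
    version: u64,
    /// `Some((part, num_parts))` for a multi-part checkpoint, `None` for a single file.
    part: Option<(u32, u32)>,
}

/// Parse names such as `00000000000000000003.checkpoint.parquet` or
/// `00000000000000000005.checkpoint.0000000001.0000000003.parquet`.
/// Anything else (commits, `_last_checkpoint`, other formats) yields `None`.
fn parse_checkpoint(name: &str) -> Option<CheckpointFile> {
    let fields: Vec<&str> = name.split('.').collect();
    match fields.as_slice() {
        [version, "checkpoint", "parquet"] => Some(CheckpointFile {
            version: version.parse().ok()?,
            part: None,
        }),
        [version, "checkpoint", part, num_parts, "parquet"] => Some(CheckpointFile {
            version: version.parse().ok()?,
            part: Some((part.parse().ok()?, num_parts.parse().ok()?)),
        }),
        _ => None,
    }
}

/// Return the highest version that has a complete checkpoint, if any.
fn latest_complete_checkpoint(names: &[&str]) -> Option<u64> {
    let mut complete: BTreeSet<u64> = BTreeSet::new();
    // (version, num_parts) -> distinct part numbers seen so far
    let mut parts: BTreeMap<(u64, u32), BTreeSet<u32>> = BTreeMap::new();

    for file in names.iter().filter_map(|n| parse_checkpoint(n)) {
        match file.part {
            // A single-file checkpoint is complete by itself.
            None => {
                complete.insert(file.version);
            }
            // Only record part numbers in the valid range 1..=num_parts.
            Some((part, num_parts)) if part >= 1 && part <= num_parts => {
                parts.entry((file.version, num_parts)).or_default().insert(part);
            }
            // Ignore malformed part numbers.
            Some(_) => {}
        }
    }

    // A multi-part checkpoint is complete only if every part was found.
    for ((version, num_parts), seen) in &parts {
        if seen.len() == *num_parts as usize {
            complete.insert(*version);
        }
    }

    complete.into_iter().next_back()
}
```

With a listing that contains a single-file checkpoint at version 3 and only parts 1 and 3 of a three-part checkpoint at version 5, this sketch skips version 5 and falls back to version 3.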
Additional context
No response
After a deeper dive into Delta Spark, here is a summary of its current checkpointing behavior:
(1) With no _last_checkpoint and an incomplete multi-part checkpoint, Delta Spark falls back to an earlier complete checkpoint to recover. This is the behavior expected by the failing kernel test, which will be addressed.
(2) With a _last_checkpoint that references an incomplete multi-part checkpoint or a missing checkpoint, Delta Spark raises an error. This matches the current behavior in the kernel.
Since Snapshots can still be constructed correctly even when the referenced checkpoint in scenario (2) is corrupted or missing, we may update the behavior to attempt recovery instead of raising an error. The updated logic will leverage earlier complete checkpoint versions if available.
[UPDATE]
The _last_checkpoint file is only a hint, so technically it's not required that the checkpoint it referenced still exists (e.g. due to metadata cleanup deleting it).
HOWEVER, the file IS reliable, in the sense that a complete checkpoint must have existed at that version at some point.
We should always prefer a newer checkpoint, if available, for performance reasons
If NO complete checkpoint is available at or after the hint, that means the table has been corrupted (most likely by a user deleting and recreating the table in place, so that the new highest version number is lower than the checkpoint hint).
The kernel should NOT try to compensate for such badness and instead fail-fast.
TL;DR: No additional changes will be introduced apart from fixing the behavior in the issue description, scenario (1).
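As a rough illustration of the rules in the update, the selection logic might look like the sketch below. It assumes the set of complete checkpoint versions has already been computed; `select_checkpoint` and `CheckpointError` are hypothetical names for this example and are not part of the kernel's API.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum CheckpointError {
    /// `_last_checkpoint` pointed at `hint`, but no complete checkpoint
    /// exists at or after that version: treat the table as corrupted.
    NoCheckpointAtOrAfterHint { hint: u64 },
}

/// Pick the checkpoint version to start the log segment from.
///
/// `complete_versions` holds the versions that have a complete checkpoint;
/// `hint` is the version recorded in `_last_checkpoint`, if that file exists.
fn select_checkpoint(
    complete_versions: &[u64],
    hint: Option<u64>,
) -> Result<Option<u64>, CheckpointError> {
    // Always prefer the newest complete checkpoint, for performance.
    let newest = complete_versions.iter().copied().max();
    match (hint, newest) {
        // A complete checkpoint exists at or after the hint: use it.
        (Some(hint), Some(v)) if v >= hint => Ok(Some(v)),
        // The hint promised a checkpoint that cannot be found: fail fast.
        (Some(hint), _) => Err(CheckpointError::NoCheckpointAtOrAfterHint { hint }),
        // No hint: fall back to the newest complete checkpoint, or to none
        // at all, in which case the log is replayed from the first commit.
        (None, newest) => Ok(newest),
    }
}
```

For example, with complete checkpoints at versions 3 and 7 and a hint of 5, the sketch returns version 7; with only version 3 complete and a hint of 5, it returns an error instead of silently falling back.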