Commit 59d3b03
## What changes are proposed in this pull request?
### Motivation and Context
This PR is part of parent issue #499, which tracks checkpoint write
support in `delta-kernel-rs`. It introduces a new snapshot API for
writing **single-file checkpoints** to Delta tables (#736), supporting
the classic-named single-file V1 and V2 checkpoint formats.
### Breaking Changes
- Introduces a new error variant that may require handling in downstream
code.
### Summary of Changes
- **New Checkpoint API**: Adds `Snapshot::checkpoint` and
`Table::checkpoint` as the entry points for checkpoint creation,
supporting both V1 and V2 checkpoint specs depending on table features.
- **CheckpointWriter**: Introduces a core orchestrator for checkpoint
creation, covering data preparation and (in the future) finalization;
see todo(#850).
- **CheckpointDataIterator**: An iterator-based mechanism for checkpoint
data generation, ensuring accurate action statistics and safe
finalization.
- **Explicit Finalization**: Lays the groundwork for a two-step
checkpoint process: first write all data, then call `.finalize()` to
persist checkpoint metadata; see todo(#850).
- **Error Handling**: Adds a new error variant for checkpoint write
failures.
- **Extensibility**: The API is designed to accommodate future
enhancements such as multi-file V2 checkpoints.
### Major Components and APIs
| API / Struct | Description |
|--------------|-------------|
| `Error::CheckpointWrite(String)` | New error variant for checkpoint write failures |
| `Snapshot::checkpoint()` | Creates a new `CheckpointWriter` for a snapshot |
| `Table::checkpoint()` | Creates a new `CheckpointWriter` for a version of a table |
| `CheckpointWriter` | Main class orchestrating checkpoint creation and finalization |
| `CheckpointWriter::checkpoint_path()` | Returns the URL where the checkpoint file should be written |
| `CheckpointWriter::checkpoint_data()` | Returns the checkpoint data (`CheckpointDataIterator`) to be written to the checkpoint file |
| `CheckpointWriter::finalize()` | todo(#850) Finalizes checkpoint by writing the `_last_checkpoint` file after data is persisted |
| `CheckpointDataIterator` | Iterator over checkpoint actions, accumulates statistics for finalization |
| `CheckpointBatch` (private) | Output of `CheckpointLogReplayProcessor`, contains filtered actions and counts |
### Checkpoint Types
| Table Feature | Resulting Checkpoint Type |
|-------------------|------------------------------|
| No `v2Checkpoint` | Single-file classic-named V1 |
| `v2Checkpoint` | Single-file classic-named V2 |
- **V1**: For legacy tables; no `CheckpointMetadata` action is included.
- **V2**: For tables supporting `v2Checkpoint`; includes a
`CheckpointMetadata` action for enhanced metadata.
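
The mapping above can be illustrated with a minimal, self-contained sketch. It is not the code from this PR: `CheckpointSpec`, `checkpoint_spec`, and `classic_checkpoint_file_name` are hypothetical names, the feature string assumes the Delta protocol's `v2Checkpoint` spelling, and the file name assumes the classic naming scheme of a version zero-padded to 20 digits.

```rust
/// Checkpoint flavors described in the table above (hypothetical enum).
#[derive(Debug, PartialEq, Eq)]
enum CheckpointSpec {
    /// Single-file, classic-named V1: no `CheckpointMetadata` action.
    V1,
    /// Single-file, classic-named V2: includes a `CheckpointMetadata` action.
    V2,
}

/// Picks the checkpoint spec from the table's feature names.
/// Assumption: the feature is identified by the string "v2Checkpoint".
fn checkpoint_spec(table_features: &[&str]) -> CheckpointSpec {
    if table_features.contains(&"v2Checkpoint") {
        CheckpointSpec::V2
    } else {
        CheckpointSpec::V1
    }
}

/// Builds a classic-named checkpoint file name for a table version.
/// Assumption: the version is zero-padded to 20 digits.
fn classic_checkpoint_file_name(version: u64) -> String {
    format!("{version:020}.checkpoint.parquet")
}

fn main() {
    let features = ["deletionVectors", "v2Checkpoint"];
    println!("{:?}", checkpoint_spec(&features));
    println!("{}", classic_checkpoint_file_name(10));
}
```

Both specs share the same classic file name in this sketch; only the presence of the `CheckpointMetadata` action differs.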
### Usage Workflow
1. Create a `CheckpointWriter` via `Snapshot::checkpoint` or
`Table::checkpoint`.
2. Retrieve the checkpoint path from
`CheckpointWriter::checkpoint_path()`.
3. Retrieve the checkpoint data from
`CheckpointWriter::checkpoint_data()`.
4. Write the data to that path in object storage (engine-specific).
5. Finalize the checkpoint by calling `CheckpointWriter::finalize()`;
see todo(#850).
todo(#850): The `CheckpointWriter::finalize()` API that was previously
included in this PR has been split into a separate PR, #851, for ease of
review. That PR handles finalization of the checkpointing process by
writing the `_last_checkpoint` file when `.finalize()` is called. Note:
the engine must write the entire checkpoint file to storage before
calling `.finalize()`; otherwise the table may be corrupted. It will be
hard for the engine **not** to do this, since the `finalize()` call
takes the `FileMeta` of the written checkpoint file.
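
The workflow can be sketched with mock types. Everything below is an illustrative stand-in rather than the delta-kernel-rs API: `MockCheckpointWriter`, the fields of `FileMeta`, the `_delta_log/` prefix, the in-memory `HashMap` used as object storage, and the string returned by `finalize` are all assumptions. The sketch only shows why a `finalize` that takes the written file's metadata is hard to call before the write has happened.

```rust
use std::collections::HashMap;

/// Stand-in for the metadata of a written file (fields are hypothetical).
#[derive(Debug)]
struct FileMeta {
    location: String,
    size: u64,
}

/// Mock of the writer described above; NOT the delta-kernel-rs type.
struct MockCheckpointWriter {
    version: u64,
    actions: Vec<String>,
}

impl MockCheckpointWriter {
    /// Step 2: where the engine should write the checkpoint file.
    fn checkpoint_path(&self) -> String {
        format!("_delta_log/{:020}.checkpoint.parquet", self.version)
    }

    /// Step 3: the actions to write, exposed as an iterator.
    fn checkpoint_data(&self) -> impl Iterator<Item = &String> {
        self.actions.iter()
    }

    /// Step 5: consumes the writer and requires metadata of the written
    /// file, so it cannot be called before the write has produced it.
    fn finalize(self, written: &FileMeta) -> Result<String, String> {
        if written.location != self.checkpoint_path() {
            return Err(format!("unexpected location: {}", written.location));
        }
        // Placeholder summary; a real finalize would write `_last_checkpoint`.
        Ok(format!("version={} size={}", self.version, written.size))
    }
}

/// Runs steps 2-5 against an in-memory map standing in for object storage.
fn write_checkpoint(
    writer: MockCheckpointWriter,
    store: &mut HashMap<String, Vec<u8>>,
) -> Result<String, String> {
    let path = writer.checkpoint_path();
    let bytes = writer
        .checkpoint_data()
        .map(String::as_str)
        .collect::<Vec<_>>()
        .join("\n")
        .into_bytes();
    let meta = FileMeta { location: path.clone(), size: bytes.len() as u64 };
    store.insert(path, bytes); // Step 4: the engine-specific write.
    writer.finalize(&meta)
}

fn main() {
    // Step 1 would be `Snapshot::checkpoint` or `Table::checkpoint`.
    let writer = MockCheckpointWriter {
        version: 3,
        actions: vec!["protocol".into(), "metaData".into(), "add".into()],
    };
    let mut store = HashMap::new();
    println!("{:?}", write_checkpoint(writer, &mut store));
}
```

Taking `self` by value in `finalize` also means the mock writer cannot be reused after finalization, which matches the one-shot nature of the workflow.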
## How was this change tested?
Unit tests in `checkpoint/mod.rs`:
- `test_deleted_file_retention_timestamp`: tests file retention
timestamp calculations
- `test_create_checkpoint_metadata_batch`

Unit tests in `checkpoint/tests.rs`:
- `test_v1_checkpoint_latest_version_by_default`: table that does not
support `v2Checkpoint`, no checkpoint version specified
- `test_v1_checkpoint_specific_version`: table that does not support
`v2Checkpoint`, checkpointing at a specific version
- `test_v2_checkpoint_supported_table`: table that supports
`v2Checkpoint`, no version specified
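
For context on what a retention timestamp calculation might look like, here is a minimal sketch. It is not the function under test: the name, signature, millisecond units, and the 7-day default are assumptions made for illustration. It computes a cutoff as the current time minus the retention duration and returns `None` on overflow.

```rust
use std::time::Duration;

/// Assumed default retention of 7 days (hypothetical constant).
const DEFAULT_RETENTION: Duration = Duration::from_secs(7 * 24 * 60 * 60);

/// Returns the cutoff in milliseconds: `now_ms` minus the retention
/// duration. Falls back to the assumed default when no duration is given,
/// and returns `None` if the arithmetic would overflow.
fn deleted_file_retention_timestamp(now_ms: i64, retention: Option<Duration>) -> Option<i64> {
    let retention = retention.unwrap_or(DEFAULT_RETENTION);
    let retention_ms = i64::try_from(retention.as_millis()).ok()?;
    now_ms.checked_sub(retention_ms)
}

fn main() {
    let now_ms = 1_000_000_000_000;
    println!("{:?}", deleted_file_retention_timestamp(now_ms, None));
}
```

Using checked arithmetic keeps an extreme retention value from wrapping around into a bogus cutoff.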
1 parent 23e65d3 commit 59d3b03
File tree: 9 files changed (+864, -64 lines)
- ffi/src
- kernel/src
- checkpoint