test: Port cdf tests from delta-spark to kernel (#611)
## What changes are proposed in this pull request?
This PR adds several CDF tests from delta-spark. We check the following:
- CDF over various version ranges
- Update operations are read correctly from cdc files
- Actions with `data_change=false` are skipped
- A range with start > end is an error
- A start version greater than the latest table version is an error
- CDF works on partitioned tables
- CDF works on tables with backticks in the column names
- CDF is correct in deletion cases: unconditional deletes, conditional deletes that remove all rows, and selective conditional deletes
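
To make the `data_change` item concrete, here is a minimal sketch of the filtering behavior the test checks. The `Action` enum and `cdf_relevant` function are hypothetical stand-ins written for illustration; they are not the kernel's actual types or API.

```rust
// Hypothetical, simplified model of add/remove actions in a commit.
// The real kernel types carry many more fields; this is an assumption.
#[derive(Debug, PartialEq)]
enum Action {
    Add { path: String, data_change: bool },
    Remove { path: String, data_change: bool },
}

/// Keep only the actions that should contribute rows to the change data feed.
/// Actions with `data_change == false` are skipped.
fn cdf_relevant(actions: &[Action]) -> Vec<&Action> {
    actions
        .iter()
        .filter(|action| match action {
            Action::Add { data_change, .. } | Action::Remove { data_change, .. } => *data_change,
        })
        .collect()
}
```

Under this model, a commit that only rewrites files with `data_change=false` yields no change rows.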

Table-changes construction is also changed so that the CDF version error is
checked before the snapshots are created. This makes the error message
clearer when the start version is beyond the end of the table.
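
The effect of this reordering can be sketched as follows. In the actual change, the call to `LogSegment::for_table_changes` moves ahead of snapshot creation. The functions and error types below (`validate_cdf_range`, `load_snapshot`, `build_table_changes`, `CdfError`) are hypothetical stand-ins that only model the order of the checks, and the error strings are invented for illustration.

```rust
type Version = u64;

// Hypothetical error type for the range check (not the kernel's `Error`).
#[derive(Debug, PartialEq)]
enum CdfError {
    StartAfterEnd { start: Version, end: Version },
    StartBeyondLatest { start: Version, latest: Version },
}

/// Range check: rejects start > end and a start past the latest table version.
fn validate_cdf_range(
    start: Version,
    end: Option<Version>,
    latest: Version,
) -> Result<(), CdfError> {
    if let Some(end) = end {
        if start > end {
            return Err(CdfError::StartAfterEnd { start, end });
        }
    }
    if start > latest {
        return Err(CdfError::StartBeyondLatest { start, latest });
    }
    Ok(())
}

/// Stand-in for snapshot creation: fails with a generic message when the
/// requested version does not exist.
fn load_snapshot(version: Version, latest: Version) -> Result<Version, String> {
    if version > latest {
        Err(format!("snapshot: version {version} not found"))
    } else {
        Ok(version)
    }
}

/// Validate the range first, then build both snapshots, so that a bad start
/// version reports the range error rather than the snapshot error.
fn build_table_changes(
    start: Version,
    end: Option<Version>,
    latest: Version,
) -> Result<(Version, Version), String> {
    validate_cdf_range(start, end, latest)
        .map_err(|e| format!("invalid CDF range: {e:?}"))?;
    let start_snapshot = load_snapshot(start, latest)?;
    let end_snapshot = load_snapshot(end.unwrap_or(latest), latest)?;
    Ok((start_snapshot, end_snapshot))
}
```

With this ordering, a start version past the end of the table surfaces the range error; if the snapshots were built first, the caller would instead see the less specific "version not found" message.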
OussamaSaoudi-db authored Jan 15, 2025
1 parent c1c1dbe commit 606db20
Showing 14 changed files with 324 additions and 13 deletions.
16 changes: 8 additions & 8 deletions kernel/src/table_changes/mod.rs
@@ -138,6 +138,14 @@ impl TableChanges {
         start_version: Version,
         end_version: Option<Version>,
     ) -> DeltaResult<Self> {
+        let log_root = table_root.join("_delta_log/")?;
+        let log_segment = LogSegment::for_table_changes(
+            engine.get_file_system_client().as_ref(),
+            log_root,
+            start_version,
+            end_version,
+        )?;
+
         // Both snapshots ensure that reading is supported at the start and end version using
         // `ensure_read_supported`. Note that we must still verify that reading is
         // supported for every protocol action in the CDF range.
@@ -173,14 +181,6 @@ impl TableChanges {
             )));
         }
 
-        let log_root = table_root.join("_delta_log/")?;
-        let log_segment = LogSegment::for_table_changes(
-            engine.get_file_system_client().as_ref(),
-            log_root,
-            start_version,
-            end_version,
-        )?;
-
         let schema = StructType::new(
             end_snapshot
                 .schema()
321 changes: 316 additions & 5 deletions kernel/tests/cdf.rs

Large diffs are not rendered by default.

Binary file not shown.
Binary file added kernel/tests/data/cdf-table-data-change.tar.zst
Binary file modified kernel/tests/data/cdf-table-non-partitioned.tar.zst
Binary file added kernel/tests/data/cdf-table-partitioned.tar.zst
Binary file added kernel/tests/data/cdf-table-simple.tar.zst
Binary file added kernel/tests/data/cdf-table-update-ops.tar.zst
Binary file modified kernel/tests/data/cdf-table-with-cdc-and-dvs.tar.zst
Binary file modified kernel/tests/data/cdf-table-with-dv.tar.zst
Binary file modified kernel/tests/data/cdf-table.tar.zst
