refactor!: update ScanMetadata to struct with new FilteredEngineData type #768

Collaborator

@sebastiantia sebastiantia commented Mar 26, 2025

What changes are proposed in this pull request?

  1. Updated ScanMetadata from a typed tuple to a struct. ScanMetadata is now a struct with fields:
    • filtered_data: A FilteredEngineData instance.
    • transforms: A vector of transformations to be applied to the data read from the files.

  2. Introduced the FilteredEngineData type:
    Couples EngineData with a selection vector indicating which rows to process.
    This type is returned from the scan_metadata API and the incoming checkpoint API.

  3. Updated the visit_scan_files parameters to accept ScanMetadata to avoid de-structuring.

  4. Made the corresponding FFI changes so that visit_scan_files accepts a ScanMetadata param.

All current tests pass.
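
The shape described above can be sketched roughly as follows. This is a minimal illustration, not the kernel's actual definitions: `EngineData` here is a hypothetical stand-in holding one file path per row, `Transform` is a placeholder type, the field name `selection_vector` and the helper `selected_paths` are assumptions made for the example, and the selection vector is assumed to have the same length as the data.

```rust
/// Hypothetical stand-in for the kernel's `EngineData`: one file path per row.
struct EngineData {
    paths: Vec<String>,
}

/// Hypothetical placeholder for a transformation; `None` means "no transform".
type Transform = Option<String>;

/// Couples data with a selection vector indicating which rows to process.
struct FilteredEngineData {
    data: EngineData,
    /// Assumed field name; `true` marks a row that should be processed.
    selection_vector: Vec<bool>,
}

/// Struct form of the scan metadata, replacing the earlier tuple.
struct ScanMetadata {
    filtered_data: FilteredEngineData,
    transforms: Vec<Transform>,
}

/// Illustrative helper (not part of the PR): returns the paths of selected rows.
fn selected_paths(meta: &ScanMetadata) -> Vec<&str> {
    let filtered = &meta.filtered_data;
    filtered
        .data
        .paths
        .iter()
        .zip(filtered.selection_vector.iter())
        .filter(|(_, keep)| **keep)
        .map(|(path, _)| path.as_str())
        .collect()
}
```

Named fields let callers write `meta.filtered_data` and `meta.transforms` instead of relying on the position of elements in a tuple.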

codecov bot commented Mar 26, 2025

Codecov Report

Attention: Patch coverage is 53.62319% with 32 lines in your changes missing coverage. Please review.

Project coverage is 84.88%. Comparing base (10bdee7) to head (a6bc8d1).
Report is 1 commits behind head on main.

Files with missing lines         Patch %   Lines
ffi/src/scan.rs                  0.00%     25 Missing ⚠️
kernel/src/scan/mod.rs           81.48%    0 Missing and 5 partials ⚠️
kernel/src/scan/log_replay.rs    85.71%    1 Missing ⚠️
kernel/src/scan/state.rs         90.00%    0 Missing and 1 partial ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main     #768   +/-   ##
=======================================
  Coverage   84.87%   84.88%           
=======================================
  Files          83       83           
  Lines       20316    20311    -5     
  Branches    20316    20311    -5     
=======================================
- Hits        17244    17241    -3     
- Misses       2221     2222    +1     
+ Partials      851      848    -3     

@sebastiantia sebastiantia added the breaking-change Change that require a major version bump label Mar 26, 2025
@sebastiantia sebastiantia changed the title from "refactor!: make ScanData struct with new FilteredEngineData type" to "refactor!: update ScanData to struct with new FilteredEngineData type" Mar 26, 2025
@sebastiantia sebastiantia marked this pull request as ready for review March 26, 2025 20:09
  .map(|res| {
-     let (data, vec, transforms) = res?;
+     let scan_data = res?;
+     let (data, sel_vec) = scan_data.filtered_data;
      let scan_files = vec![];
      state::visit_scan_files(
Collaborator

Should we have a visit_scan_data_files or similar that just takes the ScanData? Then we don't have to do this decomposition all over the place

Collaborator Author

How about updating visit_scan_files to just take ScanData?

Collaborator

discussed a little offline: i'm kinda partial to a ScanData.visit(callback, context)?

Collaborator Author

Implemented @zachschuermann 's approach
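
A rough sketch of what a `visit(callback, context)` style method could look like, assuming simplified stand-in types. The `rows` field, the callback signature, and the `collect_path` helper are hypothetical and chosen only to illustrate the idea; the merged kernel API may differ in naming, argument order, and error handling.

```rust
/// Simplified stand-in for scan metadata: (path, size) rows plus a selection vector.
/// Both fields are assumptions for this sketch, not the kernel's real layout.
struct ScanMetadata {
    rows: Vec<(String, i64)>,
    selection_vector: Vec<bool>,
}

impl ScanMetadata {
    /// Calls `callback` once per selected row, threading `context` through,
    /// so callers no longer decompose the metadata themselves.
    fn visit<T>(&self, callback: fn(&mut T, &str, i64), mut context: T) -> T {
        for ((path, size), selected) in self.rows.iter().zip(&self.selection_vector) {
            if *selected {
                callback(&mut context, path, *size);
            }
        }
        context
    }
}

/// Example callback (hypothetical): collects the paths of visited files.
fn collect_path(acc: &mut Vec<String>, path: &str, _size: i64) {
    acc.push(path.to_string());
}
```

With this shape a caller would write something like `let paths = scan_metadata.visit(collect_path, Vec::new());` rather than pulling out the data and selection vector at every call site.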

@sebastiantia sebastiantia requested a review from nicklan March 26, 2025 22:57
Collaborator

@zachschuermann zachschuermann left a comment

few comments. and we need to ensure all the breaking changes are clearly spelled out in PR description

Collaborator

@scovich scovich left a comment

Approach LGTM. We just need to clean up all the nits.

@zachschuermann zachschuermann mentioned this pull request Apr 9, 2025
Collaborator

@nicklan nicklan left a comment

lgtm, just one small nit

Collaborator

@zachschuermann zachschuermann left a comment

LGTM after a quick rename

zachschuermann added a commit that referenced this pull request Apr 9, 2025
## What changes are proposed in this pull request?
Rename `ScanData` to `ScanMetadata` and `Scan::scan_data` to
`Scan::scan_metadata` (and corresponding FFI). Additionally, renames
`TableChangesScanData` to `TableChangesScanMetadata`. Additional
docs/refactor coming in #768

### This PR affects the following public APIs

breaking changes:
1. rename `ScanData` to `ScanMetadata`
2. rename `Scan::scan_data()` to `Scan::scan_metadata()`
3. (ffi) rename `free_kernel_scan_data()` to `free_scan_metadata_iter()`
4. (ffi) rename `kernel_scan_data_next()` to `scan_metadata_next()`
5. (ffi) rename `visit_scan_data()` to `visit_scan_metadata()`
6. (ffi) rename `kernel_scan_data_init()` to `scan_metadata_iter_init()`
7. (ffi) rename `KernelScanDataIterator` to `ScanMetadataIterator`
8. (ffi) rename `SharedScanDataIterator` to `SharedScanMetadataIterator`


## How was this change tested?
existing

resolves #816
@sebastiantia sebastiantia changed the title from "refactor!: update ScanData to struct with new FilteredEngineData type" to "refactor!: update ScanMetadata to struct with new FilteredEngineData type" Apr 9, 2025
@sebastiantia sebastiantia merged commit b38bc5d into delta-io:main Apr 10, 2025
20 of 21 checks passed
@sebastiantia sebastiantia deleted the seb/scan-data-refactor-with-filtered-engine-data branch April 10, 2025 00:06
4 participants