@scovich (Collaborator) commented Oct 8, 2024

In preparation for #362, which implements parquet row group skipping, this PR makes several preparatory changes that can stand on their own:

  • Plumb the predicates through to the parquet readers, so that they can easily start using them
  • Add and use a new Expression::is_not_null helper that does what it says
  • Factor out replay_for_XXX methods, so that log replay involving push-down predicates can be tested independently.
  • Don't involve .json in log replay if .checkpoint.parquet is available

This should make both changes easier to review.
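
For illustration, the `is_not_null` helper listed above can be sketched as follows. This is a minimal sketch, not the kernel's actual code: the `Expression` enum here is a hypothetical, heavily simplified stand-in that models only columns, `IS NULL`, and `NOT`, and it assumes the helper is plain shorthand for negating `is_null()`.

```rust
// Hypothetical, simplified stand-in for an expression type.
// Only the three variants needed for this illustration are modeled.
#[derive(Debug, Clone, PartialEq)]
enum Expression {
    Column(String),
    IsNull(Box<Expression>),
    Not(Box<Expression>),
}

impl Expression {
    /// Builds a reference to the named column.
    fn column(name: impl Into<String>) -> Self {
        Expression::Column(name.into())
    }

    /// Wraps `self` in an IS NULL check.
    fn is_null(self) -> Self {
        Expression::IsNull(Box::new(self))
    }

    /// Logically negates `expr`.
    fn not(expr: Expression) -> Self {
        Expression::Not(Box::new(expr))
    }

    /// Assumed behavior: shorthand for `Expression::not(self.is_null())`.
    fn is_not_null(self) -> Self {
        Expression::not(self.is_null())
    }
}
```

Under that assumption, `Expression::column("number").is_not_null()` builds the same tree as `Expression::not(Expression::column("number").is_null())`, so call sites read left to right without the extra nesting.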

codecov bot commented Oct 8, 2024

Codecov Report

Attention: Patch coverage is 89.28571% with 12 lines in your changes missing coverage. Please review.

Project coverage is 77.06%. Comparing base (340c5e4) to head (532865f).
Report is 1 commits behind head on main.

Files with missing lines               Patch %   Lines
kernel/src/engine/default/parquet.rs   36.36%    6 Missing and 1 partial ⚠️
kernel/src/snapshot.rs                 94.28%    0 Missing and 2 partials ⚠️
kernel/src/transaction.rs              93.93%    0 Missing and 2 partials ⚠️
kernel/src/scan/mod.rs                 96.66%    0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #381      +/-   ##
==========================================
+ Coverage   76.86%   77.06%   +0.20%     
==========================================
  Files          47       47              
  Lines        9436     9524      +88     
  Branches     9436     9524      +88     
==========================================
+ Hits         7253     7340      +87     
- Misses       1789     1790       +1     
  Partials      394      394              

@zachschuermann (Member) left a comment

Nice, thanks Ryan! I really like the new replay_for_* methods. LGTM!

last_modified: file.last_modified,
size: file.size,
};
// TODO: Plumb the predicate through the FFI?
Member

created #382

@nicklan (Collaborator) left a comment

lgtm! thanks

Some(&["a_float", "number"]),
Some(Expression::and(
Expression::not(Expression::column("number").is_null()),
Expression::column("number").is_not_null(),
Collaborator

so much nicer :)

@scovich scovich merged commit 4b602ae into delta-io:main Oct 9, 2024
13 checks passed
@scovich scovich deleted the row-group-skipping-prefactor branch November 8, 2024 21:00