Prepare to support parquet row group skipping #381

Merged: 3 commits, Oct 9, 2024

Conversation

scovich
Collaborator

@scovich scovich commented Oct 8, 2024

In preparation for #362, which actually implements parquet row group skipping, this PR makes several preparatory changes that can stand on their own:

  • Plumb the predicates through to the parquet readers, so that they can easily start using them
  • Add and use a new Expression::is_not_null helper that does what it says
  • Factor out replay_for_XXX methods, so that log replay involving push-down predicates can be tested independently.
  • Don't involve .json in log replay if .checkpoint.parquet is available

This should make both changes easier to review.
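
For readers skimming the first bullet, here is a rough sketch of what "plumbing the predicate through" can look like. Everything in it is hypothetical and only for illustration: `Predicate`, `RowGroupReader`, and `PassThroughReader` are stand-ins, not the kernel's real types or signatures. The reader accepts an optional predicate but does not act on it yet, so call sites can start passing it before any skipping logic exists.

```rust
use std::sync::Arc;

// Hypothetical stand-in for a push-down predicate (not the kernel's type).
#[derive(Debug, Clone)]
struct Predicate(String);

// Hypothetical reader interface. The predicate is an optional hint that a
// reader may use to skip row groups, and may also ignore entirely.
trait RowGroupReader {
    fn read(
        &self,
        row_groups: &[Vec<Option<i64>>],
        predicate: Option<Arc<Predicate>>,
    ) -> Vec<Option<i64>>;
}

// A reader that accepts the predicate but does not use it yet.
struct PassThroughReader;

impl RowGroupReader for PassThroughReader {
    fn read(
        &self,
        row_groups: &[Vec<Option<i64>>],
        _predicate: Option<Arc<Predicate>>,
    ) -> Vec<Option<i64>> {
        // No skipping: every value from every row group is returned in order.
        row_groups.iter().flatten().copied().collect()
    }
}
```

Because the predicate is only a hint in this sketch, passing `None` and passing `Some(...)` return the same rows. A later change could add skipping inside `read` without touching any call site.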


codecov bot commented Oct 8, 2024

Codecov Report

Attention: Patch coverage is 89.28571% with 12 lines in your changes missing coverage. Please review.

Project coverage is 77.06%. Comparing base (340c5e4) to head (532865f).
Report is 1 commit behind head on main.

Files with missing lines               Patch %   Lines
kernel/src/engine/default/parquet.rs   36.36%    6 Missing and 1 partial ⚠️
kernel/src/snapshot.rs                 94.28%    0 Missing and 2 partials ⚠️
kernel/src/transaction.rs              93.93%    0 Missing and 2 partials ⚠️
kernel/src/scan/mod.rs                 96.66%    0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #381      +/-   ##
==========================================
+ Coverage   76.86%   77.06%   +0.20%     
==========================================
  Files          47       47              
  Lines        9436     9524      +88     
  Branches     9436     9524      +88     
==========================================
+ Hits         7253     7340      +87     
- Misses       1789     1790       +1     
  Partials      394      394              


Collaborator

@zachschuermann zachschuermann left a comment

nice thanks ryan, really like the new replay_for_* LGTM!

@@ -122,6 +122,7 @@ fn read_parquet_file_impl(
 last_modified: file.last_modified,
 size: file.size,
 };
+// TODO: Plumb the predicate through the FFI?
Collaborator

created #382

Collaborator

@nicklan nicklan left a comment

lgtm! thanks

@@ -744,7 +748,7 @@ fn predicate_on_number_with_not_null() -> Result<(), Box<dyn std::error::Error>>
 "./tests/data/basic_partitioned",
 Some(&["a_float", "number"]),
 Some(Expression::and(
-Expression::not(Expression::column("number").is_null()),
+Expression::column("number").is_not_null(),
Collaborator

so much nicer :)
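
For reference, a minimal sketch of how a helper like this can be written, assuming it is plain sugar for `not(is_null())` as the diff above suggests. The `Expr` enum below is a toy, hypothetical stand-in and not the kernel's actual `Expression` type; only the method names `column`, `is_null`, `not`, and `is_not_null` are taken from the diff.

```rust
// Toy, hypothetical expression tree (not the kernel's `Expression`).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    IsNull(Box<Expr>),
    Not(Box<Expr>),
}

impl Expr {
    // Reference a column by name.
    fn column(name: &str) -> Self {
        Expr::Column(name.to_string())
    }

    // Wrap `self` in an IS NULL test.
    fn is_null(self) -> Self {
        Expr::IsNull(Box::new(self))
    }

    // Logical negation of `expr`.
    fn not(expr: Expr) -> Self {
        Expr::Not(Box::new(expr))
    }

    // Assumed here to be sugar for NOT(IS NULL(self)); the real helper may
    // build its expression differently.
    fn is_not_null(self) -> Self {
        Expr::not(self.is_null())
    }
}
```

In this toy version, `Expr::column("number").is_not_null()` builds exactly the same tree as `Expr::not(Expr::column("number").is_null())`, so the rewritten test line reads left to right without changing what it expresses.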

@scovich scovich merged commit 4b602ae into delta-io:main Oct 9, 2024
13 checks passed
@scovich scovich deleted the row-group-skipping-prefactor branch November 8, 2024 21:00
3 participants