-
Notifications
You must be signed in to change notification settings - Fork 64
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Paquet and JSON readers use Arc<Expression> to avoid deep copies #364
Conversation
@@ -79,7 +80,7 @@ fn as_inverted_data_skipping_predicate(expr: &Expr) -> Option<Expr> { | |||
as_data_skipping_predicate(&expr) | |||
} | |||
VariadicOperation { op, exprs } => { | |||
let expr = Expr::variadic(op.invert(), exprs.iter().cloned().map(Expr::not)); | |||
let expr = Expr::variadic(op.invert(), exprs.iter().cloned().map(|e| !e)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that we actually bothered to impl std::ops::Not for Expression
, we may as well use it!
(Expr::not
is really just shorthand for std::ops::Not::not
, and requires the appropriate import. By contrast,!
is compiler magic and does not require any import)
None => return None, | ||
}; | ||
|
||
let predicate = predicate.as_deref()?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deref is black magic, but it's sure nice when it works!
(trying just predicate?
gives a compilation error)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because Try isn't implemented on &Option
as_deref converts &Option<T>
to Option<&T>
https://doc.rust-lang.org/std/option/enum.Option.html#method.as_deref
@@ -467,7 +464,7 @@ fn transform_to_logical_internal( | |||
.get_expression_handler() | |||
.get_evaluator( | |||
read_schema, | |||
read_expression.clone(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One copy saved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
given that this matches what arrow does with their various XRef
types, I'm inclined to think yeah this is a good idea.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #364 +/- ##
==========================================
- Coverage 78.25% 78.22% -0.04%
==========================================
Files 49 49
Lines 10256 10253 -3
Branches 10256 10253 -3
==========================================
- Hits 8026 8020 -6
- Misses 1780 1783 +3
Partials 450 450 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks great just added a couple questions of my own :)
@@ -467,7 +464,7 @@ fn transform_to_logical_internal( | |||
.get_expression_handler() | |||
.get_evaluator( | |||
read_schema, | |||
read_expression.clone(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, one question that zach had too
Today, the engine parquet/json file handler APIs take an
Expression
arg for predicate pushdown. They cannot take a reference, because the iterator they return will likely depend on (but outlive) that reference. Worse, they need to do it for every file the query reads. Unfortunately, data skipping predicates can be arbitrarily large, and thus annoying/expensive to copy so much. We already use Arc to protect schemas (some of the time, at least), and we can start using Arc to protect expressions as well.