-
Notifications
You must be signed in to change notification settings - Fork 90
refactor!: Remove redundant binary predicate operations #949
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #949 +/- ##
==========================================
+ Coverage 85.06% 85.07% +0.01%
==========================================
Files 90 90
Lines 23090 23034 -56
Branches 23090 23034 -56
==========================================
- Hits 19641 19597 -44
+ Misses 2446 2435 -11
+ Partials 1003 1002 -1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
c656483
to
8052975
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some commentary for reviewers
@@ -179,6 +179,15 @@ pub extern "C" fn visit_predicate_eq( | |||
visit_predicate_binary(state, BinaryPredicateOp::Equal, a, b) | |||
} | |||
|
|||
#[no_mangle] | |||
pub extern "C" fn visit_predicate_ne( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Somehow this was missing all along. Not sure why e.g. duckdb didn't notice?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this highlight a test gap on our side too for expr visitors?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably... but it's not so much a test gap as it is a feature gap. Perhaps duckdb just passed NOT(eq) instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DuckDB delta currently ignores the NotEqual predicates, likely because it was missing at the time. I've created an issue to track this here duckdb/duckdb-delta#203.
(Int8, Int8Type), | ||
(Int16, Int16Type), | ||
(Int32, Int32Type), | ||
(Int64, Int64Type), | ||
(UInt8, UInt8Type), | ||
(UInt16, UInt16Type), | ||
(UInt32, UInt32Type), | ||
(UInt64, UInt64Type), | ||
(Float16, Float16Type), | ||
(Float32, Float32Type), | ||
(Float64, Float64Type), | ||
(Timestamp(TimeUnit::Second, _), TimestampSecondType), | ||
(Timestamp(TimeUnit::Millisecond, _), TimestampMillisecondType), | ||
(Timestamp(TimeUnit::Microsecond, _), TimestampMicrosecondType), | ||
(Timestamp(TimeUnit::Nanosecond, _), TimestampNanosecondType), | ||
(Date32, Date32Type), | ||
(Date64, Date64Type), | ||
(Time32(TimeUnit::Second), Time32SecondType), | ||
(Time32(TimeUnit::Millisecond), Time32MillisecondType), | ||
(Time64(TimeUnit::Microsecond), Time64MicrosecondType), | ||
(Time64(TimeUnit::Nanosecond), Time64NanosecondType), | ||
(Duration(TimeUnit::Second), DurationSecondType), | ||
(Duration(TimeUnit::Millisecond), DurationMillisecondType), | ||
(Duration(TimeUnit::Microsecond), DurationMicrosecondType), | ||
(Duration(TimeUnit::Nanosecond), DurationNanosecondType), | ||
(Interval(IntervalUnit::DayTime), IntervalDayTimeType), | ||
(Interval(IntervalUnit::YearMonth), IntervalYearMonthType), | ||
(Interval(IntervalUnit::MonthDayNano), IntervalMonthDayNanoType), | ||
(Decimal128(_, _), Decimal128Type), | ||
(Decimal256(_, _), Decimal256Type) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would have been an indentation-only change, except I removed the ArrowDataType::
prefix in order to keep fmt
at bay. Even so, the diff is much more readable when reviewing with whitespace changes hidden.
// IN is different from all the others, and also quite complex, so factor it out. | ||
// | ||
// TODO: Factor out as a stand-alone function instead of a closure? | ||
let eval_in = || match (left, right) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By pulling out this closure, I was able to eliminate the two "special" match arms for In
and NotIn
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to the TODO; should we just factor this out as a function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could, but all this in-list code will anyway change drastically when #652 lands.
LessThan => lt, | ||
LessThanOrEqual => lt_eq, | ||
GreaterThan => gt, | ||
GreaterThanOrEqual => gt_eq, | ||
Equal => eq, | ||
NotEqual => neq, | ||
Distinct => distinct, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The predicate/expression split missed this simplification
@@ -173,19 +173,6 @@ impl BinaryPredicateOp { | |||
Distinct | In | NotIn => false, // tolerates NULL input | |||
} | |||
} | |||
|
|||
/// Returns `<op2>` (if any) such that `B <op2> A` is equivalent to `A <op> B`. | |||
pub(crate) fn commute(&self) -> Option<BinaryPredicateOp> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It turned out this (single-callsite) function added more complexity than it saved. It had marginal value all along, and it became downright confusing once the bloat of redundant operators went away.
(Ordering::Equal, true) => BinaryPredicateOp::NotEqual, | ||
(Ordering::Greater, false) => BinaryPredicateOp::GreaterThan, | ||
(Ordering::Greater, true) => BinaryPredicateOp::LessThanOrEqual, | ||
let pred_fn = match (ord, inverted) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Referencing the helper functions instead of raw operators is not only simpler now, but avoids having to rework the code once e.g. NotEqual
disappears.
(NotEqual, 7, vec![&batch2, &batch1]), | ||
(NotEqual, 8, vec![&batch2, &batch1]), | ||
#[allow(clippy::type_complexity)] // otherwise it's even more complex because no `_` | ||
let test_cases: Vec<(fn(Expr, Expr) -> _, _, _)> = vec![ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above, grabbing helper functions instead of raw predicates is both simpler and more robust.
Distinct | In => { | ||
debug!("Unsupported binary operator: {left:?} {op:?} {right:?}"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aside: AFAIK nothing prevents us from implementing Distinct
(both scalar eval and data skipping forms).
We should probably implement it for completeness at some point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got an issue started! #963
/// A (possibly inverted) less-than-or-equal comparison, e.g. `<col> <= <value>` | ||
fn eval_pred_le(&self, col: &ColumnName, val: &Scalar, inverted: bool) -> Option<Self::Output>; | ||
/// A (possibly inverted) greater-than comparison, e.g. `<col> > <value>` | ||
fn eval_pred_gt(&self, col: &ColumnName, val: &Scalar, inverted: bool) -> Option<Self::Output>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turns out Ordering::Greater
had the right idea all along. It's less confusing to work with than <=
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so everywhere we used to translate in terms of lt
/le
and now everything is in terms of lt
/gt
? agree feels nicer!
// Given `col <= val`: | ||
// Skip if `val` is less than _all_ values in [min, max], implies | ||
// Skip if `val < min AND val < max` implies | ||
// Skip if `val < min` implies | ||
// Keep if `NOT(val < min)` implies | ||
// Keep if `NOT(min > val)` | ||
self.partial_cmp_min_stat(col, val, Ordering::Greater, true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NOTE: No actual change, the content of the if/else blocks just swapped places.
8052975
to
c609c19
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
@@ -179,6 +179,15 @@ pub extern "C" fn visit_predicate_eq( | |||
visit_predicate_binary(state, BinaryPredicateOp::Equal, a, b) | |||
} | |||
|
|||
#[no_mangle] | |||
pub extern "C" fn visit_predicate_ne( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this highlight a test gap on our side too for expr visitors?
// IN is different from all the others, and also quite complex, so factor it out. | ||
// | ||
// TODO: Factor out as a stand-alone function instead of a closure? | ||
let eval_in = || match (left, right) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to the TODO; should we just factor this out as a function?
Distinct | In => { | ||
debug!("Unsupported binary operator: {left:?} {op:?} {right:?}"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got an issue started! #963
/// A (possibly inverted) less-than-or-equal comparison, e.g. `<col> <= <value>` | ||
fn eval_pred_le(&self, col: &ColumnName, val: &Scalar, inverted: bool) -> Option<Self::Output>; | ||
/// A (possibly inverted) greater-than comparison, e.g. `<col> > <value>` | ||
fn eval_pred_gt(&self, col: &ColumnName, val: &Scalar, inverted: bool) -> Option<Self::Output>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so everywhere we used to translate in terms of lt
/le
and now everything is in terms of lt
/gt
? agree feels nicer!
GreaterThan, | ||
GreaterThaneOrEqual, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and now one less typo!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, just one small suggestion.
What changes are proposed in this pull request?
The binary predicate operations
NotIn
,<=
,>=
and!=
are all redundant in kernel's invertible predicate evaluation framework: They are just inverted versions ofIn
,>
,<
, and=
(and inversion is just a flag passed to the predicate evaluator, not a physical operator). Remove them to simplify engine implementations.Note that the convenience methods for those operators (e.g.
Predicate::le
) are still present, as are the FFI kernel expression visitors. They're just implemented asNOT(<predicate>)
now.NOTE: The commits in this PR form a curated stack that reviewers will likely want to examine individually:
NotIn
and update tests>
instead of<=
<=
,>=
and!=
; and update testsThis PR affects the following public APIs
Removed several variants from
BinaryPredicateOp
Also removed the corresponding FFI engine expression visitors (because kernel would no longer call them even if they did exist).
How was this change tested?
Refactor. Existing unit tests validate the change.