Simplify kernel AND/OR logic #110

Merged

Conversation

ryan-johnson-databricks
Contributor

@ryan-johnson-databricks ryan-johnson-databricks commented Jan 27, 2024

The variadic AND/OR code was more complex than it needed to be. We can simplify by allowing them to be empty, with well-defined default values (AND() == TRUE and OR() == FALSE), and also by defining static xx_from methods for building up lists.

While we're at it, replace most uses of Expr::binary and Expr::variadic with direct convenience methods.
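To make the description above concrete, here is a minimal sketch of variadic AND/OR that may be empty. It is not the kernel's actual code: the `Expr` enum, the `and_from`/`or_from` names (standing in for the "xx_from" methods), the `and`/`or` convenience methods, and the two-valued `eval` (no NULL handling) are all assumptions made for illustration.

```rust
/// Hypothetical, simplified expression type (not the kernel's real definition).
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Literal(bool),
    Column(String),
    And(Vec<Expr>),
    Or(Vec<Expr>),
}

impl Expr {
    /// Build an AND from any iterator of expressions; an empty iterator is allowed.
    pub fn and_from(exprs: impl IntoIterator<Item = Expr>) -> Self {
        Expr::And(exprs.into_iter().collect())
    }

    /// Build an OR from any iterator of expressions; an empty iterator is allowed.
    pub fn or_from(exprs: impl IntoIterator<Item = Expr>) -> Self {
        Expr::Or(exprs.into_iter().collect())
    }

    /// Convenience method for the common two-operand AND.
    pub fn and(self, other: Expr) -> Self {
        Self::and_from([self, other])
    }

    /// Convenience method for the common two-operand OR.
    pub fn or(self, other: Expr) -> Self {
        Self::or_from([self, other])
    }

    /// Two-valued evaluation (this sketch ignores NULL entirely).
    /// `all` over no children is true and `any` over no children is false,
    /// which gives AND() == TRUE and OR() == FALSE without special cases.
    pub fn eval(&self, lookup: &dyn Fn(&str) -> bool) -> bool {
        match self {
            Expr::Literal(value) => *value,
            Expr::Column(name) => lookup(name),
            Expr::And(children) => children.iter().all(|child| child.eval(lookup)),
            Expr::Or(children) => children.iter().any(|child| child.eval(lookup)),
        }
    }
}
```

With this shape, a caller can fold an arbitrary list of predicates through `and_from` without first checking whether the list is empty.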

NOTE: In the spirit of "don't optimize" this PR intentionally does not optimize the various degenerate cases that can arise. These include, but are not limited to (OR cases are similar):

  • AND() --> TRUE
  • AND(x) --> x
  • AND(x..., FALSE, y...) --> FALSE
  • AND(x..., TRUE, y...) --> AND(x..., y...)
  • AND(x..., NULL, y...) --> AND(x..., y...)
  • AND(x..., AND(y...), z...) --> AND(x..., y..., z...)

There are probably others as well. Rather than (try to) implement all of those optimizations, or guess which subset of optimizations is "most valuable", we choose to not implement any of them for now.
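For readers who want to see what such rewrites would involve, the following is a rough sketch of a few of the listed AND cases. It is explicitly not part of this PR, which leaves these cases unoptimized. The `Expr` enum and the `simplify` function are hypothetical, and the sketch has no NULL literal, so the NULL case is left out.

```rust
/// Hypothetical, simplified expression type (not the kernel's real definition).
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Literal(bool),
    Column(String),
    And(Vec<Expr>),
}

/// Apply a few of the listed AND rewrites, bottom-up.
pub fn simplify(expr: Expr) -> Expr {
    // Anything that is not an AND is returned unchanged.
    let Expr::And(children) = expr else {
        return expr;
    };
    let mut flat = Vec::new();
    for child in children {
        match simplify(child) {
            // AND(x..., FALSE, y...) --> FALSE
            Expr::Literal(false) => return Expr::Literal(false),
            // AND(x..., TRUE, y...) --> AND(x..., y...)
            Expr::Literal(true) => {}
            // AND(x..., AND(y...), z...) --> AND(x..., y..., z...)
            Expr::And(inner) => flat.extend(inner),
            other => flat.push(other),
        }
    }
    match flat.len() {
        // AND() --> TRUE
        0 => Expr::Literal(true),
        // AND(x) --> x
        1 => flat.remove(0),
        _ => Expr::And(flat),
    }
}
```

Even this partial version needs a recursive pass and several special cases, which gives a sense of why the PR defers all of them.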

Collaborator

@roeap roeap left a comment

I think the change in LoC speaks for itself :).

👍

Comment on lines -290 to +307
- fn mul(self, rhs: Expression) -> Self::Output {
+ fn mul(self, rhs: Expression) -> Self {
Collaborator

this is of course equivalent and less verbose. personally i have a slight preference for the more verbose syntax to favour consistency over brevity.

on the other hand this shows more clearly that it produces the same type ...

Contributor Author

Yes, I went through the same questions and ultimately decided that type Output = Self makes clear enough that we're returning the same type, and that Output is only needed to satisfy the trait. Does that sound reasonable?

Collaborator

absolutely!
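For context on the thread above, a small sketch of the pattern under discussion: an operator impl where `type Output = Self`, so the method can return `Self` directly. The `Expression` enum and its `Times` variant here are hypothetical stand-ins, not the kernel's real type.

```rust
use std::ops::Mul;

/// Hypothetical stand-in for the kernel's expression type.
#[derive(Debug, Clone, PartialEq)]
pub enum Expression {
    Literal(i64),
    Times(Box<Expression>, Box<Expression>),
}

impl Mul<Expression> for Expression {
    // The trait requires an associated `Output` type to be declared.
    type Output = Self;

    // Because `Output = Self`, returning `Self` is the same type as
    // `Self::Output` and still satisfies the trait's method signature.
    fn mul(self, rhs: Expression) -> Self {
        Expression::Times(Box::new(self), Box::new(rhs))
    }
}
```

Both spellings compile to the same thing; the choice is purely about which reads more clearly at the method signature.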

@ryan-johnson-databricks ryan-johnson-databricks merged commit d09f814 into delta-io:main Jan 29, 2024
3 checks passed