Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Coral-Hive] Simplify IN operator #410

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

wmoustafa
Copy link
Contributor

@wmoustafa wmoustafa commented May 19, 2023

What changes are proposed in this pull request, and why are they necessary?

This PR simplifies how Coral handles the expr IN (values) operator.

Previously, the following set of transformations took place:
1- Hive's ParseTreeBuilder processes the IN operator as part of visitFunctionInternal, specifically, passing the list of (expr, value_0, value_1, ...) as operands.
2- This maps to HiveInOperator when looked up from the function registry. This operator overrides createCall in such a way that makes the operands be on the format (expr, values) (i.e., instead of N operands, two operands are used with the second being the list of values).
3- HiveConvertletTable for IN operator converts IN SqlNode to a RexNode with operands on the format (expr, value_0, value_1, ...) again.

This PR sticks to the operand format (expr, value_0, value_1, ...) across the board, simplifying the back and forth transformations, and eliminating the need for the HiveConvertletTable in the IN operator case.

How was this patch tested?

  • ./gradlew build
  • Added Spark test case.
  • Will wait for integration tests.

@wmoustafa wmoustafa changed the title [Coral-Hive] Simply IN operator [Coral-Hive] Simplify IN operator May 19, 2023
Copy link
Contributor

@aastha25 aastha25 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @wmoustafa !
Please also update the PR description with the results from i-tests when the results are available.


// Record the RHS type for use by SqlToRelConverter.
((SqlValidatorImpl) validator).setValidatedNodeType(nodeList, rightType);
} else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the else part not required anymore because for IN (query) type SQLs, we use Calcite's IN operator here?

Could you also add a UT / point me to a UT which validates that type derivation won't fail for those SQLs due to this change?

Copy link
Contributor

@KevinGe00 KevinGe00 Sep 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right is now explicitly built as a SqlNodeList using the list of operands, so there is no case where right isn't an instance of SqlNodeList.

https://github.com/linkedin/coral/pull/410/files#diff-0e69daa34dc3fe4e131718bf050d0cfd47dde9e79921e53e6ccda1d78aeefe17R83-R86

@@ -47,6 +48,8 @@
* different from set of OR predicates
* 2. In some cases, calcite turns IN clause to INNER JOIN query on a set of values. We
* again want to prevent this conversion.
* Unlike Calcite's IN operator, whose operands are (expr, values), this operator's operands
* are (expr, values[0], values[1], ...), where (exp) is the expression preceding IN operator.
*/
public class HiveInOperator extends SqlSpecialOperator {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As part of refactoring this operator, can you also rename this operator to CoralINOperator?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants