Skip to content

HIVE-29029: When stack UDTF is used with UNION ALL throwing ClassCastException in CBO mode. #5978

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

mdayakar
Copy link
Contributor

HIVE-29029: When stack UDTF is used with UNION ALL throwing ClassCastException in CBO mode.

What changes were proposed in this pull request?

Ignore HiveUnionSimpleSelectsToInlineTableRule when stack UDTF is used with UNION ALL.

Why are the changes needed?

To fix the ClassCastException when stack UDTF is used with UNION ALL.

Does this PR introduce any user-facing change?

No

How was this patch tested?

mvn -Dtest=TestMiniLlapLocalCliDriver -Dqfile=udtf_with_unionall.q -pl itests/qtest -Pitests

Comment on lines +1 to +3
SELECT STACK(3,'A',10,date '2015-01-01','z','B',20,date '2016-01-01','y','C',30,date '2017-08-09','x') AS (col0,col1,col2,col3)
UNION ALL
SELECT STACK(3,'A',10,date '2015-01-01','n','B',20,date '2016-01-01','m','C',30,date '2017-08-09','l') AS (col0,col1,col2,col3);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ClassCastException occurs during compilation so testing result correctness is slightly out of scope for the current PR. It is probably fine to add one end-to-end test but not sure if we need all of them. Does each test in this file contribute to additional test coverage?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just want to make sure the results are also as expected, if not required I can remove.


SELECT * FROM (VALUES(1, '1'), (2, 'orange'), (5, 'yellow')) AS Colors1
UNION ALL
SELECT * FROM (VALUES(10, 'green'), (11, 'blue'), (12, 'indigo'), (20, 'violet')) AS Colors2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need two VALUES branches to trigger the problem or one suffices? Do we need three rows on each VALUES?

Whatever is not needed to repro the problem should be dropped. Keeping tests minimal speeds-up reviews, improves readability, and minimizes flakiness in the longterm.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I want to test some scenario like UNION ALL is applied for VALUES, for which rule will come into picture and converts two values into one INLINE udtf and STACK, for which rule should not be applied.

Comment on lines 21 to 39
SELECT INLINE(array(struct('A',10,date '2015-01-01'),struct('B',20,date '2016-02-02')))
UNION ALL
SELECT STACK(2,'X',10,date '2017-01-01','Y',20,date '2018-01-01');

EXPLAIN SELECT INLINE(array(struct('A',10,date '2015-01-01'),struct('B',20,date '2016-02-02')))
UNION ALL
SELECT STACK(2,'X',30,date '2017-01-01','Y',40,date '2018-01-01');

SELECT INLINE(array(struct('A',10,date '2015-01-01'),struct('B',20,date '2015-02-02')))
UNION ALL
SELECT INLINE(array(struct('C',30,date '2016-01-01'),struct('D',40,date '2016-02-02')));

EXPLAIN SELECT INLINE(array(struct('A',10,date '2015-01-01'),struct('B',20,date '2015-02-02')))
UNION ALL
SELECT INLINE(array(struct('C',30,date '2016-01-01'),struct('D',40,date '2016-02-02')));

SELECT EXPLODE(map('A',10,'B',20,'C',30))
UNION ALL
SELECT EXPLODE(map('X',70,'Y',80,'Z',90));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the fact that we are testing various UDTFs with UNION ALL but not sure under what criteria we picked the combinations to test. Is it exhaustive? Is it on most commonly used? Is it something else?

The fix is localized in a single rule so if we want to add test cases for this purpose it feels more appropriate to do it via unit tests (see Test*Rule.java classes) if that's feasible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed other UDTFs which are not required to be tested as a part of this PR.

Comment on lines 221 to 223
if (!(((RexCall)((HiveTableFunctionScan) input).getCall()).getOperands().get(0) instanceof RexCall)) {
return false;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check is not very readable and it also makes various assumptions about the type and operands of TableFunctionScan#getCall that may lead to other ClassCastException or other problems. Maybe we should consider adding a helper function and putting all the necessary checks there. For instance:

private static boolean isInlineArrayCall(RexNode expr) {
    if (!(expr instanceof RexCall)) {
      return false;
    }
    RexCall call = (RexCall) expr;
    SqlOperator op = call.getOperator();
    if (!"inline".equalsIgnoreCase(op.getName())) {
      return false;
    }
    if (call.getOperands().size() != 1) {
      return false;
    }
    RexNode operand = call.getOperands().get(0);
    if (!(operand instanceof RexCall)) {
      return false;
    }
    call = (RexCall) operand;
    return "array".equalsIgnoreCase(call.getOperator().getName());
  }

It is a bit verbose but not sure if there is a better way to put it :/

Other than that I am not sure if we should make the check here or rather in Line 145-146 as you initially proposed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added required checks now with separate if conditions. We cannot use "inline" text here as for VALUES also it creates dummy table.

Copy link

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants