-
Notifications
You must be signed in to change notification settings - Fork 751
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bugfix:(transpile) use static alias for unnesting a struct #4313
bugfix:(transpile) use static alias for unnesting a struct #4313
Conversation
update hive select transformer to support INLINE by default update tests
@gauravsagar483 please fix the style/linting by running |
Fixed with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @gauravsagar483,
Thanks for the PR! Did you test if the modified & new test cases run properly?
I left a couple comments to better understand the changes.
# Replace multiple alias for single EXPLODE column with single static alias name: `t_struct` | ||
if not has_multi_expr and (len(alias_cols) != 1): | ||
alias_cols = ["t_struct"] | ||
|
||
# [Optional] Do Update the Column reference in AST for table with current alias |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there's a single exploded column why would there be multiple aliases coming from Presto/Trino in the first place?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll share some examples in sometime for this. But in short, this is the scenario of array of Structs
thanks for the review.
sqlglot/transforms.py
Outdated
for column in expression.find_all(exp.Column): | ||
if alias and column.table == alias.name: | ||
column.set("table", "t_struct") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this optional? Wouldn't changing the alias require renaming the references?
Also, if alias
is False
here the find_all
would run for no reason, it's better to move it outside of the for
loop since it's not changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated with this commit 52b41a0
"hive": UnsupportedError, | ||
"hive": r"""SELECT id, t.type, t.scores FROM example_table LATERAL VIEW INLINE(ARRAYS_ZIP(SPLIT(type, CONCAT('\\Q', ';', '\\E')), scores)) t AS type, scores""", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did these hive tests change? Did you test these queries?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. hive test has changed because hive
do support array of struct using inline.
ref doc: LanguageManualUDF-inline- arrayofstructs
I'll check for some more scenarios regarding this, thanks for pointing it out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue is not INLINE
in this case, though. It's ARRAYS_ZIP
; IIRC, it doesn't support it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, Let me run a few more testcases around hive
.
If this has the issue you mentioned above I'll re-adjust the code. Thank you
Moving the PR to draft state to validate the changes suggested above by @georgesittas. |
Any updates on this? @gauravsagar483 |
Closing this due to inactivity. |
Description:
Targeting these two threads on the sqlglot channel.
Spark/Hive LATERAL VIEW EXPLODE requires only single alias for the respective exploded column to be given unlike UNNEST in trino/presto, which can take multiple aliases for the single exploded column.
Docs
Hive LateralView | Spark LateralView | Presto UNNEST