Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bugfix:(transpile) use static alias for unnesting a struct #4313

Closed

Conversation

gauravsagar483
Copy link
Contributor

@gauravsagar483 gauravsagar483 commented Oct 30, 2024

Description:

Targeting these two threads on the sqlglot channel.

  1. https://tobiko-data.slack.com/archives/C0448SFS3PF/p1728537731625359
  2. https://tobiko-data.slack.com/archives/C0448SFS3PF/p1724147809318389

Spark/Hive LATERAL VIEW EXPLODE requires only single alias for the respective exploded column to be given unlike UNNEST in trino/presto, which can take multiple aliases for the single exploded column.

Docs


Hive LateralView | Spark LateralView | Presto UNNEST

update hive select transformer to support INLINE by default
update tests
@gauravsagar483 gauravsagar483 changed the title use static alias for unnesting a struct bugfix:(transpile) use static alias for unnesting a struct Oct 30, 2024
@georgesittas
Copy link
Collaborator

@gauravsagar483 please fix the style/linting by running make style.

@gauravsagar483
Copy link
Contributor Author

@gauravsagar483 please fix the style/linting by running make style.

Fixed with 2fbca6f Commit

Copy link
Collaborator

@VaggelisD VaggelisD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @gauravsagar483,

Thanks for the PR! Did you test if the modified & new test cases run properly?

I left a couple comments to better understand the changes.

Comment on lines +374 to +378
# Replace multiple alias for single EXPLODE column with single static alias name: `t_struct`
if not has_multi_expr and (len(alias_cols) != 1):
alias_cols = ["t_struct"]

# [Optional] Do Update the Column reference in AST for table with current alias
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there's a single exploded column why would there be multiple aliases coming from Presto/Trino in the first place?

Copy link
Contributor Author

@gauravsagar483 gauravsagar483 Oct 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll share some examples in sometime for this. But in short, this is the scenario of array of Structs
thanks for the review.

Comment on lines 379 to 381
for column in expression.find_all(exp.Column):
if alias and column.table == alias.name:
column.set("table", "t_struct")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this optional? Wouldn't changing the alias require renaming the references?

Also, if alias is False here the find_all would run for no reason, it's better to move it outside of the for loop since it's not changed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated with this commit 52b41a0

Comment on lines -94 to +93
"hive": UnsupportedError,
"hive": r"""SELECT id, t.type, t.scores FROM example_table LATERAL VIEW INLINE(ARRAYS_ZIP(SPLIT(type, CONCAT('\\Q', ';', '\\E')), scores)) t AS type, scores""",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did these hive tests change? Did you test these queries?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. hive test has changed because hive do support array of struct using inline.

ref doc: LanguageManualUDF-inline- arrayofstructs

I'll check for some more scenarios regarding this, thanks for pointing it out.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is not INLINE in this case, though. It's ARRAYS_ZIP; IIRC, it doesn't support it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, Let me run a few more testcases around hive.
If this has the issue you mentioned above I'll re-adjust the code. Thank you

@gauravsagar483 gauravsagar483 marked this pull request as draft October 30, 2024 10:54
@gauravsagar483
Copy link
Contributor Author

Moving the PR to draft state to validate the changes suggested above by @georgesittas.

@georgesittas
Copy link
Collaborator

Any updates on this? @gauravsagar483

@georgesittas
Copy link
Collaborator

Closing this due to inactivity.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants