-
Notifications
You must be signed in to change notification settings - Fork 762
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat: parse analyze compute statistics #4547
Conversation
Hi @zashroof, thanks for the PR. Can you please share any related documentation? What dialects does this cover? |
Sorry about that, updated the PR description with links. |
FYI, the team's off for holidays, we'll take a look in a week or so. Thanks for providing those links. :) |
No worries, I didn't expect a review during the holidays :) Happy Holidays! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @zashroof, thank you for the PR!
To my knowledge, there are other dialects that support the ANALYZE
statement e.g Postgres; If we implement parsing for it, we should make sure that:
- We add support for
ANALYZE
across all dialects - If (1) has a large scope, we add
exp.Command
fallbacks at any point that the Spark/Databricks syntax is not met.
Otherwise, we risk introducing regressions such as incomplete parsing/generation or errors for other dialects. Check out how self._parse_as_command(...)
is used for other statements as well.
Skimming through currently supported dialects docs to see if they define an analyze statement (There is no Well tbh this seems a little more than I anticipated. Multiple are already covered by spark impl but let me see if I can add test cases for a few more. In the mean time I should add fallback to parse as command. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few comments as well, should be good to go once we have coverage for the remaining dialects, as Vaggelis said.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few minor comments, looks much cleaner!
kind = None | ||
this: t.Optional[exp.Expression] = None | ||
partition = None | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit, could we do kind = self._curr and self._curr.text.upper()
here i.e before the branches? I think that would remove the hardcoded values in the if/elif
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re-wrote this part in #4591
@@ -410,6 +410,8 @@ class TokenType(AutoName): | |||
OPTION = auto() | |||
SINK = auto() | |||
SOURCE = auto() | |||
ANALYZE = auto() | |||
COMPUTE_STATISTICS = auto() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can now remove the COMPUTE STATISTICS
token since it was removed from STATEMENT_PARSERS
, right?
It can be consumed by the parser through self._match_text_seq("COMPUTE", "STATISTICS")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
ast = parse_one("ANALYZE TABLE tbl COMPUTE STATISTICS FOR ALL COLUMNS") | ||
self.assertIsInstance(ast, exp.Analyze) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we're not passing in a specific dialect to parse_one
, afaict we can merge each of these 2 lines as:
self.validate_identity(...).assert_is(exp.Command)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a great idea, I am removing this test here and adding assert_is(exp.Analyze)
to all dialect specefic test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I then removed it when parse_analyze only returns exp.Analyze. Changes in #4591
|
||
class ComputeStatistics(Expression): | ||
arg_types = { | ||
"this": False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll always have this
here according to _parse_compute_statistics
, we can make this True
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in #4591
self.validate_identity( | ||
"ANALYZE TABLE ctlg.db.tbl PARTITION(foo = 'foo', bar = 'bar') COMPUTE STATISTICS NOSCAN" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Styling nit, can we move this at the end of this identity chain since it breaks into multiple lines
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Thanks for the contrivution @zashroof, we'll take this to the finish line. |
Thanks for the prompt review, sorry didn't get a chance to respond to comments yesterday. I was planning on supporting the rest of the dialects. I will try to send a follow up PR if you don't mind. |
Sounds good, and no worries 👍 |
FTR - extending the parsing to cover all dialects is carried forward in #4591 |
Add specific parsing for
ANALYZE
statment instead of fallback to parse as Command.Analyze
statement reference