
Feat: parse analyze compute statistics #4547

Merged
9 commits merged into tobymao:main on Jan 9, 2025

Conversation

@georgesittas (Collaborator)

Hi @zashroof, thanks for the PR. Can you please share any related documentation? What dialects does this cover?

@zashroof (Contributor, Author)

> Hi @zashroof, thanks for the PR. Can you please share any related documentation? What dialects does this cover?

Sorry about that, updated the PR description with links.
https://spark.apache.org/docs/3.5.1/sql-ref-syntax-aux-analyze-table.html
https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-aux-analyze-table.html

@georgesittas (Collaborator)

FYI, the team's off for the holidays, so we'll take a look in a week or so. Thanks for providing those links. :)

@zashroof (Contributor, Author)

> FYI, the team's off for the holidays, so we'll take a look in a week or so. Thanks for providing those links. :)

No worries, I didn't expect a review during the holidays :) Happy Holidays!

@VaggelisD (Collaborator) left a comment

Hey @zashroof, thank you for the PR!

To my knowledge, there are other dialects that support the ANALYZE statement, e.g. Postgres. If we implement parsing for it, we should make sure that:

  1. We add support for ANALYZE across all dialects
  2. If (1) has a large scope, we add exp.Command fallbacks at any point where the Spark/Databricks syntax is not met.

Otherwise, we risk introducing regressions such as incomplete parsing/generation or errors for other dialects. Check out how self._parse_as_command(...) is used for other statements as well.
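For illustration, here is a hedged, usage-level sketch of the fallback behavior being described; the statement and dialect below are assumptions for the example, not taken from this PR:

```python
# Illustrative sketch: when a statement's syntax isn't fully understood, sqlglot
# parsers commonly fall back to exp.Command so the raw SQL still round-trips
# instead of erroring out.
import sqlglot
from sqlglot import exp

ast = sqlglot.parse_one("ANALYZE VERBOSE tbl", read="postgres")
# Depending on coverage in a given sqlglot version, this may be an exp.Analyze
# node or an exp.Command fallback; either way the SQL should round-trip:
print(type(ast).__name__, ast.sql(dialect="postgres"))
```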

(Inline review threads on sqlglot/parser.py, sqlglot/tokens.py, sqlglot/generator.py, and tests/fixtures/identity.sql, all marked outdated and resolved.)
@zashroof (Contributor, Author) commented on Jan 7, 2025

> Hey @zashroof, thank you for the PR!
>
> To my knowledge, there are other dialects that support the ANALYZE statement, e.g. Postgres. If we implement parsing for it, we should make sure that:
>
>   1. We add support for ANALYZE across all dialects
>   2. If (1) has a large scope, we add exp.Command fallbacks at any point where the Spark/Databricks syntax is not met.
>
> Otherwise, we risk introducing regressions such as incomplete parsing/generation or errors for other dialects. Check out how self._parse_as_command(...) is used for other statements as well.

Skimming through the currently supported dialects' docs to see if they define an ANALYZE statement (there is no ANALYZE statement in the SQL standard):

| Dialect    | ANALYZE statement reference |
|------------|-----------------------------|
| databricks | https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-aux-analyze-table.html |
| doris      | https://doris.apache.org/docs/2.0/sql-manual/sql-reference/Data-Manipulation-Statements/Manipulation/ANALYZE |
| drill      | https://drill.apache.org/docs/analyze-table-compute-statistics, https://drill.apache.org/docs/analyze-table-refresh-metadata/ |
| duckdb     | https://duckdb.org/docs/sql/statements/analyze |
| mysql      | https://dev.mysql.com/doc/refman/8.4/en/analyze-table.html |
| oracle     | https://docs.oracle.com/en/database/oracle/oracle-database/21/sqlrf/ANALYZE.html |
| postgres   | https://www.postgresql.org/docs/current/sql-analyze.html |
| presto     | https://prestodb.io/docs/current/sql/analyze.html |
| redshift   | https://docs.aws.amazon.com/redshift/latest/dg/r_ANALYZE.html |
| spark      | https://spark.apache.org/docs/latest/sql-ref-syntax-aux-analyze-table.html |
| sqlite     | https://www.sqlite.org/lang_analyze.html |
| starrocks  | https://docs.starrocks.io/docs/sql-reference/sql-statements/cbo_stats/ANALYZE_TABLE/ |
| trino      | https://trino.io/docs/current/sql/analyze.html |

Well, tbh this is a bit more than I anticipated. Several of these are already covered by the Spark implementation, but let me see if I can add test cases for a few more. In the meantime, I should add a fallback to parse the statement as a command.
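For context, a per-dialect identity test along the lines being discussed could look roughly like the following; the file, test name, and statement are assumptions for illustration, not code from this PR:

```python
# Hypothetical sketch of a dialect-specific test (e.g. in tests/dialects/test_postgres.py),
# following the validate_identity pattern used elsewhere in sqlglot's test suite.
def test_analyze(self):
    self.validate_identity("ANALYZE tbl")
```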

@georgesittas (Collaborator) left a comment

Left a few comments as well; this should be good to go once we have coverage for the remaining dialects, as Vaggelis said.

(Inline review threads on sqlglot/parser.py, all marked outdated and resolved.)
@VaggelisD (Collaborator) left a comment

A few minor comments, looks much cleaner!

kind = None
this: t.Optional[exp.Expression] = None
partition = None

Collaborator

Nit: could we do kind = self._curr and self._curr.text.upper() here, i.e. before the branches? I think that would remove the hardcoded values in the if/elif.
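As a rough illustration of the suggestion (the branch shown below is assumed for the example and may not match the PR code):

```python
# Illustrative sketch only: read the upcoming token's text once, then branch,
# instead of hardcoding the keyword in each if/elif.
kind = self._curr and self._curr.text.upper()
this: t.Optional[exp.Expression] = None
partition = None

if self._match(TokenType.TABLE):
    this = self._parse_table_parts()
    partition = self._parse_partition()
```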

@zashroof (Contributor, Author)

Rewrote this part in #4591.

@@ -410,6 +410,8 @@ class TokenType(AutoName):
OPTION = auto()
SINK = auto()
SOURCE = auto()
ANALYZE = auto()
COMPUTE_STATISTICS = auto()
Collaborator

We can now remove the COMPUTE STATISTICS token since it was removed from STATEMENT_PARSERS, right?

It can be consumed by the parser through self._match_text_seq("COMPUTE", "STATISTICS")
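A minimal sketch of that approach, assuming a parser helper along the lines of the _parse_compute_statistics mentioned later in this thread (the body here is illustrative, not the merged code):

```python
# Hypothetical sketch (not the merged code): inside sqlglot's Parser, the two
# keywords can be consumed as plain text, so no dedicated COMPUTE_STATISTICS
# token is required.
def _parse_compute_statistics(self) -> t.Optional[exp.Expression]:
    if self._match_text_seq("COMPUTE", "STATISTICS"):
        # The real helper also populates args such as `this`; omitted here.
        return self.expression(exp.ComputeStatistics)
    return None
```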

@zashroof (Contributor, Author)

Done

Comment on lines +906 to +907
ast = parse_one("ANALYZE TABLE tbl COMPUTE STATISTICS FOR ALL COLUMNS")
self.assertIsInstance(ast, exp.Analyze)
Collaborator

Since we're not passing a specific dialect to parse_one, afaict we can merge each of these 2 lines into:

self.validate_identity(...).assert_is(exp.Command)

@zashroof (Contributor, Author)

That's a great idea. I am removing this test here and adding assert_is(exp.Analyze) to all the dialect-specific tests.
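For example, the two lines quoted at +906 to +907 above could then collapse into a single chained assertion (a sketch, assuming the same statement):

```python
self.validate_identity(
    "ANALYZE TABLE tbl COMPUTE STATISTICS FOR ALL COLUMNS"
).assert_is(exp.Analyze)
```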

@zashroof (Contributor, Author)

I then removed it once parse_analyze only returns exp.Analyze. Changes are in #4591.


class ComputeStatistics(Expression):
arg_types = {
"this": False,
Collaborator

We'll always have this here according to _parse_compute_statistics, so we can make it True.
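In other words, the suggested change is roughly the following (a sketch; any other arg_types keys are intentionally omitted since they aren't shown above):

```python
class ComputeStatistics(Expression):
    # `this` is always populated by _parse_compute_statistics, so mark it required.
    arg_types = {"this": True}
```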

@zashroof (Contributor, Author)

Done in #4591

Comment on lines +927 to +929
self.validate_identity(
"ANALYZE TABLE ctlg.db.tbl PARTITION(foo = 'foo', bar = 'bar') COMPUTE STATISTICS NOSCAN"
)
Collaborator

Styling nit: can we move this to the end of this identity chain, since it breaks into multiple lines?

@zashroof (Contributor, Author)

Done.

@georgesittas merged commit c75016a into tobymao:main on Jan 9, 2025
8 checks passed
@georgesittas (Collaborator)

Thanks for the contribution @zashroof, we'll take this to the finish line.

@zashroof (Contributor, Author) commented on Jan 9, 2025

> Thanks for the contribution @zashroof, we'll take this to the finish line.

Thanks for the prompt review; sorry I didn't get a chance to respond to the comments yesterday. I was planning on supporting the rest of the dialects, so I will try to send a follow-up PR if you don't mind.

@georgesittas (Collaborator)

Sounds good, and no worries 👍

@zashroof (Contributor, Author)

FTR: extending the parsing to cover all dialects is carried forward in #4591.
