
Commit 43feb4e

[SNOW-1739096] Disallow detecting non-select statements as repeated subqueries (#2467)
1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

   SNOW-1739096

2. Fill out the following pre-review checklist:

   - [x] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
   - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes.
   - [ ] I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: [Thread-safe Developer Guidelines](https://docs.google.com/document/d/162d_i4zZ2AfcGRXojj0jByt8EUq-DrSHPPnTa4QvwbA/edit#bookmark=id.e82u4nekq80k)

3. Please describe how your code solves the related issue.

   Repeated subquery elimination is only designed to eliminate subqueries that are SELECT statements. Because the code did not distinguish SELECT statements from non-SELECT statements, a non-SELECT query could be incorrectly detected as a candidate. For example, given

   ```python
   df1 = session.sql("show tables")
   df2 = session.sql("show tables")
   df_result = df1.union(df2)
   ```

   the `show tables` query itself can be incorrectly detected as a common subquery, producing a wrong query like

   ```
   with_cte_xxx (show tables) xxxx
   ```

   This PR excludes non-SELECT statements from candidate node encoding, so those nodes are never counted as candidates for repeated subquery elimination.
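The fix relies on `is_sql_select_statement` from `snowflake.snowpark._internal.utils`, whose implementation is not part of this commit. As a rough illustration of what such a check has to cope with (leading whitespace, wrapping parentheses, comments, and `WITH` as well as `SELECT`), here is a minimal sketch. `looks_like_select` is a hypothetical helper written for this example only; it is not Snowpark's implementation and may differ from it on edge cases.

```python
import re

# Hypothetical approximation of a "is this a SELECT statement?" check.
# NOT Snowpark's is_sql_select_statement; for illustration only.
_LEADING_NOISE = re.compile(
    r"""
    ^(?:
        \s+           # whitespace, including newlines
      | \(            # opening parenthesis, e.g. "(select 1)"
      | --[^\n]*      # single-line comment
      | /\*.*?\*/     # block comment
    )*
    """,
    re.VERBOSE | re.DOTALL,
)
# First real keyword must be SELECT or WITH, as a whole word.
_SELECT_PREFIX = re.compile(r"(select|with)\b", re.IGNORECASE)


def looks_like_select(sql: str) -> bool:
    """Return True if the first keyword after comments/parentheses is SELECT or WITH."""
    # _LEADING_NOISE can match the empty string, so match() never returns None here.
    rest = sql[_LEADING_NOISE.match(sql).end():]
    return _SELECT_PREFIX.match(rest) is not None


print(looks_like_select("(select * from t)"))                # True
print(looks_like_select("show tables in schema limit 10"))  # False
```

A query like `show tables` fails this check, so under the approach described above it would never be encoded as a CTE candidate.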
1 parent 7dc2670 commit 43feb4e


2 files changed (+26 additions, -0 deletions)


src/snowflake/snowpark/_internal/analyzer/cte_utils.py

Lines changed: 7 additions & 0 deletions
```diff
@@ -14,6 +14,7 @@
 )
 from snowflake.snowpark._internal.utils import (
     TempObjectType,
+    is_sql_select_statement,
     random_name_for_temp_object,
 )
 
@@ -186,6 +187,12 @@ def encoded_query_id(node) -> Optional[str]:
     query = node.sql_query
     query_params = node.query_params
 
+    if not is_sql_select_statement(query):
+        # common subquery elimination only supports eliminating
+        # subquery that is select statement. Skip encoding the query
+        # to avoid being detected as a common subquery.
+        return None
+
     string = f"{query}#{query_params}" if query_params else query
     try:
         return hashlib.sha256(string.encode()).hexdigest()[:10]
```
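To make the effect of the patched `encoded_query_id` concrete, the sketch below mirrors its hashing (SHA-256 over `query`, or over `f"{query}#{query_params}"` when parameters exist, truncated to 10 hex characters) and adds a hypothetical `repeated_ids` helper that counts how often each id occurs. This is an assumption-laden simplification: real Snowpark walks a tree of plan nodes rather than a flat list of strings, and `_is_select` is a deliberately crude stand-in for `is_sql_select_statement`.

```python
import hashlib
from collections import Counter
from typing import Iterable, Optional, Set


def _is_select(sql: str) -> bool:
    # Crude hypothetical stand-in for is_sql_select_statement.
    return sql.lstrip(" \t\n(").lower().startswith(("select", "with"))


def encode_query(sql: str, params: Optional[list] = None) -> Optional[str]:
    """Mirror of the patched encoding: None for non-SELECT, else a short sha256 id."""
    if not _is_select(sql):
        return None  # non-SELECT queries never become CTE candidates
    string = f"{sql}#{params}" if params else sql
    return hashlib.sha256(string.encode()).hexdigest()[:10]


def repeated_ids(queries: Iterable[str]) -> Set[str]:
    """Hypothetical helper: ids occurring more than once are CTE candidates."""
    counts = Counter(i for i in map(encode_query, queries) if i is not None)
    return {i for i, n in counts.items() if n > 1}


print(repeated_ids(["show tables", "show tables"]))   # set()
print(len(repeated_ids(["select 1", "select 1"])))    # 1
```

Without the `_is_select` guard, both `show tables` strings would hash to the same id and be reported as a candidate, which is the failure mode the PR description illustrates.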

tests/integ/test_cte.py

Lines changed: 19 additions & 0 deletions
```diff
@@ -747,6 +747,25 @@ def test_sql(session, query):
     assert count_number_of_ctes(df_result.queries["queries"][-1]) == 1
 
 
+def test_sql_non_select(session):
+    df1 = session.sql("show tables in schema limit 10")
+    df2 = session.sql("show tables in schema limit 10")
+
+    df_result = df1.union(df2).select('"name"').filter(lit(True))
+
+    check_result(
+        session,
+        df_result,
+        # since the two show tables are called in two different dataframe, we
+        # won't be able to detect those as common subquery.
+        expect_cte_optimized=False,
+        query_count=3,
+        describe_count=0,
+        union_count=1,
+        join_count=0,
+    )
+
+
 @pytest.mark.parametrize(
     "action",
     [
```
