[SNOW-1739096] Disallow detecting non-select statements as repeated subqueries (#2467)

1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

SNOW-1739096

2. Fill out the following pre-review checklist:

- [x] I am adding a new automated test(s) to verify correctness of my new code
- [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
- [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes.
- [ ] I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: [Thread-safe Developer Guidelines](https://docs.google.com/document/d/162d_i4zZ2AfcGRXojj0jByt8EUq-DrSHPPnTa4QvwbA/edit#bookmark=id.e82u4nekq80k)

3. Please describe how your code solves the related issue.

Repeated subquery elimination is only designed to eliminate subqueries that are select statements. Because we do not distinguish between select and non-select statements, a non-select statement can be incorrectly detected as a candidate. For example, given:
```python
df1 = session.sql("show tables")
df2 = session.sql("show tables")

df_result = df1.union(df2)
```
the `show tables` query itself can be incorrectly detected as a common subquery, producing an invalid query like:
```
with_cte_xxx (show tables) xxxx
```
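To see why `show tables` gets flagged, here is a minimal sketch under a simplified model: each plan node is reduced to its SQL text and parameters, ids are computed with the same scheme `encoded_query_id` uses (sha256 of the query, plus `#params` when present, truncated to 10 hex characters), and any id seen more than once is treated as a repeated subquery. `Node`, `encode_query` and `find_repeated_queries` are hypothetical helpers for illustration, not Snowpark internals, and real detection walks the plan tree rather than a flat list.

```python
import hashlib
from collections import Counter
from typing import List, Optional, Tuple

# Hypothetical stand-in for a plan node: (sql_text, query_params).
Node = Tuple[str, Optional[list]]


def encode_query(sql: str, params: Optional[list] = None) -> str:
    # Same id scheme as encoded_query_id before this change:
    # sha256 of "query#params" (or just the query), first 10 hex chars.
    string = f"{sql}#{params}" if params else sql
    return hashlib.sha256(string.encode()).hexdigest()[:10]


def find_repeated_queries(nodes: List[Node]) -> List[str]:
    # Every node gets an id; any id seen more than once becomes a CTE candidate.
    ids = [encode_query(sql, params) for sql, params in nodes]
    counts = Counter(ids)
    return sorted({sql for (sql, _), i in zip(nodes, ids) if counts[i] > 1})


# Both sides of df1.union(df2) wrap the same non-select query.
nodes = [("show tables", None), ("show tables", None)]
repeated = find_repeated_queries(nodes)
print(repeated)  # ['show tables']

# Hoisting the candidate into a CTE yields invalid SQL, since a CTE body
# must be a query rather than a SHOW command.
print(f"WITH cte_xxx AS ({repeated[0]}) ...")  # WITH cte_xxx AS (show tables) ...
```

Nothing in this model looks at what kind of statement the text is, so any identical SQL, select or not, is counted as a repeat.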

This PR excludes non-select statements from candidate node encoding, so those nodes are never counted as candidates for repeated subquery elimination.
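A minimal sketch of the fix under the same simplified model: the encoder returns `None` for non-select statements, and the repeat counter ignores nodes without an id. The `is_select` regex is an assumed, simplified stand-in for Snowpark's internal `is_sql_select_statement` (it only checks for a leading `SELECT` or `WITH`) and is not that function's actual implementation; `encode_query_if_select` and `find_repeated_queries` are likewise hypothetical.

```python
import hashlib
import re
from collections import Counter
from typing import List, Optional, Tuple

# Simplified assumption: a statement is a "select" if it starts with SELECT or
# WITH, optionally preceded by whitespace or opening parentheses. This is not
# the exact logic of Snowpark's is_sql_select_statement.
_SELECT_PREFIX = re.compile(r"^[\s(]*(select|with)\b", re.IGNORECASE)


def is_select(sql: str) -> bool:
    return _SELECT_PREFIX.match(sql) is not None


def encode_query_if_select(sql: str, params: Optional[list] = None) -> Optional[str]:
    if not is_select(sql):
        # Non-select statements get no id, so they can never be counted as repeats.
        return None
    string = f"{sql}#{params}" if params else sql
    return hashlib.sha256(string.encode()).hexdigest()[:10]


def find_repeated_queries(nodes: List[Tuple[str, Optional[list]]]) -> List[str]:
    ids = [encode_query_if_select(sql, params) for sql, params in nodes]
    # Only nodes that received an id take part in repeat detection.
    counts = Counter(i for i in ids if i is not None)
    return sorted(
        {sql for (sql, _), i in zip(nodes, ids) if i is not None and counts[i] > 1}
    )


print(find_repeated_queries([("show tables", None), ("show tables", None)]))  # []
print(find_repeated_queries([("select * from t", None), ("select * from t", None)]))
# ['select * from t']
```

Putting the check inside the encoder means no separate filtering pass is needed: a node without an id is simply invisible to the repeat counter.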
sfc-gh-yzou authored Oct 18, 2024
1 parent 7dc2670 commit 43feb4e
Showing 2 changed files with 26 additions and 0 deletions.
src/snowflake/snowpark/_internal/analyzer/cte_utils.py (7 additions, 0 deletions)
```diff
@@ -14,6 +14,7 @@
 )
 from snowflake.snowpark._internal.utils import (
     TempObjectType,
+    is_sql_select_statement,
     random_name_for_temp_object,
 )
```

```diff
@@ -186,6 +187,12 @@ def encoded_query_id(node) -> Optional[str]:
     query = node.sql_query
     query_params = node.query_params

+    if not is_sql_select_statement(query):
+        # common subquery elimination only supports eliminating
+        # subquery that is select statement. Skip encoding the query
+        # to avoid being detected as a common subquery.
+        return None
+
     string = f"{query}#{query_params}" if query_params else query
     try:
         return hashlib.sha256(string.encode()).hexdigest()[:10]
```
tests/integ/test_cte.py (19 additions, 0 deletions)
```diff
@@ -747,6 +747,25 @@ def test_sql(session, query):
     assert count_number_of_ctes(df_result.queries["queries"][-1]) == 1


+def test_sql_non_select(session):
+    df1 = session.sql("show tables in schema limit 10")
+    df2 = session.sql("show tables in schema limit 10")
+
+    df_result = df1.union(df2).select('"name"').filter(lit(True))
+
+    check_result(
+        session,
+        df_result,
+        # since the two show tables are called in two different dataframe, we
+        # won't be able to detect those as common subquery.
+        expect_cte_optimized=False,
+        query_count=3,
+        describe_count=0,
+        union_count=1,
+        join_count=0,
+    )
+
+
 @pytest.mark.parametrize(
     "action",
     [
```
