SNOW-1360263: row_number window function raises a ValueError (index is not in list) #1490

Closed
samuelsongsr opened this issue May 2, 2024 · 3 comments
Labels: bug, local testing, needs triage, status-triage_needed

samuelsongsr commented May 2, 2024

  1. What version of Python are you using?

    Python 3.10.4 (main, May 26 2022, 13:33:07) [GCC 4.8.5 20150623 (Red Hat 4.8.5-44)]

  2. What are the Snowpark Python and pandas versions in the environment?

    pandas==2.2.1
    snowflake-snowpark-python==1.15.0

  3. What did you do?
    I'm trying to test a row_number() window function, and it raises a ValueError from a line in _plan.py. Reproduction:

from datetime import date

from snowflake.snowpark import Row, Window, Session
from snowflake.snowpark.functions import row_number, col


# create some test data
data = [
    Row(id=1, row_date=date(2024, 1, 1), value=1),
    Row(id=2, row_date=date(2024, 1, 1), value=1),
    Row(id=1, row_date=date(2024, 1, 2), value=1),
    Row(id=1, row_date=date(2024, 1, 2), value=100),
    Row(id=2, row_date=date(2024, 1, 2), value=1)
]

# create a local testing session and dataframe
mock_session = Session.builder.config("local_testing", True).create()
test_data = mock_session.create_dataframe(data)

# partition over id and row_date and get the records with the largest values
window = Window.partition_by("id", "row_date").order_by(col("value").desc())
df = test_data.with_column("row_num", row_number().over(window)).where(col("row_num") == 1)

results = df.collect()

  4. What did you expect to see?

    The results should include the following 4 records:

Row(id=1, row_date=date(2024, 1, 1), value=1),
Row(id=2, row_date=date(2024, 1, 1), value=1),
Row(id=1, row_date=date(2024, 1, 2), value=100),
Row(id=2, row_date=date(2024, 1, 2), value=1)

Instead of the above results, a ValueError is raised stating "1 is not in list".
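
For reference, a minimal verification sketch (not part of the original report): it assumes the reproduction script above has run and results holds the collected rows, indexes rows positionally to avoid assumptions about identifier casing, and compares sets because collect() does not guarantee row order.

# expected (id, row_date, value) triples; the row_num column appended by
# with_column comes last and is ignored here
expected = {
    (1, date(2024, 1, 1), 1),
    (2, date(2024, 1, 1), 1),
    (1, date(2024, 1, 2), 100),
    (2, date(2024, 1, 2), 1),
}
actual = {(row[0], row[1], row[2]) for row in results}
assert actual == expected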

samuelsongsr added the bug, local testing, and needs triage labels on May 2, 2024
github-actions bot changed the title from "row_number window function raises a ValueError (index is not in list)" to "SNOW-1360263: row_number window function raises a ValueError (index is not in list)" on May 2, 2024
sfc-gh-ashahi self-assigned this on May 2, 2024
sfc-gh-ashahi added the status-triage_needed label on May 2, 2024
tvdboom commented Sep 13, 2024

Any update on this one? I am still encountering this issue on version 1.22.1

frederiksteiner (Contributor) commented

I encountered the same problem and played around with it a bit, using a similar, smaller example.

This one does not work:

from datetime import date

from snowflake.snowpark import Row, Window, Session
from snowflake.snowpark.functions import row_number, col


# create some test data
data = [
    Row(id=1, row_date=date(2024, 1, 1), value=1),
    Row(id=2, row_date=date(2024, 1, 1), value=1),
    Row(id=3, row_date=date(2024, 1, 2), value=1),
    Row(id=1, row_date=date(2024, 1, 2), value=100),
]

# create a local testing session and dataframe
mock_session = Session.builder.config("local_testing", True).create()
test_data = mock_session.create_dataframe(data)

# partition over row_date and id and get the records with the largest values
window = Window.partition_by(["row_date", "id"]).order_by(col("value").desc())
df = test_data.with_column("row_num", row_number().over(window)).where(col("row_num") == 1)

results = df.collect()

But when "row_date" and "id" are swapped in the partition_by, it works and raises no such error:

from datetime import date

from snowflake.snowpark import Row, Window, Session
from snowflake.snowpark.functions import row_number, col


# create some test data
data = [
    Row(id=1, row_date=date(2024, 1, 1), value=1),
    Row(id=2, row_date=date(2024, 1, 1), value=1),
    Row(id=3, row_date=date(2024, 1, 2), value=1),
    Row(id=1, row_date=date(2024, 1, 2), value=100),
]

# create a local testing session and dataframe
mock_session = Session.builder.config("local_testing", True).create()
test_data = mock_session.create_dataframe(data)

# partition over id and row_date and get the records with the largest values
window = Window.partition_by(["id", "row_date", ]).order_by(col("value").desc())
df = test_data.with_column("row_num", row_number().over(window)).where(col("row_num") == 1)

results = df.collect()

sfc-gh-jrose (Contributor) commented

This should be fixed in v1.24.0
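
A quick way to confirm the installed version before re-running the reproduction above (a minimal sketch, assuming a plain X.Y.Z version string):

import importlib.metadata

# the fix is reported for v1.24.0, so older versions will likely still raise
# the ValueError under local testing
version = importlib.metadata.version("snowflake-snowpark-python")
print("snowflake-snowpark-python", version)
assert tuple(int(p) for p in version.split(".")[:3]) >= (1, 24, 0), \
    "upgrade with: pip install -U 'snowflake-snowpark-python>=1.24.0'"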
