Fix Pandas FutureWarning about integer indexing #1968

Merged: 4 commits, Aug 1, 2024
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -29,6 +29,9 @@
- Fixed a bug that Window Functions LEAD and LAG do not handle option `ignore_nulls` properly.
- Fixed a bug where values were not populated into the result DataFrame during the insertion of table merge operation.

+#### Improvements
+- Fix pandas FutureWarning about integer indexing.
+
### Snowpark pandas API Updates
#### New Features
- Added support for `DataFrame.backfill`, `DataFrame.bfill`, `Series.backfill`, and `Series.bfill`.
8 changes: 4 additions & 4 deletions src/snowflake/snowpark/mock/_pandas_util.py
@@ -78,9 +78,9 @@ def _extract_schema_and_data_from_pandas_df(
# pandas PANDAS_INTEGER_TYPES (e.g. INT8Dtye) will also store data in the format of float64
# here we use the col dtype info to convert data
plain_data[row_idx][col_idx] = (
-    int(data.iloc[row_idx][col_idx])
-    if isinstance(data.dtypes[col_idx], PANDAS_INTEGER_TYPES)
-    else float(str(data.iloc[row_idx][col_idx]))
+    int(data.iloc[row_idx, col_idx])
+    if isinstance(data.dtypes.iloc[col_idx], PANDAS_INTEGER_TYPES)
+    else float(str(data.iloc[row_idx, col_idx]))
)
elif isinstance(plain_data[row_idx][col_idx], numpy.float32):
# convert str first and then to float to avoid precision drift as its stored in float32 format
@@ -93,7 +93,7 @@ def _extract_schema_and_data_from_pandas_df(
):
plain_data[row_idx][col_idx] = int(plain_data[row_idx][col_idx])
elif isinstance(plain_data[row_idx][col_idx], pd.Timestamp):
-if isinstance(data.dtypes[col_idx], pd.DatetimeTZDtype):
+if isinstance(data.dtypes.iloc[col_idx], pd.DatetimeTZDtype):
# this is to align with the current snowflake behavior that it
# apply the tz diff to time and then removes the tz information during ingestion
plain_data[row_idx][col_idx] = (
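
The change to `_pandas_util.py` swaps chained and label-style integer lookups for explicit positional access. A minimal sketch of the same pattern on a small, made-up DataFrame follows; the column names and values are illustrative assumptions, not taken from the PR, and only `DataFrame.iloc`, `Series.iloc`, and `DataFrame.dtypes` are used:

```python
import warnings

import pandas as pd

# Hypothetical sample frame: one float column and one nullable-integer column.
df = pd.DataFrame({"A": [1.0], "B": pd.array([2], dtype="Int8")})

row_idx, col_idx = 0, 1

# Escalate FutureWarning to an error so a deprecated access pattern would fail loudly.
with warnings.catch_warnings():
    warnings.simplefilter("error", FutureWarning)

    # df.dtypes is a Series labelled by column name, so look the dtype up by position.
    col_dtype = df.dtypes.iloc[col_idx]

    # A single (row, column) positional lookup avoids building an intermediate row Series.
    value = df.iloc[row_idx, col_idx]

is_nullable_int8 = isinstance(col_dtype, pd.Int8Dtype)
```

As far as I can tell, the older `data.dtypes[col_idx]` and `data.iloc[row_idx][col_idx]` forms both end up passing an integer key to a Series whose index holds column names, which is the access pattern recent pandas 2.x releases warn about.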
22 changes: 22 additions & 0 deletions tests/mock/test_pandas_util.py
@@ -0,0 +1,22 @@
#
# Copyright (c) 2012-2024 Snowflake Computing Inc. All rights reserved.
#
import pandas as pd
import pytest

from snowflake.snowpark import DataFrame


@pytest.mark.filterwarnings("error::FutureWarning")
def test_extract_schema_from_df_without_future_warning(session):
    """
    Make sure that while converting a Pandas dataframe to a Snowflake dataframe no
    FutureWarnings are thrown, which hint at upcoming incompatibilities.
    """
    pandas_df = pd.DataFrame({"A": [1.0]}, dtype=float)
    df = session.create_dataframe(pandas_df)
    assert isinstance(df, DataFrame)

    pandas_df = pd.DataFrame({"Timestamp": [pd.to_datetime(1490195805, unit="s")]})
    df = session.create_dataframe(pandas_df)
    assert isinstance(df, DataFrame)
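
The `filterwarnings("error::FutureWarning")` mark in the test uses the same filter syntax as Python's own `warnings` module, so its effect can be sketched with the standard library alone. The helper names below (`legacy_call`, `run_strict`) are hypothetical stand-ins, not part of Snowpark or pytest:

```python
import warnings


def legacy_call():
    # Hypothetical stand-in for library code that still uses a deprecated pattern.
    warnings.warn("integer keys will be treated as labels", FutureWarning)
    return 1


def run_strict(func):
    # Inside this block any FutureWarning is raised as an exception instead of printed.
    with warnings.catch_warnings():
        warnings.simplefilter("error", FutureWarning)
        return func()


try:
    run_strict(legacy_call)
    raised = False
except FutureWarning:
    raised = True
```

With the filter in place, a regression that reintroduces the deprecated lookup fails the test instead of only emitting a warning in the log.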