SNOW-1753336: [Local testing] Merge and update fail when target table's index is not RangeIndex #2481

Closed
tvdboom opened this issue Oct 19, 2024 · 0 comments
Labels
bug (Something isn't working), needs triage (Initial RCA is required)

tvdboom commented Oct 19, 2024

Please answer these questions before submitting your issue. Thanks!

  1. What version of Python are you using?

    Python 3.11.6 (tags/v3.11.6:8b6ee5b, Oct 2 2023, 14:57:12) [MSC v.1935 64 bit (AMD64)]

  2. What operating system and processor architecture are you using?

    Windows-10-10.0.22631-SP0

  3. What are the component versions in the environment (pip freeze)?

    pandas==2.2.3
    snowflake-snowpark-python==1.23.0

  4. What did you do?

    Both table.merge and table.update fail with the same error because both execute the line target_index = target.loc[rows_to_update[ROW_ID]].index. That line uses the label-based pandas .loc, while the values in rows_to_update[ROW_ID] correspond to the index of a dataframe with the default RangeIndex. When the target table has any other index, the lookup fails. Possible fixes: replace every .loc with .iloc, reset the index of the target table before merging/updating, or reset the index of every table when it is read or when it is written (since I think the index is always assumed to be a RangeIndex).
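The difference between label-based and positional lookup can be reproduced with pandas alone. The following is a minimal sketch, not the Snowpark implementation: `target`, `row_ids`, and the other names are hypothetical stand-ins, and it assumes the row ids are 0-based positions.

```python
import pandas as pd

# Stand-in for a filtered table: filtering keeps the original labels 2, 3, 4.
full = pd.DataFrame({"A": [1, 2, 3, 4, 5]})
target = full[full["A"] > 2]

# Hypothetical row ids of the rows to update, expressed as 0-based positions.
row_ids = [0, 1]

# Label-based lookup: labels 0 and 1 are not in the index, so .loc raises KeyError.
try:
    target.loc[row_ids]
    loc_failed = False
except KeyError:
    loc_failed = True

# Fix 1: positional lookup ignores the labels entirely.
by_position = target.iloc[row_ids]

# Fix 2: reset the index so that labels and positions coincide again.
by_label = target.reset_index(drop=True).loc[row_ids]

print(loc_failed)
print(by_position["A"].tolist(), by_label["A"].tolist())
```

Both fixes select the same two rows; which one fits best depends on whether other code paths also rely on the index being a RangeIndex.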

Example

import pandas as pd
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col, lit

mock_session = Session.builder.config("local_testing", True).create()
test_data = mock_session.create_dataframe(pd.DataFrame({"A": [1, 2, 3, 4, 5]}))
test_data2 = mock_session.create_dataframe(pd.DataFrame({"A": [3, 4]}))

# Filtering before caching leaves the target table with a non-default index
table = test_data.where(col("A") > 2).cache_result()

# Set A to 9 on every row of the target that matches a row in test_data2
table.update(
    assignments={"A": lit(9)},
    condition=table["A"] == test_data2["A"],
    source=test_data2,
)
table.show()
  5. What did you expect to see?

    No error. Should return

-------
|"A"  |
-------
|9    |
|9    |
|5    |
-------

but got:

    Traceback (most recent call last):
  File "C:\repos\hippolib\venv\Lib\site-packages\IPython\core\interactiveshell.py", line 3577, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-3-2e397755fd61>", line 1, in <module>
    runfile('C:\\repos\\hippolib\\test.py', wdir='C:\\repos\\hippolib')
  File "C:\Program Files\JetBrains\PyCharm 2023.3.5\plugins\python-ce\helpers\pydev\_pydev_bundle\pydev_umd.py", line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\JetBrains\PyCharm 2023.3.5\plugins\python-ce\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "C:\repos\hippolib\test.py", line 19, in <module>
    table.update(
  File "C:\repos\hippolib\venv\Lib\site-packages\snowflake\snowpark\table.py", line 477, in update
    result = new_df._internal_collect_with_tag(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\repos\hippolib\venv\Lib\site-packages\snowflake\snowpark\_internal\telemetry.py", line 167, in wrap
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "C:\repos\hippolib\venv\Lib\site-packages\snowflake\snowpark\dataframe.py", line 651, in _internal_collect_with_tag_no_telemetry
    return self._session._conn.execute(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\repos\hippolib\venv\Lib\site-packages\snowflake\snowpark\mock\_connection.py", line 603, in execute
    res = execute_mock_plan(plan, plan.expr_to_alias)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\repos\hippolib\venv\Lib\site-packages\snowflake\snowpark\mock\_plan.py", line 1229, in execute_mock_plan
    target_index = target.loc[rows_to_update[ROW_ID]].index
                   ~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\repos\hippolib\venv\Lib\site-packages\pandas\core\indexing.py", line 1191, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\repos\hippolib\venv\Lib\site-packages\pandas\core\indexing.py", line 1420, in _getitem_axis
    return self._getitem_iterable(key, axis=axis)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\repos\hippolib\venv\Lib\site-packages\pandas\core\indexing.py", line 1360, in _getitem_iterable
    keyarr, indexer = self._get_listlike_indexer(key, axis)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\repos\hippolib\venv\Lib\site-packages\pandas\core\indexing.py", line 1558, in _get_listlike_indexer
    keyarr, indexer = ax._get_indexer_strict(key, axis_name)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\repos\hippolib\venv\Lib\site-packages\pandas\core\indexes\base.py", line 6200, in _get_indexer_strict
    self._raise_if_missing(keyarr, indexer, axis_name)
  File "C:\repos\hippolib\venv\Lib\site-packages\pandas\core\indexes\base.py", line 6249, in _raise_if_missing
    raise KeyError(f"None of [{key}] are in the [{axis_name}]")
KeyError: "None of [Index([0, 1], dtype='int64')] are in the [index]"
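For reference, the expected result of the update can be emulated in plain pandas once the index is reset. This is only a sketch of the intended behavior under the reset-index approach, not the code in `_plan.py`; the variable names are hypothetical.

```python
import pandas as pd

# Target table after filtering: values 3, 4, 5 with index labels 2, 3, 4.
full = pd.DataFrame({"A": [1, 2, 3, 4, 5]})
target = full[full["A"] > 2]
source = pd.DataFrame({"A": [3, 4]})

# Boolean mask, by position, of target rows whose A value appears in the source.
matched = target["A"].isin(source["A"]).to_numpy()

# Reset the index first so the assignment does not depend on the old labels.
updated = target.reset_index(drop=True)
updated.loc[matched, "A"] = 9

print(updated)
```

After the assignment, column A holds 9, 9, 5, which matches the expected table above.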
tvdboom added the bug (Something isn't working) and needs triage (Initial RCA is required) labels on Oct 19, 2024
github-actions bot changed the title from "[Local testing] Merge and update fail when target table's index is not RangeIndex" to "SNOW-1753336: [Local testing] Merge and update fail when target table's index is not RangeIndex" on Oct 19, 2024