-
-
Notifications
You must be signed in to change notification settings - Fork 2.3k
Open
Labels
bugSomething isn't workingSomething isn't workingneeds triageAwaiting prioritization by a maintainerAwaiting prioritization by a maintainerpythonRelated to Python PolarsRelated to Python Polars
Description
Checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of Polars.
Reproducible example
import polars as pl
from polars.testing.asserts import assert_frame_equal
left = pl.LazyFrame({"a": ["a"]})
right = pl.LazyFrame({"a": "a", "b": range(20)})
joined = left.join(right, left_on="a", right_on="a", how="left", maintain_order="left")
overed = joined.select(pl.col("b").over("a", mapping_strategy="join"))
assert_frame_equal(overed.collect(), overed.collect(engine="streaming"))
Log output
join parallel: true
LEFT join dataframes finished
polars-stream: updating graph state
polars-stream: running equi-join in subgraph
polars-stream: running in-memory-source in subgraph
async thread count: 4
polars-stream: done running graph phase
polars-stream: updating graph state
polars-stream: running in-memory-source in subgraph
polars-stream: running in-memory-map in subgraph
polars-stream: running equi-join in subgraph
polars-stream: done running graph phase
polars-stream: updating graph state
polars-stream: running select in subgraph
polars-stream: running in-memory-map in subgraph
polars-stream: running in-memory-sink in subgraph
polars-stream: done running graph phase
polars-stream: updating graph state
Traceback (most recent call last):
File "/home/sson/.cache/pypoetry/virtualenvs/base-migration-hJsE43dC-py3.11/lib/python3.11/site-packages/polars/testing/asserts/frame.py", line 111, in assert_frame_equal
_assert_series_values_equal(
File "/home/sson/.cache/pypoetry/virtualenvs/base-migration-hJsE43dC-py3.11/lib/python3.11/site-packages/polars/testing/asserts/series.py", line 209, in _assert_series_values_equal
raise_assertion_error(
File "/home/sson/.cache/pypoetry/virtualenvs/base-migration-hJsE43dC-py3.11/lib/python3.11/site-packages/polars/testing/asserts/utils.py", line 12, in raise_assertion_error
raise AssertionError(msg) from cause
AssertionError: Series are different (exact value mismatch)
[left]: [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]]
[right]: [[0, 1, 2, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 3, 4, 5, 18, 19], [0, 1, 2, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 3, 4, 5, 18, 19], [0, 1, 2, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 3, 4, 5, 18, 19], [0, 1, 2, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 3, 4, 5, 18, 19], [0, 1, 2, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 3, 4, 5, 18, 19], [0, 1, 2, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 3, 4, 5, 18, 19], [0, 1, 2, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 3, 4, 5, 18, 19], [0, 1, 2, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 3, 4, 5, 18, 19], [0, 1, 2, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 3, 4, 5, 18, 19], [0, 1, 2, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 3, 4, 5, 18, 19], [0, 1, 2, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 3, 4, 5, 18, 19], [0, 1, 2, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 3, 4, 5, 18, 19], [0, 1, 2, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 3, 4, 5, 18, 19], [0, 1, 2, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 3, 4, 5, 18, 19], [0, 1, 2, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 3, 4, 5, 18, 19], [0, 1, 2, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 3, 4, 5, 18, 19], [0, 1, 2, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 3, 4, 5, 18, 19], [0, 1, 2, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 3, 4, 5, 18, 19], [0, 1, 2, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 3, 4, 5, 18, 19], [0, 1, 2, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 3, 4, 5, 18, 19]]
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/sson/python/bmf/gurd/polars_scan_sink_bug.py", line 9, in <module>
assert_frame_equal(overed.collect(), overed.collect(engine="streaming"))
File "/home/sson/.cache/pypoetry/virtualenvs/base-migration-hJsE43dC-py3.11/lib/python3.11/site-packages/polars/_utils/deprecation.py", line 128, in wrapper
return function(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sson/.cache/pypoetry/virtualenvs/base-migration-hJsE43dC-py3.11/lib/python3.11/site-packages/polars/testing/asserts/frame.py", line 121, in assert_frame_equal
raise_assertion_error(
File "/home/sson/.cache/pypoetry/virtualenvs/base-migration-hJsE43dC-py3.11/lib/python3.11/site-packages/polars/testing/asserts/utils.py", line 12, in raise_assertion_error
raise AssertionError(msg) from cause
AssertionError: DataFrames are different (value mismatch for column 'b')
[left]: [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]]
[right]: [[0, 1, 2, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 3, 4, 5, 18, 19], [0, 1, 2, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 3, 4, 5, 18, 19], [0, 1, 2, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 3, 4, 5, 18, 19], [0, 1, 2, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 3, 4, 5, 18, 19], [0, 1, 2, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 3, 4, 5, 18, 19], [0, 1, 2, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 3, 4, 5, 18, 19], [0, 1, 2, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 3, 4, 5, 18, 19], [0, 1, 2, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 3, 4, 5, 18, 19], [0, 1, 2, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 3, 4, 5, 18, 19], [0, 1, 2, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 3, 4, 5, 18, 19], [0, 1, 2, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 3, 4, 5, 18, 19], [0, 1, 2, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 3, 4, 5, 18, 19], [0, 1, 2, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 3, 4, 5, 18, 19], [0, 1, 2, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 3, 4, 5, 18, 19], [0, 1, 2, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 3, 4, 5, 18, 19], [0, 1, 2, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 3, 4, 5, 18, 19], [0, 1, 2, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 3, 4, 5, 18, 19], [0, 1, 2, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 3, 4, 5, 18, 19], [0, 1, 2, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 3, 4, 5, 18, 19], [0, 1, 2, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 3, 4, 5, 18, 19]]
Issue description
- I expect the
assert_frame_equal
to succeed, but it fails every once in a while because the order of the list in columnb
ofovered
is out of order. - This issue is similar to Problem with join operations followed by sink_csv on LazyFrame #15157 and Join on enums fails to match when sinking to parquet #20916, but because those don't mention their failures being intermittent I believe the behavior described here has a different cause.
- Even though
right
andjoined
are identical, the error seems to disappear when usingright
instead ofjoined
. There is also no error when omitting the.over
, so I believe this is a minimal recipe. - The query plan is the same when
.explain
ing with or withoutengine="streaming"
. - The issue persists when disabling all optimizations on the
.collect(engine="streaming")
call. - Comparing calls to
.profile(...)[0]
instead of.collect(...)
seems to resolve the issue.
Expected behavior
Expected behavior is the list order not differing between the two ways of collecting.
Installed versions
--------Version info---------
Polars: 1.31.0
Index type: UInt32
Platform: Linux-6.6.87.2-microsoft-standard-WSL2-x86_64-with-glibc2.39
Python: 3.11.9 (main, Apr 27 2024, 21:16:11) [GCC 13.2.0]
LTS CPU: False
----Optional dependencies----
Azure CLI <not installed>
adbc_driver_manager <not installed>
altair <not installed>
azure.identity /home/sson/.cache/pypoetry/virtualenvs/base-migration-hJsE43dC-py3.11/lib/python3.11/site-packages/azure/__init__.py:5: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
import pkg_resources
<not installed>
boto3 <not installed>
cloudpickle 3.1.1
connectorx <not installed>
deltalake <not installed>
fastexcel <not installed>
fsspec 2025.5.1
gevent <not installed>
google.auth <not installed>
great_tables <not installed>
matplotlib <not installed>
numpy 2.2.6
openpyxl 3.1.5
pandas 2.3.1
polars_cloud <not installed>
pyarrow 20.0.0
pydantic 2.11.7
pyiceberg <not installed>
sqlalchemy 2.0.41
torch <not installed>
xlsx2csv 0.8.4
xlsxwriter 3.2.5
Metadata
Metadata
Assignees
Labels
bugSomething isn't workingSomething isn't workingneeds triageAwaiting prioritization by a maintainerAwaiting prioritization by a maintainerpythonRelated to Python PolarsRelated to Python Polars