GH-33459: [C++][Python] Support step >= 1 in list_slice kernel#48769

Merged
AlenkaF merged 1 commit into apache:main from HyukjinKwon:GH-33459
Feb 5, 2026

@HyukjinKwon
Member

@HyukjinKwon HyukjinKwon commented Jan 7, 2026

Rationale for this change

Closes ARROW-18281, which has been open since 2022. The list_slice kernel currently rejects start == stop, but should return empty lists instead (following Python slicing semantics).

The implementation already handles this case correctly. When ARROW-18282 added step support, bit_util::CeilDiv(stop - start, step) naturally returns 0 for start == stop, producing empty lists. The only issue was the validation check (start >= stop) that prevented this from working.
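The length argument above can be checked with a small sketch. This is a plain-Python stand-in, not the C++ kernel: `ceil_div` and `slice_length` are hypothetical helper names, and `ceil_div` only mimics what `bit_util::CeilDiv` is described as doing for non-negative inputs.

```python
def ceil_div(value: int, divisor: int) -> int:
    """Integer ceiling division, assuming value >= 0 and divisor > 0."""
    return (value + divisor - 1) // divisor


def slice_length(start: int, stop: int, step: int = 1) -> int:
    """Number of elements selected by [start:stop:step] from a long-enough list."""
    return ceil_div(stop - start, step)


# start == stop gives zero elements, in agreement with Python's range().
for start, stop, step in [(0, 0, 1), (2, 2, 3), (0, 3, 1), (0, 3, 2), (1, 6, 4)]:
    assert slice_length(start, stop, step) == len(range(start, stop, step))
```

With `start == stop` the numerator is 0, so the computed length is 0 for any positive `step`, which is why no change to the slicing logic itself should be needed.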

What changes are included in this PR?

  • Changed validation from start >= stop to start > stop
  • Updated error message
  • Added test cases
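The effect of the validation change can be sketched as a predicate. This is an illustrative Python model only: `check_bounds` and its `allow_empty` flag are hypothetical, and the handling of a negative `start` and of a missing `stop` is an assumption inferred from the old error message, not taken from the C++ source.

```python
from typing import Optional


def check_bounds(start: int, stop: Optional[int], *, allow_empty: bool) -> bool:
    """Return True when (start, stop) passes validation.

    allow_empty=False models the old rule (reject start >= stop).
    allow_empty=True models the new rule (reject only start > stop).
    """
    if start < 0:
        # Assumption: a negative start is rejected under both rules.
        return False
    if stop is None:
        # Assumption: an open-ended slice has no upper bound to compare against.
        return True
    return start <= stop if allow_empty else start < stop


assert not check_bounds(0, 0, allow_empty=False)  # old rule: rejected
assert check_bounds(0, 0, allow_empty=True)       # new rule: accepted
assert not check_bounds(2, 1, allow_empty=True)   # start > stop is still rejected
```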

Are these changes tested?

Yes, tests were added.

Are there any user-facing changes?

Yes.

import pyarrow.compute as pc
pc.list_slice([[1,2,3]], 0, 0)

Before:

pyarrow.lib.ArrowInvalid: `start`(0) should be greater than 0 and smaller than `stop`(0)

After:

<pyarrow.lib.ListArray object at 0x1a01b8b20>
[
  []
]
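For comparison, the same result falls out of ordinary Python slicing applied row by row. The helper below is a hypothetical pure-Python reference for non-negative `start` and `stop`; it is not part of pyarrow and ignores null handling.

```python
def list_slice_reference(rows, start, stop, step=1):
    """Slice every inner list with Python's own slicing rules."""
    return [row[start:stop:step] for row in rows]


# start == stop yields one empty list per input row.
assert list_slice_reference([[1, 2, 3]], 0, 0) == [[]]
assert list_slice_reference([[1, 2, 3], [4]], 1, 1) == [[], []]
assert list_slice_reference([[1, 2, 3]], 0, 2) == [[1, 2]]
```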

@github-actions github-actions bot added the Component: C++, Component: Python, awaiting review, and awaiting committer review labels and removed the awaiting review label Jan 7, 2026
@HyukjinKwon HyukjinKwon force-pushed the GH-33459 branch 3 times, most recently from 90ae6a2 to 75fff9a January 8, 2026 01:12
@HyukjinKwon
Member Author

Gentle ping @AlenkaF. Do you mind taking a quick look when you have some time? I think this is quite straightforward.

Member

@AlenkaF AlenkaF left a comment

Thank you for working on this leftover issue!
The change looks good to me. Will ask another C++ dev for review before merging (cc @rok maybe?, the change is small).

@github-actions github-actions bot added the awaiting changes label and removed the awaiting committer review label Feb 4, 2026
Member

@rok rok left a comment

This would be nice to do, just some minor suggestions.

Member

@rok rok left a comment

I think this is good now!

@github-actions github-actions bot added the awaiting merge label and removed the awaiting change review label Feb 4, 2026
@AlenkaF
Member

AlenkaF commented Feb 4, 2026

@github-actions crossbow submit -g python

@github-actions

github-actions bot commented Feb 4, 2026

Revision: e69bb55

Submitted crossbow builds: ursacomputing/crossbow @ actions-74c64ef881

Task Status
example-python-minimal-build-fedora-conda GitHub Actions
example-python-minimal-build-ubuntu-venv GitHub Actions
test-conda-python-3.10 GitHub Actions
test-conda-python-3.10-hdfs-2.9.2 GitHub Actions
test-conda-python-3.10-hdfs-3.2.1 GitHub Actions
test-conda-python-3.10-pandas-1.3.4-numpy-1.21.2 GitHub Actions
test-conda-python-3.11 GitHub Actions
test-conda-python-3.11-dask-latest GitHub Actions
test-conda-python-3.11-dask-upstream_devel GitHub Actions
test-conda-python-3.11-hypothesis GitHub Actions
test-conda-python-3.11-pandas-latest-numpy-latest GitHub Actions
test-conda-python-3.11-spark-master GitHub Actions
test-conda-python-3.12 GitHub Actions
test-conda-python-3.12-cpython-debug GitHub Actions
test-conda-python-3.12-pandas-latest-numpy-1.26 GitHub Actions
test-conda-python-3.12-pandas-latest-numpy-latest GitHub Actions
test-conda-python-3.13 GitHub Actions
test-conda-python-3.13-pandas-nightly-numpy-nightly GitHub Actions
test-conda-python-3.13-pandas-upstream_devel-numpy-nightly GitHub Actions
test-conda-python-3.14 GitHub Actions
test-conda-python-emscripten GitHub Actions
test-cuda-python-ubuntu-22.04-cuda-11.7.1 GitHub Actions
test-cuda-python-ubuntu-24.04-cuda-13.0.2 GitHub Actions
test-debian-12-python-3-amd64 GitHub Actions
test-debian-12-python-3-i386 GitHub Actions
test-fedora-42-python-3 GitHub Actions
test-ubuntu-22.04-python-3 GitHub Actions
test-ubuntu-22.04-python-313-freethreading GitHub Actions
test-ubuntu-24.04-python-3 GitHub Actions

@rok
Member

rok commented Feb 5, 2026

@HyukjinKwon can you rebase please? That should fix at least some of the build issues.

@HyukjinKwon
Member Author

Sure, will take a look today!

@HyukjinKwon
Member Author

I believe the failures are not related to this PR. Most of them fail like this:

=================================== FAILURES ===================================
_____________ TestConvertMetadata.test_column_index_names_with_tz ______________

self = <pyarrow.tests.test_pandas.TestConvertMetadata object at 0x7f0615b62360>

    def test_column_index_names_with_tz(self):
        # ARROW-13756
        # Bug if index is timezone aware DataTimeIndex
    
        df = pd.DataFrame(
            np.random.randn(5, 3),
            columns=pd.date_range("2021-01-01", periods=3, freq="50D", tz="CET")
        )
>       _check_pandas_roundtrip(df, preserve_index=True)

opt/conda/envs/arrow/lib/python3.11/site-packages/pyarrow/tests/test_pandas.py:223: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
opt/conda/envs/arrow/lib/python3.11/site-packages/pyarrow/tests/test_pandas.py:124: in _check_pandas_roundtrip
    tm.assert_frame_equal(result, expected, check_dtype=check_dtype,
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

left = DatetimeIndex(['2021-01-01 00:00:00+01:00', '2021-02-20 00:00:00+01:00',
               '2021-04-11 00:00:00+02:00'],
              dtype='datetime64[ns, CET]', freq=None)
right = DatetimeIndex(['2021-01-01 00:00:00+01:00', '2021-02-20 00:00:00+01:00',
               '2021-04-11 00:00:00+02:00'],
              dtype='datetime64[us, CET]', freq='50D')
obj = 'DataFrame.columns'

    def _check_types(left, right, obj: str = "Index") -> None:
        if not exact:
            return
    
        assert_class_equal(left, right, exact=exact, obj=obj)
        assert_attr_equal("inferred_type", left, right, obj=obj)
    
        # Skip exact dtype checking when `check_categorical` is False
        if isinstance(left.dtype, CategoricalDtype) and isinstance(
            right.dtype, CategoricalDtype
        ):
            if check_categorical:
                assert_attr_equal("dtype", left, right, obj=obj)
                assert_index_equal(left.categories, right.categories, exact=exact)
            return
    
>       assert_attr_equal("dtype", left, right, obj=obj)
E       AssertionError: DataFrame.columns are different
E       
E       Attribute "dtype" are different
E       [left]:  datetime64[ns, CET]
E       [right]: datetime64[us, CET]

opt/conda/envs/arrow/lib/python3.11/site-packages/pandas/_testing/asserters.py:264: AssertionError
______________ TestConvertMetadata.test_mismatch_metadata_schema _______________

self = <pyarrow.tests.test_pandas.TestConvertMetadata object at 0x7f06160d3670>

    def test_mismatch_metadata_schema(self):
        # ARROW-10511
        # It is possible that the metadata and actual schema is not fully
        # matching (eg no timezone information for tz-aware column)
        # -> to_pandas() conversion should not fail on that
        df = pd.DataFrame({"datetime": pd.date_range("2020-01-01", periods=3)})
    
        # OPTION 1: casting after conversion
        table = pa.Table.from_pandas(df)
        # cast the "datetime" column to be tz-aware
        new_col = table["datetime"].cast(pa.timestamp('ns', tz="UTC"))
        new_table1 = table.set_column(
            0, pa.field("datetime", new_col.type), new_col
        )
    
        # OPTION 2: specify schema during conversion
        schema = pa.schema([("datetime", pa.timestamp('ns', tz="UTC"))])
        new_table2 = pa.Table.from_pandas(df, schema=schema)
    
        expected = df.copy()
        expected["datetime"] = expected["datetime"].dt.tz_localize("UTC")
    
        for new_table in [new_table1, new_table2]:
            # ensure the new table still has the pandas metadata
            assert new_table.schema.pandas_metadata is not None
            # convert to pandas
            result = new_table.to_pandas()
>           tm.assert_frame_equal(result, expected)
E           AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="datetime") are different
E           
E           Attribute "dtype" are different
E           [left]:  datetime64[ns, UTC]
E           [right]: datetime64[us, UTC]

opt/conda/envs/arrow/lib/python3.11/site-packages/pyarrow/tests/test_pandas.py:734: AssertionError

Let me file an issue and take a separate look.

@HyukjinKwon
Member Author

HyukjinKwon commented Feb 5, 2026

I think it relates to #48314. Taking a closer look.

@HyukjinKwon
Member Author

Ah, my PR simply has to be rebased!

@HyukjinKwon
Member Author

@github-actions crossbow submit -g python

@github-actions

github-actions bot commented Feb 5, 2026

Revision: af161ab

Submitted crossbow builds: ursacomputing/crossbow @ actions-84c52d3411

Task Status
example-python-minimal-build-fedora-conda GitHub Actions
example-python-minimal-build-ubuntu-venv GitHub Actions
test-conda-python-3.10 GitHub Actions
test-conda-python-3.10-hdfs-2.9.2 GitHub Actions
test-conda-python-3.10-hdfs-3.2.1 GitHub Actions
test-conda-python-3.10-pandas-1.3.4-numpy-1.21.2 GitHub Actions
test-conda-python-3.11 GitHub Actions
test-conda-python-3.11-dask-latest GitHub Actions
test-conda-python-3.11-dask-upstream_devel GitHub Actions
test-conda-python-3.11-hypothesis GitHub Actions
test-conda-python-3.11-pandas-latest-numpy-latest GitHub Actions
test-conda-python-3.11-spark-master GitHub Actions
test-conda-python-3.12 GitHub Actions
test-conda-python-3.12-cpython-debug GitHub Actions
test-conda-python-3.12-pandas-latest-numpy-1.26 GitHub Actions
test-conda-python-3.12-pandas-latest-numpy-latest GitHub Actions
test-conda-python-3.13 GitHub Actions
test-conda-python-3.13-pandas-nightly-numpy-nightly GitHub Actions
test-conda-python-3.13-pandas-upstream_devel-numpy-nightly GitHub Actions
test-conda-python-3.14 GitHub Actions
test-conda-python-emscripten GitHub Actions
test-debian-13-python-3-amd64 GitHub Actions
test-debian-13-python-3-i386 GitHub Actions
test-fedora-42-python-3 GitHub Actions
test-ubuntu-22.04-python-3 GitHub Actions
test-ubuntu-22.04-python-313-freethreading GitHub Actions
test-ubuntu-24.04-python-3 GitHub Actions

@AlenkaF AlenkaF merged commit 49423f8 into apache:main Feb 5, 2026
52 checks passed
@AlenkaF AlenkaF removed the awaiting merge label Feb 5, 2026
@AlenkaF
Member

AlenkaF commented Feb 5, 2026

Thanks!
