Commit 13e44f7

ueshin authored and HyukjinKwon committed
[SPARK-55988][PS][TESTS] Compare categorical index codes by values in tests
### What changes were proposed in this pull request?

This PR updates `pyspark.pandas.tests.indexes.test_category` to compare categorical index `codes` by values instead of wrapping pandas `codes` with `pd.Index(...)`.

The test now compares `psidx.codes.to_numpy()` against `pidx.codes`.

### Why are the changes needed?

`CategoricalIndex.codes` on the pandas side is an ndarray-like result. Wrapping it with `pd.Index(...)` makes the test depend on pandas index materialization details rather than on the categorical code values themselves.

Comparing the raw codes is a closer match to the pandas API shape and avoids depending on pandas index container behavior that differs in pandas 3.

### Does this PR introduce _any_ user-facing change?

No. This PR only updates test expectations.

### How was this patch tested?

```bash
./python/run-tests.py --testnames pyspark.pandas.tests.indexes.test_category
```

This test passed locally in both pandas 3.0.0 and pandas 2.3.3 environments.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Codex GPT-5

Closes #54788 from ueshin/issues/SPARK-55988/category.

Authored-by: Takuya Ueshin <ueshin@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
1 parent 47424a3 commit 13e44f7

1 file changed: +4 -4 lines changed

python/pyspark/pandas/tests/indexes/test_category.py

Lines changed: 4 additions & 4 deletions
@@ -30,15 +30,15 @@ def test_categorical_index(self):
 
         self.assert_eq(psidx, pidx)
         self.assert_eq(psidx.categories, pidx.categories)
-        self.assert_eq(psidx.codes, pd.Index(pidx.codes))
+        self.assert_eq(psidx.codes.to_numpy(), pidx.codes)
         self.assert_eq(psidx.ordered, pidx.ordered)
 
         pidx = pd.Index([1, 2, 3], dtype="category")
         psidx = ps.Index([1, 2, 3], dtype="category")
 
         self.assert_eq(psidx, pidx)
         self.assert_eq(psidx.categories, pidx.categories)
-        self.assert_eq(psidx.codes, pd.Index(pidx.codes))
+        self.assert_eq(psidx.codes.to_numpy(), pidx.codes)
         self.assert_eq(psidx.ordered, pidx.ordered)
 
         pdf = pd.DataFrame(
@@ -55,15 +55,15 @@ def test_categorical_index(self):
 
         self.assert_eq(psidx, pidx)
         self.assert_eq(psidx.categories, pidx.categories)
-        self.assert_eq(psidx.codes, pd.Index(pidx.codes))
+        self.assert_eq(psidx.codes.to_numpy(), pidx.codes)
         self.assert_eq(psidx.ordered, pidx.ordered)
 
         pidx = pdf.set_index(["a", "b"]).index.get_level_values(0)
         psidx = psdf.set_index(["a", "b"]).index.get_level_values(0)
 
         self.assert_eq(psidx, pidx)
         self.assert_eq(psidx.categories, pidx.categories)
-        self.assert_eq(psidx.codes, pd.Index(pidx.codes))
+        self.assert_eq(psidx.codes.to_numpy(), pidx.codes)
         self.assert_eq(psidx.ordered, pidx.ordered)
 
         with self.assertRaisesRegex(TypeError, "Index.name must be a hashable type"):
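
The distinction the commit message draws, raw codes versus codes wrapped in `pd.Index(...)`, can be seen with plain pandas. This is only a rough sketch: the sample data and the `codes_from_other_side` array are hypothetical stand-ins (the latter for `psidx.codes.to_numpy()`), and it uses `np.array_equal` rather than the test suite's `assert_eq` helper.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not taken from the test file.
pidx = pd.CategoricalIndex(["a", "b", "c", "a"])

# On the pandas side, `codes` is a NumPy array of integer category codes.
raw_codes = pidx.codes

# The old expectation wrapped the same values in an extra Index container.
wrapped_codes = pd.Index(raw_codes)

# Assumed stand-in for `psidx.codes.to_numpy()` on the pandas-on-Spark side.
codes_from_other_side = np.array([0, 1, 2, 0])

# Comparing by values only checks the codes, not the container around them.
values_match = np.array_equal(codes_from_other_side, raw_codes)
print(type(raw_codes).__name__, type(wrapped_codes).__name__, values_match)
```

Comparing against `raw_codes` keeps the assertion independent of how `pd.Index(...)` chooses to materialize the array, which is the dependency the commit removes.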
