[SPARK-55321][PYTHON][TESTS] Ignore null difference when comparing ps df/series by gaogaotiantian · Pull Request #54100 · apache/spark

gaogaotiantian · 2026-02-02T23:37:57Z

What changes were proposed in this pull request?

For all the numeric tests in data_type_ops, ignore the difference in null values (None vs np.nan vs pd.NA etc.).

Why are the changes needed?

pyspark.pandas always generates a different null value than pandas (pyspark has only one null value internally). However, pandas 3 made its internal testing utility stricter, so our tests started to fail. We can relax the check on our side for now.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Tested locally with pandas 3; many previously failing tests pass with this change.

Was this patch authored or co-authored using generative AI tooling?

No.

github-actions · 2026-02-02T23:40:03Z

JIRA Issue Information

=== Test SPARK-55321 ===
Summary: Ignore null difference when we compare results from numeric operations
Assignee: None
Status: Open
Affected: ["4.2.0"]

This comment was automatically generated by GitHub Actions

ueshin · 2026-02-05T02:01:29Z

I'm afraid this ignores nulls too broadly, and I'm worried we may miss things we can or should fix.
I'd go with #54146 first and see how many cases it fixes. WDYT? also cc @HyukjinKwon @zhengruifeng

zhengruifeng · 2026-02-05T02:56:44Z

@ueshin +1, I also feel we should try to fix as many as possible before we ignore this difference

HyukjinKwon · 2026-02-05T03:51:20Z

Yeah ..

gaogaotiantian · 2026-02-05T04:07:55Z

Sure, we can do a stricter check for now. But we should also be fully aware that the relaxed comparison is what we already do today (pandas 2.x). We don't check null differences now, because that's the default behavior of the pandas testing util. After upgrading to pandas 3, the testing util changed so that it reports all the null differences. We are not fighting a behavior difference between pandas 2 and pandas 3; we are trying to change the once-expected behavior for pyspark.pandas.

Basically, even with pandas 2.x, we already generate None where pandas generates np.nan, but the pandas testing util considers them the same. Now we still generate None where pandas generates np.nan, but the pandas testing util thinks it's wrong.
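
To make the point concrete, here is a minimal sketch of a comparison that treats every null representation as equivalent. It is an illustration only, not the code in this PR: the helper name `assert_equal_ignoring_null_kind` is hypothetical, and it assumes plain pandas Series as inputs (a pandas-on-Spark result would first have to be converted to pandas).

```python
import numpy as np
import pandas as pd


def assert_equal_ignoring_null_kind(left: pd.Series, right: pd.Series) -> None:
    """Hypothetical helper: compare two Series while treating None, np.nan,
    and pd.NA as the same "null". Only null positions and non-null values
    are checked; dtype and the kind of null are deliberately ignored."""
    # isna() reports True for None, np.nan and pd.NA alike.
    left_null = left.isna().to_numpy()
    right_null = right.isna().to_numpy()
    if not np.array_equal(left_null, right_null):
        raise AssertionError("null positions differ")

    # Compare only the non-null values, as plain Python objects.
    left_values = left[~left_null].tolist()
    right_values = right[~right_null].tolist()
    if left_values != right_values:
        raise AssertionError(f"values differ: {left_values} != {right_values}")


# One side holds None in an object column, the other holds np.nan in a float column.
with_none = pd.Series([1.0, None, 3.0], dtype=object)
with_nan = pd.Series([1.0, np.nan, 3.0])
assert_equal_ignoring_null_kind(with_none, with_nan)
```

A check like this still fails when a null appears on only one side, so it relaxes only the kind of null, not its presence.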

gaogaotiantian added 2 commits February 2, 2026 15:13

Ignore null values when comparing values for data_type_ops

5f21b66

Improve comments

a7c16dd

github-actions bot added PYTHON PANDAS API ON SPARK labels Feb 2, 2026

zhengruifeng requested a review from ueshin February 3, 2026 05:51

gaogaotiantian mentioned this pull request Feb 5, 2026

[SPARK-55363][PS][TESTS] Make ops tests with "decimal_nan" columns ignore NaN vs. None #54146

Closed

gaogaotiantian wants to merge 2 commits into apache:master from gaogaotiantian:fix-numeric-null

4 participants
