BUG: Assigning pd.NA to StringDtype column causes data corruption and pyarrow error #64339
Conversation
…pping .. failed' after pd.concat with PyArrow
pandas/core/arrays/arrow/array.py
```python
# TODO: Remove this part when pa.if_else is fixed (GH#64320)
if isinstance(left, pa.ChunkedArray) and (
    pa.types.is_string(left.type) or pa.types.is_large_string(left.type)
):
    left = left.combine_chunks()

if isinstance(right, pa.ChunkedArray) and (
    pa.types.is_string(right.type) or pa.types.is_large_string(right.type)
):
    right = right.combine_chunks()
```
PyArrow's pc.if_else misbehaves with chunked string arrays, causing PyArrow errors and data corruption; this fix calls combine_chunks() on string/large_string ChunkedArrays in _if_else before invoking pc.if_else.
Can you check whether combine_chunks() might fail if the string chunks together would exceed the 2 GB limit of what string (in contrast to large_string) can represent in a single array? If so, we would potentially have to put this in a try-except.
Alternatively, we might want to check whether one of the chunks has a non-zero offset, because the bug only happens in that case, not for chunked string arrays in general.
@jorisvandenbossche Thanks for your review! I've applied the suggested changes.
Thanks for the update, that looks good now!
Co-authored-by: Joris Van den Bossche <[email protected]>
@meeseeksdev backport to 3.0.x
…umn causes data corruption and pyarrow error
Thanks @kjmin622 !
…pe column causes data corruption and pyarrow error) (#64526) Co-authored-by: Jeongmin Gil <[email protected]> Co-authored-by: Joris Van den Bossche <[email protected]>
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.
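For illustration, a bug-fix entry of this kind in the whatsnew file typically looks like the following (the section placement and exact wording here are assumptions, not the actual entry from this PR):

```rst
Strings
^^^^^^^

- Bug in :class:`Series` with ``StringDtype`` backed by PyArrow where assigning ``pd.NA`` after :func:`concat` could corrupt data or raise a PyArrow error (:issue:`64339`)
```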