Doc: Generate Arrow fallback Reference tables #63805

fangchenli · 2026-01-22T06:04:38Z

closes #xxxx (Replace xxxx with the GitHub issue number)
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.
If I used AI to develop this pull request, I prompted it to follow AGENTS.md.

Add a script to generate documentation tracking Arrow method fallbacks in pandas. Requested by Databricks' PySpark team to identify which operations trigger Arrow-to-NumPy conversions in PySpark's pandas UDFs.

@zhengruifeng @Yicong-Huang

Add documentation and tooling for Arrow method fallback behavior:

- Add scripts/generate_arrow_fallback_table.py: Generator script that introspects pandas source to classify methods by their Arrow support (ARROW_NATIVE, CONDITIONAL, ELEMENTWISE, OBJECT_FALLBACK, VERSION_GATED, NOT_IMPLEMENTED). Includes --check flag for CI validation.
- Add doc/source/user_guide/arrow_string_fallbacks.rst: Generated reference documenting which methods use native PyArrow compute vs falling back to Python/NumPy. Covers string, arithmetic, datetime, aggregation, array, list accessor, and struct accessor methods.
- Add pre-commit hook (arrow-fallback-docs-sync) to ensure documentation stays in sync with source code changes.
- Add comprehensive verification tests (204 tests) that validate classifications match actual runtime behavior.
- Link new reference from pyarrow.rst user guide.
- Update exclude pattern for private-import check to include scripts/tests.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
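A --check flag like the one described above usually follows a common pattern for generated docs: regenerate the content in memory, compare it with the committed file, and exit non-zero if they differ instead of writing. The sketch below is a minimal illustration under that assumption; render_table, needs_update, the reST list-table layout, and the default output path are hypothetical and not taken from the actual script. Only the six classification labels come from the PR description.

```python
import argparse
import enum
import sys
from pathlib import Path
from typing import Dict, Optional


class ArrowSupport(enum.Enum):
    """Classification labels named in the PR description."""

    ARROW_NATIVE = enum.auto()
    CONDITIONAL = enum.auto()
    ELEMENTWISE = enum.auto()
    OBJECT_FALLBACK = enum.auto()
    VERSION_GATED = enum.auto()
    NOT_IMPLEMENTED = enum.auto()


def render_table(results: Dict[str, ArrowSupport]) -> str:
    """Render a minimal reST list-table (hypothetical layout), sorted by method."""
    lines = [
        ".. list-table::",
        "   :header-rows: 1",
        "",
        "   * - Method",
        "     - Arrow support",
    ]
    for name in sorted(results):
        lines.append(f"   * - ``{name}``")
        lines.append(f"     - {results[name].name}")
    return "\n".join(lines) + "\n"


def needs_update(existing: Optional[str], generated: str) -> bool:
    """True if the committed file is missing or differs from fresh output."""
    return existing != generated


def main(argv=None, results=None, output: Path = Path("arrow_fallbacks.rst")) -> int:
    # Hypothetical CLI: the real script's options beyond --check are not shown here.
    parser = argparse.ArgumentParser(description="Generate the Arrow fallback table.")
    parser.add_argument(
        "--check",
        action="store_true",
        help="Exit 1 instead of writing if the committed file is stale.",
    )
    args = parser.parse_args(argv)
    generated = render_table(results or {})
    existing = output.read_text() if output.exists() else None
    if not needs_update(existing, generated):
        return 0  # already up to date
    if args.check:
        print(f"{output} is out of date; rerun without --check.", file=sys.stderr)
        return 1  # CI fails, nothing is written
    output.write_text(generated)
    return 0
```

Sorting the method names keeps the output deterministic, which is what makes a plain string comparison usable as a CI gate.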

…reference

Replace the AST-based analysis with runtime observation:

- Actually run all operations on all Arrow dtypes
- Observe return types and errors
- Instrument to_numpy and _apply_elementwise to detect fallbacks

This approach is more accurate because it observes actual behavior rather than inferring from code analysis.

Changes:

- Rewrite scripts/generate_arrow_fallback_table.py using runtime tests
- Update scripts/tests/test_generate_arrow_fallback_table.py for new API
- Remove scripts/tests/test_arrow_fallback_verification.py (no longer needed)
- Regenerate doc/source/user_guide/arrow_string_fallbacks.rst
- Update pre-commit hook to use manual stage (requires pandas-dev env)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
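The runtime-observation idea can be sketched with unittest.mock. The approach is to patch the fallback hooks so that each call is counted but still executed, run the operation, and then label it from what was observed. This is a minimal, self-contained sketch. FakeArrowArray is a hypothetical stand-in for pandas' Arrow-backed array, and the mapping from observed calls to the ELEMENTWISE / OBJECT_FALLBACK / NOT_IMPLEMENTED labels is an assumption, not necessarily the script's actual rules. Only the hook names to_numpy and _apply_elementwise come from the commit message.

```python
from contextlib import ExitStack, contextmanager
from unittest import mock


def _counting(name, original, calls):
    """Wrap ``original`` so each call increments ``calls[name]`` and still runs."""
    def wrapper(self, *args, **kwargs):
        calls[name] += 1
        return original(self, *args, **kwargs)
    return wrapper


@contextmanager
def record_calls(cls, names):
    """Temporarily patch methods on ``cls`` to count calls; restore them on exit."""
    calls = {name: 0 for name in names}
    with ExitStack() as stack:
        for name in names:
            original = getattr(cls, name)
            stack.enter_context(
                mock.patch.object(cls, name, _counting(name, original, calls))
            )
        yield calls


def classify(operation, cls):
    """Run ``operation`` and label it by which fallback hooks it hit (assumed rules)."""
    try:
        with record_calls(cls, ("to_numpy", "_apply_elementwise")) as calls:
            operation()
    except NotImplementedError:
        return "NOT_IMPLEMENTED"
    if calls["_apply_elementwise"]:
        return "ELEMENTWISE"
    if calls["to_numpy"]:
        return "OBJECT_FALLBACK"
    return "ARROW_NATIVE"


class FakeArrowArray:
    """Hypothetical stand-in for an Arrow-backed array."""

    def __init__(self, values):
        self.values = list(values)

    def to_numpy(self):
        return list(self.values)

    def _apply_elementwise(self, func):
        return [func(v) for v in self.values]

    def upper(self):  # pretend this maps to a native compute kernel
        return FakeArrowArray(v.upper() for v in self.values)

    def title(self):  # pretend this falls back to a per-element Python loop
        return FakeArrowArray(self._apply_elementwise(str.title))

    def casefold(self):  # pretend this round-trips through NumPy
        return FakeArrowArray(v.casefold() for v in self.to_numpy())
```

With arr = FakeArrowArray(["a b", "C"]), classify(arr.title, FakeArrowArray) reports ELEMENTWISE and classify(arr.upper, FakeArrowArray) reports ARROW_NATIVE. Patching at the class level also catches calls made on intermediate arrays created during the operation, and ExitStack restores the original methods even when the operation raises.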

zhengruifeng

Thank you so much for working on it, it is very helpful!

zhengruifeng · 2026-01-23T01:40:09Z

cc @HyukjinKwon

…reference

fangchenli and others added 5 commits January 11, 2026 15:46

Merge remote-tracking branch 'upstream/main' into doc/arrow-fallback-…

9256510

…reference

replace ast check script

4d7a947

fix page name, table export

8dee333

fangchenli added the Docs and Arrow (pyarrow functionality) labels Jan 22, 2026

zhengruifeng reviewed Jan 22, 2026


Merge remote-tracking branch 'upstream/main' into doc/arrow-fallback-…

f692805

…reference
