Description
Describe the enhancement requested
This is the umbrella ticket for ongoing efforts to improve and revise the PyArrow Python User Guide and API reference documentation.
Many sections of the guide need refreshing, particularly those covering what users frequently search for and the kinds of issues commonly reported. We’ve already started prioritizing topics using Matomo web analytics and GitHub issues. In the future, insights from Kapa AI will also be incorporated. Comments on the priorities are welcome!
We’ll open sub-issues for specific tasks as we go. Everyone is welcome to contribute—whether you're experienced or just getting started, your help in making the PyArrow docs better is appreciated! ❤️
Suggested Focus Areas (in order of priority):
- Parquet module
- Dataset module
- Table, RecordBatch, Schema and data types
- Getting Started
- Pandas integration
- IPC
Note: The User Guide is the main focus of this revision effort, but improving the API reference documentation is also important. Analytics show it receives a significant amount of traffic, so we'll consider enhancements there as well.
Existing Documentation Issues
- Datasets
- [Python][Docs] Improve the Python user guide on the CUDA integration (pyarrow.cuda) #41666
- [Python][Azure][Docs] Add documentation about AzureFilesystem #41496
- [Docs][Python] Add all tensor classes documentation #43352
Connected
- [Doc] Use sphinx-remove-toctrees to remove generated docstring pages from navigation (and reduce build time) #30021
- [Doc][Python] The use of IPython directive or doctest code blocks in the python user guide #28859
Component(s)
Documentation, Python