Parquet batch configuration and documentation for spec updates #2281
Merged
tomkralidis merged 3 commits into geopython:master on Mar 10, 2026
C-Loftus commented on Mar 9, 2026
Overview
Previously, the parquet provider had no config option for `batch_size` or `batch_readahead`. pyarrow defaults to batch sizes of over 100k rows with a default readahead of 16 batches. This is fine for analytics, but in a server use case it is generally far too large, and when using parquet over S3 it adds a lot of extra data transfer.
I changed the provider to use smaller defaults for these options and to allow them to be configured in the provider definition.
I documented this in code comments and in the rst docs. While I was at it, I also brought over some extra info for the docs from #2271.
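For readers unfamiliar with these two options, here is a rough sketch of how batch settings from a provider definition could be passed through to pyarrow. It is not the code from this PR: the helper names `resolve_batch_options` and `iter_batches`, the default values, and the assumption that the options sit as top-level keys of the provider definition are all hypothetical. Only `pyarrow.dataset.dataset` and `Dataset.to_batches(batch_size=..., batch_readahead=...)` are existing pyarrow calls.

```python
from typing import Any, Dict, Iterator

# Hypothetical defaults, chosen only for illustration; the actual values
# used by the provider are not stated in this description.
DEFAULT_BATCH_SIZE = 10_000
DEFAULT_BATCH_READAHEAD = 2


def resolve_batch_options(provider_def: Dict[str, Any]) -> Dict[str, int]:
    """Read batch settings from a provider definition, falling back to
    small server-friendly defaults when a key is absent.

    Assumes (hypothetically) that the options are top-level keys of the
    provider definition dict.
    """
    batch_size = int(provider_def.get("batch_size", DEFAULT_BATCH_SIZE))
    batch_readahead = int(
        provider_def.get("batch_readahead", DEFAULT_BATCH_READAHEAD)
    )
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    if batch_readahead < 0:
        raise ValueError("batch_readahead must be >= 0")
    return {"batch_size": batch_size, "batch_readahead": batch_readahead}


def iter_batches(path: str, provider_def: Dict[str, Any]) -> Iterator[Any]:
    """Yield record batches from a parquet dataset using the resolved options."""
    # Imported lazily so the option handling above works without pyarrow.
    import pyarrow.dataset as ds

    dataset = ds.dataset(path, format="parquet")
    # to_batches accepts both keyword arguments; smaller values mean less
    # data is fetched ahead of what a single request actually needs.
    yield from dataset.to_batches(**resolve_batch_options(provider_def))
```

With something like `iter_batches("data.parquet", {"batch_size": 500})`, each yielded batch would hold at most 500 rows, and readahead would fall back to the illustrative default.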
Related Issue / discussion
Sorta #2271 / #2150
Additional information
Dependency policy (RFC2)
Updates to public demo
Contributions and licensing
(as per https://github.com/geopython/pygeoapi/blob/master/CONTRIBUTING.md#contributions-and-licensing)