Parquet batch configuration and documentation for spec updates#2281

Merged
tomkralidis merged 3 commits into geopython:master from cgs-earth:parquetBatchConfiguration on Mar 10, 2026

@C-Loftus (Contributor) commented Mar 9, 2026

Overview

Previously, the parquet provider had no config option for batch_size or batch_readahead. pyarrow defaults to batches of over 100k rows with a readahead of 16 batches. That is fine for analytics, but for a server it is generally far too large, and when reading parquet over S3 it adds a lot of unnecessary data transfer.

I changed the provider to use smaller defaults for these options and to make them configurable in the provider definition.

I documented this in code comments and in the rst docs. While I was at it, I also brought over some extra info for the docs from #2271.

Related Issue / discussion

Sorta #2271 / #2150

Additional information

Dependency policy (RFC2)

  • I have ensured that this PR meets RFC2 requirements

Updates to public demo

Contributions and licensing

(as per https://github.com/geopython/pygeoapi/blob/master/CONTRIBUTING.md#contributions-and-licensing)

  • I'd like to contribute [feature X|bugfix Y|docs|something else] to pygeoapi. I confirm that my contributions to pygeoapi will be compatible with the pygeoapi license guidelines at the time of contribution
  • I have already previously agreed to the pygeoapi Contributions and Licensing Guidelines

@tomkralidis tomkralidis self-requested a review March 9, 2026 16:34
@tomkralidis tomkralidis added this to the 0.24.0 milestone Mar 9, 2026
@tomkralidis tomkralidis merged commit 090ea9d into geopython:master Mar 10, 2026
4 checks passed
@C-Loftus C-Loftus deleted the parquetBatchConfiguration branch March 10, 2026 15:36

2 participants