Best Practices guide for creation of good GeoParquet files (focused on distribution) #254

Merged (30 commits) on Jul 14, 2025

cholmes (Member) commented Jan 9, 2025

Attempt to pull together recommendations / best practices as discussed in #251.

More work needed, feedback / help is very welcome. Likely more to discuss to get the recommendations right, but wanted to put up something for people to react to.

josh-id commented Jul 3, 2025

This is great and I'd love to see it merged at some point.

brawer commented Jul 9, 2025

Curious, what’s needed to merge this PR? To me it's looking really nice.

cholmes (Member, Author) commented Jul 9, 2025

> This is great and I'd love to see it merged at some point

Sorry, it's felt like 90% done for way too long, I just haven't found the time to polish it off.

> Curious, what’s needed to merge this PR? To me it's looking really nice.

I was hoping to cover more tools, and that held me up. Then I wanted to review the Sedona one, as it wasn't quite the way I wanted it to look. I also think DuckDB needs a bit more work. And I was hoping to add geoparquet-tools as an option that does everything right, but I need to rename it to get a pip release.

But clearly I just need to cut scope and ship. I'll try to get it in a ready state sometime this week - it's been too long for sure.

cholmes marked this pull request as ready for review on July 11, 2025, 00:36

paleolimbot (Collaborator) left a comment

Just took a read through for grammar and typos. All optional fixes from me...looks great!

Comment on lines +262 to +264:

> ## Spatial Partitioning
>
> Most tools don't yet provide any way to do automatic spatial partitioning across files, when you have larger datasets.
Collaborator:

I'll rejig this example to look more like the other examples...the other examples actually are doing spatial partitioning, too, they're just repurposing a non-spatial mechanism (sorting and file rotation) to do so.

Collaborator:

(after this merges!)

Member Author:

Sounds good - and yes, feel free to update the language on how we describe it.

cholmes (Member, Author) commented Jul 14, 2025

> Just took a read through for grammar and typos. All optional fixes from me...looks great!

Thanks for the clean up, I accepted all the changes. Will merge in now.

cholmes merged commit 0061ff8 into main on Jul 14, 2025
2 checks passed
m-mohr deleted the cholmes/distro-guide branch on July 14, 2025, 20:00

5 participants