Include the collection in the geoparquet metadata #428

Open
gadomski opened this issue Sep 23, 2024 · 4 comments · May be fixed by #437
@gadomski
Member

https://github.com/stac-utils/stac-geoparquet/blob/main/spec/stac-geoparquet-spec.md#including-a-stac-collection-json-in-a-stac-geoparquet-collection

I think we'd support this on the way in either by:

  • A builder, or
  • Expecting it on the input item collection (cleaner, I think)

Way out would be similar-ish, where if we didn't do it on an item collection we'd just hit the parquet file twice, which seems bad.

Probably just add a collection field to an ItemCollection, eh?

@gadomski gadomski added the [crate] core stac label Sep 23, 2024
@bitner

bitner commented Sep 23, 2024

It should probably be a collections array field rather than singular to accommodate geoparquet that may have Items from multiple collections.

@gadomski gadomski added this to the core v0.10.2 milestone Sep 23, 2024
@gadomski
Member Author

@bitner should we update the spec to reflect this? It seems to suggest that only a single collection should be included.

@gadomski gadomski linked a pull request Sep 23, 2024 that will close this issue
@gadomski gadomski removed this from the core v0.10.2 milestone Oct 18, 2024
@bitner

bitner commented Nov 12, 2024

I see two use cases here --

  1. Geoparquet as an archive format, which is generally going to be partitioned by Collection, in which case there would be a single Collection per parquet file.
  2. Geoparquet as an output format for an API search, in which case there could be multiple Collections in one file.

It would seem to me that we would want to update the spec to accommodate both.
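
To make the difference between the two cases concrete, here is a minimal sketch, assuming items are plain dicts carrying a `collection` id as STAC Items do. The `partition_by_collection` helper is hypothetical and not part of any library; it only shows how a multi-collection search result (case 2) breaks down into the single-collection groups that case 1 expects.

```python
from collections import defaultdict


def partition_by_collection(items):
    """Group STAC-like item dicts by their `collection` id (hypothetical helper)."""
    groups = defaultdict(list)
    for item in items:
        # Items without a collection end up under the key None.
        groups[item.get("collection")].append(item)
    return dict(groups)


# A search result that mixes items from two collections (use case 2).
search_result = [
    {"id": "item-a", "collection": "collection-a"},
    {"id": "item-b", "collection": "collection-b"},
    {"id": "item-c", "collection": "collection-a"},
]

# Each group could be written as its own single-collection file (use case 1).
partitions = partition_by_collection(search_result)
```

A file with exactly one group would only need a single collection in its metadata; more than one group is the situation that motivates a plural field.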

@gadomski
Member Author

🤔 I think my instinct is to make collections a map in the metadata, keyed by collection id, so you can always do a quick key lookup (in pseudo-JSON):

{
  "collections": {
    "collection-a": { ... },
    "collection-b": { ... }
  },
  "items": [
    { "id": "item-a", "collection": "collection-a", ... },
    { "id": "item-b", "collection": "collection-b", ... }
  ]
}
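
The key lookup described above could be sketched roughly as follows. This is an illustration under assumptions, not the spec or any library's API: the `collections` key and its layout are just the proposal from this comment, the helper names `build_collections_metadata` and `collection_for_item` are hypothetical, and the map is serialized to a JSON string on the assumption that it would be stored as a Parquet key-value metadata entry, which holds string values.

```python
import json


def build_collections_metadata(collections):
    """Key each collection dict by its id and serialize the map to a JSON string."""
    by_id = {collection["id"]: collection for collection in collections}
    return json.dumps({"collections": by_id})


def collection_for_item(metadata_json, item):
    """Return the collection dict an item points at, or None if it is not in the map."""
    collections = json.loads(metadata_json)["collections"]
    return collections.get(item["collection"])


# Hypothetical collections, reduced to the fields this sketch needs.
metadata = build_collections_metadata(
    [
        {"id": "collection-a", "type": "Collection"},
        {"id": "collection-b", "type": "Collection"},
    ]
)
found = collection_for_item(metadata, {"id": "item-b", "collection": "collection-b"})
```

With a map, the single-collection archive case is just a map with one entry, so both use cases share one layout.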
