Experiment with stacking for kerchunk #38

Closed
jsignell wants to merge 3 commits from js/kerchunk-stacking


jsignell
Member

This idea came out of a comment here: #34 (comment)

Conceptually, it seems like it should be possible to read and stack kerchunk and zarr data contained in the assets of a single item or of a list of items. Not sure if this is the most elegant way 🤷

import pystac
import xarray as xr

url_1 = "https://gist.githubusercontent.com/clausmichele/28efa0007731044db3a7752da2164fe0/raw/1cba235038f0aa20e16675a863224a4f3ab79e4a/CERRA-20010101000000_20011231000000.json"
url_2 = "https://gist.githubusercontent.com/clausmichele/6b78a70ef153c4c841401ec0b7d2b75f/raw/e0d2f307b1f8caef7ec19ae68b8100fb7d5f25dd/CERRA-20020101000000_20021231000000.json"

item_1 = pystac.read_file(url_1)
item_2 = pystac.read_file(url_2)
items = [item_1, item_2]

# these items don't specify the media_type and role that xpystac uses to assert that
# an asset refers to a kerchunk reference file. So first tidy that up.
for item in items:
    for asset in item.assets.values():
        if asset.href.endswith(".json"):
            asset.media_type = "application/json"
            asset.roles = ["index"]

data = xr.open_dataset(items, engine="stac", stacking_library="xpystac", chunks={})
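The tidy-up loop above could also be wrapped in a small helper that reports which assets it touched. This is only a minimal sketch: it operates on plain STAC item dictionaries (using the `type` and `roles` keys of the JSON encoding) rather than on pystac objects, and the function name `mark_kerchunk_assets`, the item ids, and the example.com hrefs are all hypothetical.

```python
def mark_kerchunk_assets(item_dicts):
    """Mark every asset whose href ends in ".json" as a kerchunk reference.

    Modifies the item dictionaries in place and returns a list of
    (item id, asset key) pairs for the assets that were marked.
    """
    marked = []
    for item in item_dicts:
        for key, asset in item.get("assets", {}).items():
            if asset.get("href", "").endswith(".json"):
                # Same values the loop above assigns to media_type and roles.
                asset["type"] = "application/json"
                asset["roles"] = ["index"]
                marked.append((item["id"], key))
    return marked


# Hypothetical stand-ins for the two CERRA items; not the real gists.
items = [
    {
        "id": "example-2001",
        "assets": {
            "data": {"href": "https://example.com/2001.json"},
            "thumbnail": {"href": "https://example.com/2001.png"},
        },
    },
    {
        "id": "example-2002",
        "assets": {"data": {"href": "https://example.com/2002.json"}},
    },
]

marked = mark_kerchunk_assets(items)
print(marked)
```

Assets whose href does not end in `.json` (the thumbnail here) are left untouched, so the helper can be run over mixed items.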

@clausmichele

@jsignell you can use these new versions of the Items, which have the correct media type and the roles set to index:


url_1 = "https://gist.githubusercontent.com/clausmichele/b101fcf12f17c746b2c5db57ef43a650/raw/bd7c2c2d25a328d01b316ec9bbab2c7503c0e343/CERRA-20010101000000_20011231000000_2.json"
url_2 = "https://gist.githubusercontent.com/clausmichele/b101fcf12f17c746b2c5db57ef43a650/raw/bd7c2c2d25a328d01b316ec9bbab2c7503c0e343/CERRA-20020101000000_20021231000000_2.json"

@jsignell
Member Author

jsignell commented Apr 5, 2024

Nice! Yeah it works well with those versions:

import pystac
import xarray as xr


url_1 = "https://gist.githubusercontent.com/clausmichele/b101fcf12f17c746b2c5db57ef43a650/raw/bd7c2c2d25a328d01b316ec9bbab2c7503c0e343/CERRA-20010101000000_20011231000000_2.json"
url_2 = "https://gist.githubusercontent.com/clausmichele/b101fcf12f17c746b2c5db57ef43a650/raw/bd7c2c2d25a328d01b316ec9bbab2c7503c0e343/CERRA-20020101000000_20021231000000_2.json"

item_1 = pystac.read_file(url_1)
item_2 = pystac.read_file(url_2)
items = [item_1, item_2]

data = xr.open_dataset(items, engine="stac", stacking_library="xpystac", chunks={})
data

Since it's purely additive I don't see the harm in merging this once I write up some tests.

@jsignell
Member Author

@clausmichele is this still a workflow that you are interested in? I was just going through open issues and this looks like it was very close, but the gists have gotten stale.

@jsignell
Member Author

I'm actually wondering whether open_mfdataset + xpystac might cover this case already.

@clausmichele

The code samples you shared with me were enough for our use case, thanks!

@jsignell
Member Author

I'm going to close this, as it is really more than xpystac should be doing.

@jsignell jsignell closed this Mar 12, 2025
@jsignell jsignell deleted the js/kerchunk-stacking branch March 12, 2025 13:24
2 participants