Experiment with stacking for kerchunk #38

jsignell · 2024-03-11T20:03:11Z

This idea came out of a comment here: #34 (comment)

Conceptually it seems like it should be possible to read and stack kerchunk and zarr data contained in one item's assets or in the assets of a list of items. Not sure if this is the most elegant way 🤷

import pystac
import xarray as xr

url_1 = "https://gist.githubusercontent.com/clausmichele/28efa0007731044db3a7752da2164fe0/raw/1cba235038f0aa20e16675a863224a4f3ab79e4a/CERRA-20010101000000_20011231000000.json"
url_2 = "https://gist.githubusercontent.com/clausmichele/6b78a70ef153c4c841401ec0b7d2b75f/raw/e0d2f307b1f8caef7ec19ae68b8100fb7d5f25dd/CERRA-20020101000000_20021231000000.json"

item_1 = pystac.read_file(url_1)
item_2 = pystac.read_file(url_2)
items = [item_1, item_2]

# these items don't specify the media_type and role that xpystac uses to assert that
# an asset refers to a kerchunk reference file. So first tidy that up.
for item in items:
    for asset in item.assets.values():
        if asset.href.endswith(".json"):
            asset.media_type = "application/json"
            asset.roles = ["index"]

data = xr.open_dataset(items, engine="stac", stacking_library="xpystac", chunks={})
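
If the tidy-up step above is needed in more than one place, it could be wrapped in a small helper. This is only a sketch: `tag_kerchunk_assets` is a hypothetical name, not part of xpystac or pystac, and the `application/json` media type plus `index` role are simply the values used in the loop above. The function is duck-typed, so the stand-in objects below only mimic the `assets`, `href`, `media_type`, and `roles` attributes of real pystac items; the example.com URLs are placeholders.

```python
from types import SimpleNamespace


def tag_kerchunk_assets(items, suffix=".json"):
    """Mark every asset whose href ends with `suffix` as a kerchunk reference.

    Hypothetical helper: it applies the same media type and role as the
    loop above and returns the number of assets it changed.
    """
    changed = 0
    for item in items:
        for asset in item.assets.values():
            if asset.href.endswith(suffix):
                asset.media_type = "application/json"
                asset.roles = ["index"]
                changed += 1
    return changed


# Stand-ins for pystac.Item / pystac.Asset so the sketch runs without network access.
ref = SimpleNamespace(href="https://example.com/refs.json", media_type=None, roles=None)
thumb = SimpleNamespace(
    href="https://example.com/thumb.png", media_type="image/png", roles=["thumbnail"]
)
fake_item = SimpleNamespace(assets={"data": ref, "thumbnail": thumb})

n_tagged = tag_kerchunk_assets([fake_item])
```

With real items, the call would presumably be `tag_kerchunk_assets(items)` right before `xr.open_dataset(...)`; assets that do not end in `.json` are left untouched.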

clausmichele · 2024-03-18T15:27:17Z

@jsignell you can use these new versions of the items, with the correct media type and the roles set to index:


url_1 = "https://gist.githubusercontent.com/clausmichele/b101fcf12f17c746b2c5db57ef43a650/raw/bd7c2c2d25a328d01b316ec9bbab2c7503c0e343/CERRA-20010101000000_20011231000000_2.json"
url_2 = "https://gist.githubusercontent.com/clausmichele/b101fcf12f17c746b2c5db57ef43a650/raw/bd7c2c2d25a328d01b316ec9bbab2c7503c0e343/CERRA-20020101000000_20021231000000_2.json"

jsignell · 2024-04-05T15:50:30Z

Nice! Yeah it works well with those versions:

import pystac
import xarray as xr


url_1 = "https://gist.githubusercontent.com/clausmichele/b101fcf12f17c746b2c5db57ef43a650/raw/bd7c2c2d25a328d01b316ec9bbab2c7503c0e343/CERRA-20010101000000_20011231000000_2.json"
url_2 = "https://gist.githubusercontent.com/clausmichele/b101fcf12f17c746b2c5db57ef43a650/raw/bd7c2c2d25a328d01b316ec9bbab2c7503c0e343/CERRA-20020101000000_20021231000000_2.json"

item_1 = pystac.read_file(url_1)
item_2 = pystac.read_file(url_2)
items = [item_1, item_2]

data = xr.open_dataset(items, engine="stac", stacking_library="xpystac", chunks={})
data

Since it's purely additive I don't see the harm in merging this once I write up some tests.

jsignell · 2025-03-11T17:15:33Z

@clausmichele is this still a workflow that you are interested in? I was just going through open issues and this looks like it was very close, but the gists have gotten stale.

jsignell · 2025-03-11T20:43:01Z

I'm actually wondering whether open_mfdataset + xpystac might cover this case already.

clausmichele · 2025-03-12T08:56:02Z

The code samples you shared with me were enough for our use case, thanks!

jsignell · 2025-03-12T13:24:45Z

I'm going to close this, since it's really more than xpystac should be doing.

Experiment with native stacking for kerchunk

b1688a7

Merge branch 'main' into js/kerchunk-stacking

40f381b

Merge branch 'main' into js/kerchunk-stacking

02ba485

jsignell closed this Mar 12, 2025

jsignell deleted the js/kerchunk-stacking branch March 12, 2025 13:24
