Description
I am trying to download the dataset at "gs://cmip6/CMIP6/ScenarioMIP/CSIRO-ARCCSS/ACCESS-CM2/ssp585/r1i1p1f1/day/pr/gn/v20210317/", whose path I found with this script:
import intake  # intake-esm registers open_esm_datastore on import

cat_url = "https://storage.googleapis.com/cmip6/pangeo-cmip6.json"
cat = intake.open_esm_datastore(cat_url)

# Narrow the catalog to daily precipitation for ACCESS-CM2 under ssp585
cat_subset2 = cat.search(
    source_id="ACCESS-CM2",
    member_id=["r1i1p1f1"],
    experiment_id=["ssp585"],
    variable_id=["pr"],
    table_id=["day"],
)

# aggregate=False returns one dataset per catalog row
dset_dict = cat_subset2.to_dataset_dict(
    xarray_open_kwargs={"use_cftime": False, "decode_times": True, "consolidated": True},
    aggregate=False,
    storage_options={"token": "anon"},
)
However, to_dataset_dict fails and raises an ESMDataSourceError. I suspect the key is corrupted: it contains the full path with an extra ".20210317" version suffix appended, so the dataset cannot be found under that key. See the stack trace:
Traceback (most recent call last):
...
File "/home/dan/.cache/pypoetry/virtualenvs/seas-DLXvwKbf-py3.9/lib/python3.9/site-packages/intake_esm/source.py", line 208, in _get_schema
self._open_dataset()
File "/home/dan/.cache/pypoetry/virtualenvs/seas-DLXvwKbf-py3.9/lib/python3.9/site-packages/intake_esm/source.py", line 264, in _open_dataset
raise ESMDataSourceError(
intake_esm.source.ESMDataSourceError: Failed to load dataset with key='ScenarioMIP.CSIRO-ARCCSS.ACCESS-CM2.ssp585.r1i1p1f1.day.pr.gn.gs://cmip6/CMIP6/ScenarioMIP/CSIRO-ARCCSS/ACCESS-CM2/ssp585/r1i1p1f1/day/pr/gn/v20210317/.20210317'
You can use `cat['ScenarioMIP.CSIRO-ARCCSS.ACCESS-CM2.ssp585.r1i1p1f1.day.pr.gn.gs://cmip6/CMIP6/ScenarioMIP/CSIRO-ARCCSS/ACCESS-CM2/ssp585/r1i1p1f1/day/pr/gn/v20210317/.20210317'].df` to inspect the assets/files for this key.
Following that suggestion does not work: cat['ScenarioMIP.CSIRO-ARCCSS.ACCESS-CM2.ssp585.r1i1p1f1.day.pr.gn.gs://cmip6/CMIP6/ScenarioMIP/CSIRO-ARCCSS/ACCESS-CM2/ssp585/r1i1p1f1/day/pr/gn/v20210317/.20210317'].df
But this does work:
cat['ScenarioMIP.CSIRO-ARCCSS.ACCESS-CM2.ssp585.day.gn'].df
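Looking at the failing key more closely, it matches what you get by joining the eight facet values, the dataset path, and the version number with ".". This is only a reading of the error message, not something confirmed in the intake-esm source. The names `facets`, `zstore`, and `version` below are illustrative, and the sketch only compares strings without touching the catalog.

```python
# Facet values taken from the search above and from the dataset path.
facets = [
    "ScenarioMIP", "CSIRO-ARCCSS", "ACCESS-CM2", "ssp585",
    "r1i1p1f1", "day", "pr", "gn",
]
zstore = (
    "gs://cmip6/CMIP6/ScenarioMIP/CSIRO-ARCCSS/ACCESS-CM2/"
    "ssp585/r1i1p1f1/day/pr/gn/v20210317/"
)
version = "20210317"

# Assumption: with aggregate=False the key is every field joined with ".".
reconstructed_key = ".".join(facets + [zstore, version])

# Key copied verbatim from the ESMDataSourceError message.
key_from_error = (
    "ScenarioMIP.CSIRO-ARCCSS.ACCESS-CM2.ssp585.r1i1p1f1.day.pr.gn."
    "gs://cmip6/CMIP6/ScenarioMIP/CSIRO-ARCCSS/ACCESS-CM2/"
    "ssp585/r1i1p1f1/day/pr/gn/v20210317/.20210317"
)

print("keys match:", reconstructed_key == key_from_error)
```

If the two strings match, the trailing ".20210317" would simply be the version field following a path that ends in "/", which may mean the key itself is not the cause of the failure.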
What I Did
So I opened the Zarr store directly from its path, and the time coordinate looks wrong. Could the time index be corrupted? Its units are "days since 2251-01-01".
Does that mean this store only runs from 2251 to 2300? How can I access the rest of the dataset?
import intake
import intake_esm
import requests
import aiohttp
import xarray as xr
import dask
import gcsfs
pr_ssp585 = xr.open_zarr(
    "gs://cmip6/CMIP6/ScenarioMIP/CSIRO-ARCCSS/ACCESS-CM2/ssp585/r1i1p1f1/day/pr/gn/v20210317/",
    consolidated=True,
    use_cftime=False,
    decode_times=False,  # keep raw integer day offsets so the store opens
)
pr_ssp585
<xarray.Dataset>
Dimensions: (lat: 144, bnds: 2, lon: 192, time: 18262)
Coordinates:
* lat (lat) float64 -89.38 -88.12 -86.88 -85.62 ... 86.88 88.12 89.38
lat_bnds (lat, bnds) float64 dask.array<chunksize=(144, 2), meta=np.ndarray>
* lon (lon) float64 0.9375 2.812 4.688 6.562 ... 355.3 357.2 359.1
lon_bnds (lon, bnds) float64 dask.array<chunksize=(192, 2), meta=np.ndarray>
* time (time) int64 0 1 2 3 4 5 ... 18256 18257 18258 18259 18260 18261
time_bnds (time, bnds) float64 dask.array<chunksize=(9131, 2), meta=np.ndarray>
Dimensions without coordinates: bnds
Data variables:
pr (time, lat, lon) float32 dask.array<chunksize=(495, 144, 192), meta=np.ndarray>
Attributes: (12/50)
Conventions: CF-1.7 CMIP-6.2
activity_id: ScenarioMIP
branch_method: standard
branch_time_in_child: 60265.0
branch_time_in_parent: 60265.0
cmor_version: 3.4.0
... ...
title: ACCESS-CM2 output prepared for CMIP6
tracking_id: hdl:21.14100/d3a15390-8afe-4503-9669-da9b50bd9c99
variable_id: pr
variant_label: r1i1p1f1
version: v20210317
version_id: v20210317
pr_ssp585["time"]
<xarray.DataArray 'time' (time: 18262)>
array([ 0, 1, 2, ..., 18259, 18260, 18261])
Coordinates:
* time (time) int64 0 1 2 3 4 5 6 ... 18256 18257 18258 18259 18260 18261
Attributes:
axis: T
bounds: time_bnds
calendar: proleptic_gregorian
long_name: time
standard_name: time
units: days since 2251-01-01 12:00:00.000000
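To check what period these raw offsets cover, here is a minimal sketch that decodes the first and last time steps by hand with the standard library, using only the units string and the length of the time dimension printed above. The helper `decode_day_offset` is a hypothetical name, not part of xarray. Python's `datetime` uses the proleptic Gregorian calendar, which matches the `calendar` attribute here. Note that nanosecond-precision numpy/pandas datetimes only reach 2262-04-11, which may be why decoding with use_cftime=False fails for this store.

```python
from datetime import datetime, timedelta

# Values copied from the time coordinate printed above.
UNITS_ORIGIN = datetime(2251, 1, 1, 12, 0, 0)  # "days since 2251-01-01 12:00:00"
N_STEPS = 18262  # length of the time dimension


def decode_day_offset(offset_days, origin=UNITS_ORIGIN):
    """Convert an integer 'days since origin' offset to a datetime."""
    return origin + timedelta(days=offset_days)


# Offsets run from 0 to N_STEPS - 1, one per day.
first = decode_day_offset(0)
last = decode_day_offset(N_STEPS - 1)
print(first.isoformat(), "to", last.isoformat())
```

If the end date lands on the last day of 2300, the offsets are consistent with a store that covers 2251 through 2300 only, rather than a corrupted index.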
Version information: output of intake_esm.show_versions()
INSTALLED VERSIONS
cftime: 1.6.4.post1
dask: 2023.12.1
fastprogress: 1.0.3
fsspec: 2025.7.0
gcsfs: 2025.7.0
intake: 0.6.8
intake_esm: 2023.11.10
netCDF4: 1.7.2
pandas: 2.3.2
requests: 2.32.5
s3fs: None
xarray: 2023.12.0
zarr: 2.18.2