Description
When trying to open a multi-file dataset where the coordinate precision changes halfway through, this error is encountered:
...
esm_ds.to_dask(
    xarray_open_kwargs={
        "decode_timedelta": False,
    },
)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
File [/g/data/xp65/public/apps/med_conda/envs/analysis3-25.06/lib/python3.11/site-packages/intake_esm/source.py:287](https://are.nci.org.au/g/data/xp65/public/apps/med_conda/envs/analysis3-25.06/lib/python3.11/site-packages/intake_esm/source.py#line=286), in ESMDataSource._open_dataset(self)
286 else:
--> 287 raise exc
289 self._ds.attrs[OPTIONS['dataset_key']] = self.key
File [/g/data/xp65/public/apps/med_conda/envs/analysis3-25.06/lib/python3.11/site-packages/intake_esm/source.py:272](https://are.nci.org.au/g/data/xp65/public/apps/med_conda/envs/analysis3-25.06/lib/python3.11/site-packages/intake_esm/source.py#line=271), in ESMDataSource._open_dataset(self)
271 try:
--> 272 self._ds = xr.combine_by_coords(
273 datasets, **self.xarray_combine_by_coords_kwargs
274 )
275 except ValueError as exc:
File [/g/data/xp65/public/apps/med_conda/envs/analysis3-25.06/lib/python3.11/site-packages/xarray/structure/combine.py:983](https://are.nci.org.au/g/data/xp65/public/apps/med_conda/envs/analysis3-25.06/lib/python3.11/site-packages/xarray/structure/combine.py#line=982), in combine_by_coords(data_objects, compat, data_vars, coords, fill_value, join, combine_attrs)
981 # Perform the multidimensional combine on each group of data variables
982 # before merging back together
--> 983 concatenated_grouped_by_data_vars = tuple(
984 _combine_single_variable_hypercube(
985 tuple(datasets_with_same_vars),
986 fill_value=fill_value,
987 data_vars=data_vars,
988 coords=coords,
989 compat=compat,
990 join=join,
991 combine_attrs=combine_attrs,
992 )
993 for vars, datasets_with_same_vars in grouped_by_vars
994 )
996 return merge(
997 concatenated_grouped_by_data_vars,
998 compat=compat,
(...)
1001 combine_attrs=combine_attrs,
1002 )
File [/g/data/xp65/public/apps/med_conda/envs/analysis3-25.06/lib/python3.11/site-packages/xarray/structure/combine.py:984](https://are.nci.org.au/g/data/xp65/public/apps/med_conda/envs/analysis3-25.06/lib/python3.11/site-packages/xarray/structure/combine.py#line=983), in <genexpr>(.0)
981 # Perform the multidimensional combine on each group of data variables
982 # before merging back together
983 concatenated_grouped_by_data_vars = tuple(
--> 984 _combine_single_variable_hypercube(
985 tuple(datasets_with_same_vars),
986 fill_value=fill_value,
987 data_vars=data_vars,
988 coords=coords,
989 compat=compat,
990 join=join,
991 combine_attrs=combine_attrs,
992 )
993 for vars, datasets_with_same_vars in grouped_by_vars
994 )
996 return merge(
997 concatenated_grouped_by_data_vars,
998 compat=compat,
(...)
1001 combine_attrs=combine_attrs,
1002 )
File [/g/data/xp65/public/apps/med_conda/envs/analysis3-25.06/lib/python3.11/site-packages/xarray/structure/combine.py:671](https://are.nci.org.au/g/data/xp65/public/apps/med_conda/envs/analysis3-25.06/lib/python3.11/site-packages/xarray/structure/combine.py#line=670), in _combine_single_variable_hypercube(datasets, fill_value, data_vars, coords, compat, join, combine_attrs)
670 if not (indexes.is_monotonic_increasing or indexes.is_monotonic_decreasing):
--> 671 raise ValueError(
672 "Resulting object does not have monotonic"
673 f" global indexes along dimension {dim}"
674 )
676 return concatenated
ValueError: Resulting object does not have monotonic global indexes along dimension yh
which then percolates through to an ESMDataSourceError.
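To illustrate one way this error can arise, here is a minimal sketch using made-up latitude values (not taken from the actual files). It assumes the later files hold the same grid after a round trip through single precision; the real cause of the mismatch may differ. The two grids are close but not equal, and stacking them end to end fails the same monotonicity test that appears in the traceback.

```python
import numpy as np
import pandas as pd

# Hypothetical latitude grid, stored at double precision in the earlier files.
yh_f64 = np.array([-80.94, -40.1, 0.3, 45.7, 89.95], dtype=np.float64)
# Assumption: later files hold the same grid after passing through float32.
yh_f32 = yh_f64.astype(np.float32).astype(np.float64)

# The grids agree to within a small tolerance but are not equal as indexes.
grids_close = bool(np.allclose(yh_f64, yh_f32, rtol=0, atol=1e-4))
grids_equal = pd.Index(yh_f64).equals(pd.Index(yh_f32))

# If yh is treated as a dimension to concatenate along, one grid is stacked
# after the other, and the result is neither increasing nor decreasing.
stacked = pd.Index(np.concatenate([yh_f64, yh_f32]))
stacked_monotonic = (
    stacked.is_monotonic_increasing or stacked.is_monotonic_decreasing
)

print(grids_close, grids_equal, stacked_monotonic)
```

If the two grids compared equal, yh would not change between files and so should not need combining along at all.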
This is the offending line:
293 self._ds = xr.combine_by_coords(
datasets, **self.xarray_combine_by_coords_kwargs
)
Crucially, the paths from the datastore can be opened directly with xr.open_mfdataset, so I see no reason why we can't in principle fix this:
import xarray as xr

pathlist = esm_ds.df['path'].tolist()
ds = xr.open_mfdataset(pathlist, decode_timedelta=False, parallel=True, chunks={'time': -1})
print(ds)
<xarray.Dataset> Size: 14GB
Dimensions: (time: 1068, yh: 2204, xh: 1440, nv: 2)
Coordinates:
* xh (xh) float64 12kB -279.9 -279.6 -279.4 ... 79.38 79.62 79.88
* yh (yh) float64 18kB -80.94 -80.94 -80.87 ... 89.84 89.95 89.95
* nv (nv) float64 16B 1.0 2.0
* time (time) object 9kB 1900-01-16 12:00:00 ... 1988-12-16 12:00:00
Data variables:
wfo (time, yh, xh) float32 14GB dask.array<chunksize=(12, 142, 240), meta=np.ndarray>
average_T1 (yh, time) datetime64[ns] 19MB dask.array<chunksize=(1142, 12), meta=np.ndarray>
average_T2 (yh, time) datetime64[ns] 19MB dask.array<chunksize=(1142, 12), meta=np.ndarray>
average_DT (yh, time) float64 19MB dask.array<chunksize=(1142, 12), meta=np.ndarray>
time_bnds (yh, time, nv) object 38MB dask.array<chunksize=(1142, 12, 2), meta=np.ndarray>
Attributes:
NumFilesInSet: 1
title: ACCESS-OM3
associated_files: areacello: access-om3.mom6.static.nc
grid_type: regular
grid_tile: N/A
Version information: output of intake_esm.show_versions()
I've reproduced this with intake_esm versions 2025.2.3 and 2024.2.6:
2025.2.3
import intake_esm
intake_esm.show_versions()
INSTALLED VERSIONS
------------------
cftime: 1.6.4
dask: 2025.5.1
fastprogress: 1.0.3
fsspec: 2025.5.1
gcsfs: 2025.5.1
intake: 2.0.8
intake_esm: 2025.2.3
netCDF4: 1.7.2
pandas: 2.2.3
requests: 2.32.3
s3fs: 2025.5.1
xarray: 2025.4.0
zarr: 2.18.7
2024.2.6
import intake_esm; intake_esm.show_versions()
INSTALLED VERSIONS
------------------
cftime: 1.6.4
dask: 2024.11.2
fastprogress: 1.0.3
fsspec: 2024.10.0
gcsfs: 2024.10.0
intake: 0.7.0
intake_esm: 2024.2.6
netCDF4: 1.6.5
pandas: 2.2.3
requests: 2.32.3
s3fs: 2024.10.0
xarray: 2024.2.0
zarr: 2.18.3
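As a possible stopgap for the coordinate precision mismatch described above, the coordinate could be rounded to a common precision before the datasets are combined. The sketch below is untested against the real files: `round_coord`, `make_ds`, the sample latitudes, and the choice of four decimal places are all illustrative assumptions. The rounding tolerance would need to be coarser than the precision difference but finer than the grid spacing.

```python
import numpy as np
import xarray as xr


def round_coord(ds, name="yh", decimals=4):
    """Return ds with coordinate `name` rounded to `decimals` places.

    Hypothetical helper: the name and default tolerance are illustrative.
    """
    if name not in ds.coords:
        return ds
    rounded = np.round(ds[name].values, decimals)
    # Rebuild the coordinate on the same dimension, keeping its attributes.
    return ds.assign_coords({name: (name, rounded, ds[name].attrs)})


def make_ds(time, yh):
    """Build a small synthetic dataset shaped like one input file."""
    data = np.zeros((len(time), len(yh)), dtype=np.float32)
    return xr.Dataset(
        {"wfo": (("time", "yh"), data)},
        coords={"time": time, "yh": yh},
    )


# Made-up grids: double precision, and the same values via float32.
yh_f64 = np.array([-80.94, -40.1, 0.3, 45.7, 89.95])
yh_f32 = yh_f64.astype(np.float32).astype(np.float64)

early = make_ds([0, 1, 2], yh_f64)
late = make_ds([3, 4, 5], yh_f32)

# After rounding, yh is identical in both pieces, so only time is combined.
combined = xr.combine_by_coords([round_coord(early), round_coord(late)])
print(dict(combined.sizes))
```

A function like this could presumably be supplied as the `preprocess` argument when opening the data through intake-esm or `xr.open_mfdataset`, though that has not been tried on this dataset. Note also that the yh size of 2204 in the open_mfdataset output above, against a yh chunk size of 1142, may indicate that the two grids were merged side by side rather than matched, in which case harmonising the coordinate could matter on that path too.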