intake_xarray does not lazy read metadata from files #137

@kadykov

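Catalog entries powered by the intake_xarray drivers do not lazily read metadata from the files. The file-level attributes only appear in the entry's metadata after the data has been read.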

# %%
import intake
import xarray as xr

ds = xr.Dataset(
    {
        "test_var": [0],
    },
    attrs={"xarray_metadata": "The metadata in the xarray file"},
)
ds.to_netcdf("test_metadata.nc")
ds.to_zarr("test_metadata.zarr", mode="w")
# %%
catalog_content = """sources:
  netcdf:
    driver: netcdf
    args:
      urlpath: '{{ CATALOG_DIR }}/test_metadata.nc'
      metadata:
        catalog_metadata: The metadata in the catalog entry
  zarr_intake_xarray:
    description: zarr archive read by intake_xarray
    driver: zarr
    args:
      urlpath: '{{ CATALOG_DIR }}/test_metadata.zarr'
      metadata:
        catalog_metadata: The metadata in the catalog entry
  zarr_intake:
    description: zarr archive read by intake
    driver: zarr_cat
    args:
      urlpath: '{{ CATALOG_DIR }}/test_metadata.zarr'
      metadata:
        catalog_metadata: The metadata in the catalog entry
"""

with open("catalog.yml", "w") as f:
    f.write(catalog_content)

cat = intake.open_catalog("catalog.yml")
print(f"{cat.netcdf.metadata = }")
print(f"{cat.zarr_intake_xarray.metadata = }")
print(f"{cat.zarr_intake.metadata = }")

As the output shows, only the entry powered by intake's own zarr_cat driver includes the field from the zarr file. The two intake_xarray entries (netcdf and zarr) return only the catalog metadata:

cat.netcdf.metadata = {'catalog_metadata': 'The metadata in the catalog entry'}
cat.zarr_intake_xarray.metadata = {'catalog_metadata': 'The metadata in the catalog entry'}
cat.zarr_intake.metadata = {'catalog_metadata': 'The metadata in the catalog entry', 'xarray_metadata': 'The metadata in the xarray file'}

However, after reading the files, the metadata of the intake_xarray entries is complete:

cat.netcdf.read()
cat.zarr_intake_xarray.read()

print(f"Netcdf metadata after reading: {cat.netcdf.metadata}")
print(f"Zarr metadata after reading: {cat.zarr_intake_xarray.metadata}")

Output:

Netcdf metadata after reading: {'catalog_metadata': 'The metadata in the catalog entry', 'dims': {'test_var': 1}, 'data_vars': {}, 'coords': ('test_var',), 'xarray_metadata': 'The metadata in the xarray file'}
Zarr metadata after reading: {'catalog_metadata': 'The metadata in the catalog entry', 'dims': {'test_var': 1}, 'data_vars': {}, 'coords': ('test_var',), 'xarray_metadata': 'The metadata in the xarray file'}
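
To make the expected behavior concrete, here is a minimal pure-Python sketch of lazily merged metadata. It is not how intake or intake_xarray are implemented: `LazyMetadataEntry`, `read_file_attrs`, `attr_reads`, and `fake_read_attrs` are hypothetical names used only for illustration, and letting file attributes win on key conflicts is an assumption.

```python
from typing import Any, Callable, Dict, Optional


class LazyMetadataEntry:
    """Hypothetical stand-in for a catalog entry (not an intake class)."""

    def __init__(
        self,
        catalog_metadata: Dict[str, Any],
        read_file_attrs: Callable[[], Dict[str, Any]],
    ) -> None:
        self._catalog_metadata = dict(catalog_metadata)
        self._read_file_attrs = read_file_attrs
        self._file_attrs: Optional[Dict[str, Any]] = None
        self.attr_reads = 0  # counts how often the file attributes were fetched

    @property
    def metadata(self) -> Dict[str, Any]:
        # Fetch the file attributes on first access only, then reuse them.
        if self._file_attrs is None:
            self._file_attrs = dict(self._read_file_attrs())
            self.attr_reads += 1
        # Assumption: file attributes override catalog keys on conflict.
        return {**self._catalog_metadata, **self._file_attrs}


def fake_read_attrs() -> Dict[str, Any]:
    # Stand-in for opening the file and returning only its attributes.
    return {"xarray_metadata": "The metadata in the xarray file"}


entry = LazyMetadataEntry(
    {"catalog_metadata": "The metadata in the catalog entry"},
    fake_read_attrs,
)
print(entry.metadata)
```

With real files, the callable could plausibly be something like `lambda: dict(xr.open_dataset(path).attrs)`, since `xr.open_dataset` opens the file without loading the array values. Whether the driver should do this on every metadata access is a design question for the maintainers.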

OS: Windows 10
python 3.11.5
intake 0.7.0
intake_xarray 0.7.0
xarray 2023.8.0
zarr 2.16.1
