
charles-turner-1 (Collaborator) commented May 5, 2025

Changes

  • Add optional imports following the way polars does it to keep overhead to a minimum.
  • Add esmvalcore to list of optional imports.
  • Add to_esmvalcore method to esm_datastore class
  • Keep track of all searches made on a datastore
  • Tests for to_esmvalcore method (will require some heavy mocking)
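For the optional-import item: polars keeps optional dependencies behind lazily imported proxies, so importing the package stays cheap. The sketch below is a simplified version of that idea, not polars' actual code and not the code in this PR. `LazyModule` and `optional_import` are hypothetical names.

```python
from __future__ import annotations

import importlib
import importlib.util
from types import ModuleType
from typing import Any


class LazyModule:
    """Stand-in for an optional dependency that is imported on first use (hypothetical)."""

    def __init__(self, name: str) -> None:
        self._name = name
        self._module: ModuleType | None = None

    def __getattr__(self, attr: str) -> Any:
        # Only reached for attributes not found on the proxy itself,
        # so _name and _module never trigger this method.
        if self._module is None:
            try:
                self._module = importlib.import_module(self._name)
            except ImportError as exc:
                raise ImportError(
                    f"{self._name!r} is an optional dependency; "
                    "install it to use this feature"
                ) from exc
        return getattr(self._module, attr)


def optional_import(name: str) -> tuple[LazyModule, bool]:
    """Return a lazy proxy for `name` and whether its top-level package is installed.

    find_spec on the top-level name checks availability without importing it.
    """
    available = importlib.util.find_spec(name.partition(".")[0]) is not None
    return LazyModule(name), available


# Importing this module stays cheap: esmvalcore.dataset is only imported when
# an attribute such as esmvalcore_dataset.Dataset is first accessed.
esmvalcore_dataset, ESMVALCORE_AVAILABLE = optional_import("esmvalcore.dataset")
```

A method like to_esmvalcore could then check ESMVALCORE_AVAILABLE up front and raise a friendly error, while users who never call it pay no import cost.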

Related issue number

ESMValGroup/ESMValCore#2690

Closes #715

Checklist

  • Unit tests for the changes exist
  • Tests pass on CI
  • Documentation reflects the changes where applicable

charles-turner-1 (Collaborator, Author) commented May 7, 2025

@rbeucher, this now works with the following script, which loads an example dataset from the enso-recipes:

import intake

catalog = intake.cat.access_nri

# Searching the catalog with
#     catalog.search(name='(?i)cmip.*', model='ACCESS-ESM1-5')
# yields the cmip6_fs38 datastore used below.
esm_ds = catalog['cmip6_fs38']

# The search below selects the same dataset as the following, which I used
# when I was trying to add an IntakeDataset to esmvalcore:
#
# model_datasets = {
#     "ACCESS-ESM1-5": IntakeDataset(
#         short_name='tos',
#         project='CMIP6',
#         mip="Omon",
#         exp="historical",
#         ensemble="r1i1p1f1",
#         timerange="19790101/20190101",
#         dataset="ACCESS-ESM1-5",
#         grid="gn",
#     )
# }
search = dict(
    variable_id='tos',
    table_id='Omon',
    experiment_id='historical',
    member_id='r1i1p1f1',
    source_id='ACCESS-ESM1-5',
    grid_label='gn',
    version='v.*',  # intake-esm treats this value as a regular expression
)

esm_ds = esm_ds.search(**search)

esmvalcore_dataset = esm_ds.to_esmvalcore()

print(esmvalcore_dataset)

Output:

Dataset:
{'dataset': 'ACCESS-ESM1-5',
 'project': 'CMIP6',
 'mip': 'Omon',
 'short_name': 'tos',
 'activity': 'CMIP',
 'ensemble': 'r1i1p1f1',
 'exp': 'historical',
 'frequency': 'mon',
 'grid': 'gn',
 'institute': ['CSIRO'],
 'long_name': 'Sea Surface Temperature',
 'modeling_realm': ['ocean'],
 'original_short_name': 'tos',
 'standard_name': 'sea_surface_temperature',
 'units': 'degC'}
session: 'session-f404cbc9-21b8-40b8-958c-b6e082d05979_20250513_013009'
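Comparing the search dict with the IntakeDataset it reproduces suggests a mapping from intake-esm CMIP6 column names to ESMValCore facets. A minimal sketch of that automapping, inferred from this single example only; `CMIP6_COLUMN_TO_FACET` and `search_to_facets` are hypothetical names, and other catalogues will use other column names.

```python
from __future__ import annotations

# Column-to-facet mapping inferred from the single CMIP6 example above;
# other catalogues (and other projects) will use different column names.
CMIP6_COLUMN_TO_FACET = {
    "variable_id": "short_name",
    "table_id": "mip",
    "experiment_id": "exp",
    "member_id": "ensemble",
    "source_id": "dataset",
    "grid_label": "grid",
}


def search_to_facets(search: dict[str, str], project: str = "CMIP6") -> dict[str, str]:
    """Translate an intake-esm search dict into ESMValCore-style facets.

    Columns without a known facet (e.g. the regex version='v.*') are dropped.
    """
    facets = {"project": project}
    for column, value in search.items():
        facet = CMIP6_COLUMN_TO_FACET.get(column)
        if facet is not None:
            facets[facet] = value
    return facets
```

The resulting dict could then be unpacked into esmvalcore.dataset.Dataset(**facets). The timerange that the IntakeDataset above also set would still have to be supplied separately.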

Is this the sort of functionality you had in mind / that we talked about?

Also cc'ing @bouweandela: this would go some way towards the esmvalcore/intake-esm integration by letting users export an esmvalcore Dataset from an intake-esm search.

Todo:

  • Auto-map between facets & search (this might be best handled in esmvalcore using the config & I think I can steal from Bouwe's old intake PR).
  • Automapping currently happens in intake-esm; move this into esmvalcore.
  • Keep track of all searches made on a datastore so the user doesn't need to explicitly map facets.
  • Work out whether it's possible to keep track of searches made on a datastore in cases where the require_all_on argument has been used.
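One way to approach the "keep track of searches" items is to record the keyword arguments of each search() call and merge them afterwards. A minimal sketch; `SearchHistory` is a hypothetical name and this is not how intake-esm or this PR stores searches. intake-esm's search() does accept a require_all_on keyword, which is why it is separated out and flagged here rather than merged into the query.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class SearchHistory:
    """Hypothetical record of the queries passed to successive search() calls."""

    searches: list[dict[str, Any]] = field(default_factory=list)
    used_require_all_on: bool = False

    def record(self, require_all_on: Any = None, **query: Any) -> None:
        # require_all_on changes which rows survive a search, so the plain
        # query dict alone may no longer reproduce the result; flag it.
        if require_all_on is not None:
            self.used_require_all_on = True
        self.searches.append(dict(query))

    def combined(self) -> dict[str, Any]:
        """Merge all recorded queries into a single query dict."""
        merged: dict[str, Any] = {}
        for query in self.searches:
            for column, value in query.items():
                if column in merged and merged[column] != value:
                    # Chained searches intersect, so two different values for one
                    # column can't be collapsed into a single query here.
                    raise ValueError(f"{column!r} was searched with conflicting values")
                merged[column] = value
        return merged
```

The merged query is what a facet mapping would consume, so users wouldn't have to restate their searches when calling to_esmvalcore.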

I've added a cmorizer argument that I expect we'll need as we develop this further, although right now I have no idea how we might want to implement it. Maybe @rhaegar325 has some insight into how this works for mopper?

I don't know whether there is anything resembling a standard CMORizer API. If not, we might need to create one so that there's a plug-and-play way of CMORizing data in this functionality.

charles-turner-1 force-pushed the to-iris branch 2 times, most recently from ad82c60 to ee0dbec, May 13, 2025 00:37
bouweandela commented

Thanks for cc'ing me! I'm not sure if integrating ESMValCore into intake-esm is the right way around (I would personally integrate support for intake-esm into the esmvalcore.dataset.Dataset.find_files method), but if it works for you then go for it.

You may want to call the method to_esmvalcore instead of to_iris, as you're creating an esmvalcore.dataset.Dataset object rather than an iris.cube.Cube or iris.cube.CubeList.

There may be no need for a cmorizer argument, as CMORization is done when you call esmvalcore.dataset.Dataset.load.

> if there is anything resembling a standard CMORizer API

Not yet, but there are plans to develop something like that as part of the upcoming Horizon Europe proposals. It will probably look something like fix(ds: xarray.Dataset, identifier: str | None) -> xarray.Dataset, where identifier could be the instance_id (e.g. CMIP6.CMIP.CSIRO.ACCESS-ESM1-5.historical.r1i1p1f1.Amon.pr.gn.v20191115) for CMIP data, or something else that uniquely identifies the dataset for other data. If no identifier is provided, it could be read from the dataset attributes (assuming that those are correct).
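To make the proposed signature concrete, here is a minimal sketch of how such a fix function might resolve its identifier. Only the fix(ds, identifier) signature comes from the comment above; the FIXES registry keyed by (source_id, variable_id), the parse_instance_id helper, and the fallback to the source_id/variable_id global attributes are all hypothetical illustration, not a planned API.

```python
from __future__ import annotations

from typing import TYPE_CHECKING, Any, Callable

if TYPE_CHECKING:  # xarray is only needed for the type hints in this sketch
    import xarray

# Facets of a CMIP6 dataset instance_id, in order, e.g.
# CMIP6.CMIP.CSIRO.ACCESS-ESM1-5.historical.r1i1p1f1.Amon.pr.gn.v20191115
CMIP6_INSTANCE_ID_FACETS = (
    "mip_era", "activity_id", "institution_id", "source_id", "experiment_id",
    "member_id", "table_id", "variable_id", "grid_label", "version",
)

# Hypothetical registry of fixes, keyed by (source_id, variable_id).
FIXES: dict[tuple[str, str], Callable[[xarray.Dataset], xarray.Dataset]] = {}


def parse_instance_id(instance_id: str) -> dict[str, str]:
    """Split a CMIP6 instance_id into its dot-separated facets."""
    parts = instance_id.split(".")
    if len(parts) != len(CMIP6_INSTANCE_ID_FACETS):
        raise ValueError(f"not a CMIP6 instance_id: {instance_id!r}")
    return dict(zip(CMIP6_INSTANCE_ID_FACETS, parts))


def fix(ds: xarray.Dataset, identifier: str | None = None) -> xarray.Dataset:
    """Apply the registered fix (if any) for the dataset described by `identifier`.

    Without an identifier, fall back to the dataset's global attributes,
    assuming those are correct.
    """
    if identifier is not None:
        facets: dict[str, Any] = parse_instance_id(identifier)
    else:
        facets = dict(ds.attrs)
    fix_func = FIXES.get((facets["source_id"], facets["variable_id"]))
    # Datasets without a registered fix are returned unchanged.
    return ds if fix_func is None else fix_func(ds)
```

A common entry point like this is what would make a cmorizer argument plug-and-play: any callable with this signature could be passed in.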

rbeucher (Collaborator) commented

> Thanks for cc'ing me! I'm not sure if integrating ESMValCore into intake-esm is the right way around (I would personally integrate support for intake-esm into the esmvalcore.dataset.Dataset.find_files method), but if it works for you then go for it.

I'm OK with that, but I agree it should be a to_esmvalcore function.

charles-turner-1 (Collaborator, Author) commented

> Thanks for cc'ing me! I'm not sure if integrating ESMValCore into intake-esm is the right way around (I would personally integrate support for intake-esm into the esmvalcore.dataset.Dataset.find_files method), but if it works for you then go for it.

I'm planning on adding that too: the intake-esm branch/draft PR on the esmvalcore repo should already contain some code that starts that process. Most of the functionality in this PR is implemented in the ESMValCore repo anyway.

This PR is mostly just low-hanging fruit for our users, who are already used to accessing data through intake catalogues.

> if there is anything resembling a standard CMORizer API

> Not yet, but there are plans to develop something like that as part of the upcoming Horizon Europe proposals. It will probably look something like fix(ds: xarray.Dataset, identifier: str | None) -> xarray.Dataset, where identifier could be the instance_id (e.g. CMIP6.CMIP.CSIRO.ACCESS-ESM1-5.historical.r1i1p1f1.Amon.pr.gn.v20191115) for CMIP data, or something else that uniquely identifies the dataset for other data. If no identifier is provided, it could be read from the dataset attributes (assuming that those are correct).

Cool, that's really handy to know, thanks!

charles-turner-1 changed the title from "Add optional esmvalcore dependency & to_iris method" to "Add optional esmvalcore dependency & to_esmvalcore method" May 15, 2025

Development: successfully merging this pull request may close the issue "Facilitate creation of ESMValCore Dataset objects from datastores".

4 participants