Add optional esmvalcore dependency & `to_esmvalcore` method #717

charles-turner-1 · 2025-05-05T23:51:05Z

Changes

Add optional imports following the way polars does it to keep overhead to a minimum.
Add esmvalcore to list of optional imports.
Add to_esmvalcore method to esm_datastore class
Keep track of all searches made on a datastore
Tests for to_esmvalcore method (will require some heavy mocking)
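For readers unfamiliar with the lazy optional-import pattern mentioned above, here is a minimal sketch of the general idea. It is not the code in this PR or in polars: `LazyModule` and `optional_import` are hypothetical names, and the approach shown (check availability with `importlib.util.find_spec`, import only on first attribute access) is an assumption about how such a helper might look.

```python
import importlib
import importlib.util
from typing import Any


class LazyModule:
    """Hypothetical proxy that defers importing an optional dependency until first use."""

    def __init__(self, name: str, available: bool) -> None:
        self._name = name
        self._available = available
        self._module = None

    def __getattr__(self, attr: str) -> Any:
        # Only called for attributes that are not found on the proxy itself.
        if not self._available:
            raise ImportError(
                f"{self._name} is required for this feature; install it to continue."
            )
        if self._module is None:
            # The real import cost is paid here, on first use, not at package import.
            self._module = importlib.import_module(self._name)
        return getattr(self._module, attr)


def optional_import(name: str) -> tuple[LazyModule, bool]:
    """Return a lazy proxy for `name` and a flag saying whether it is installed."""
    # find_spec locates a top-level module without executing it.
    available = importlib.util.find_spec(name) is not None
    return LazyModule(name, available), available


# Usage: the stdlib `json` module stands in for an optional dependency here.
json_mod, json_available = optional_import("json")
print(json_available, json_mod.dumps({"a": 1}))
```

With this shape, importing the package stays cheap when the optional dependency is absent, and the user only sees an `ImportError` if they call a method that actually needs it.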

Related issue number

ESMValGroup/ESMValCore#2690

Closes #715

Checklist

Unit tests for the changes exist
Tests pass on CI
Documentation reflects the changes where applicable

charles-turner-1 · 2025-05-07T07:38:03Z

@rbeucher, this now works with the following script, which loads an example dataset from the enso-recipes:

>>> import intake
>>> catalog = intake.cat.access_nri

# cat = cat.search(name='(?i)cmip.*', model = 'ACCESS-ESM1-5')
# yields cmip6_fs38

>>> esm_ds = catalog['cmip6_fs38']
"""
This creates the same dataset as
model_datasets = {
"ACCESS-ESM1-5":
    IntakeDataset(
    short_name='tos',
    project='CMIP6',
    mip="Omon",
    exp="historical",
    ensemble="r1i1p1f1",
    timerange="19790101/20190101",
    dataset="ACCESS-ESM1-5",
    grid="gn"
)}
when I was trying to add an IntakeDataset to esmvalcore.
"""

>>> search = dict(
    variable_id='tos',
    table_id='Omon',
    experiment_id='historical',
    member_id='r1i1p1f1',
    source_id='ACCESS-ESM1-5',
    grid_label='gn',
    version='v.*'
)


>>> esm_ds = esm_ds.search(
    **search
)

>>> esmvalcore_dataset = esm_ds.to_esmvalcore()

>>> print(esmvalcore_dataset)

Dataset:
{'dataset': 'ACCESS-ESM1-5',
 'project': 'CMIP6',
 'mip': 'Omon',
 'short_name': 'tos',
 'activity': 'CMIP',
 'ensemble': 'r1i1p1f1',
 'exp': 'historical',
 'frequency': 'mon',
 'grid': 'gn',
 'institute': ['CSIRO'],
 'long_name': 'Sea Surface Temperature',
 'modeling_realm': ['ocean'],
 'original_short_name': 'tos',
 'standard_name': 'sea_surface_temperature',
 'units': 'degC'}
session: 'session-f404cbc9-21b8-40b8-958c-b6e082d05979_20250513_013009'

Is this the sort of functionality you had in mind / that we talked about?

Also cc'ing @bouweandela - this would go some way towards the esmvalcore/intake-esm integration by letting users dump an esmvalcore dataset out from an intake-esm search.
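To make the correspondence between the intake-esm search keys and the ESMValCore facets in the example above explicit, here is a small sketch. The key pairs are read off the `IntakeDataset(...)` call and the `search` dict in the script; the hard-coded table, the `search_to_facets` helper, and the default `project` value are assumptions for illustration only, not how the PR resolves facets.

```python
# Pairs taken from the example above: intake-esm search key -> ESMValCore facet.
SEARCH_KEY_TO_FACET = {
    "variable_id": "short_name",
    "table_id": "mip",
    "experiment_id": "exp",
    "member_id": "ensemble",
    "source_id": "dataset",
    "grid_label": "grid",
}


def search_to_facets(search: dict, project: str = "CMIP6") -> dict:
    """Hypothetical helper: translate a search dict into ESMValCore-style facets."""
    facets = {"project": project}
    for key, value in search.items():
        facet = SEARCH_KEY_TO_FACET.get(key)
        # Keys with no known facet (e.g. `version` here) are silently dropped.
        if facet is not None:
            facets[facet] = value
    return facets


search = dict(
    variable_id="tos",
    table_id="Omon",
    experiment_id="historical",
    member_id="r1i1p1f1",
    source_id="ACCESS-ESM1-5",
    grid_label="gn",
    version="v.*",
)
print(search_to_facets(search))
```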

Todo:

Auto-map between facets & search (this might be best handled in esmvalcore using the config & I think I can steal from Bouwe's old intake PR).
Automapping currently happening in intake-esm - move this into esmvalcore.
Keep track of all searches made on a datastore so the user doesn't need to explicitly map facets.
Work out whether it's possible to keep track of searches made on a datastore in instances where we have used the require_all_on argument
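As a rough illustration of the search-tracking item above, the sketch below records each query on a toy store and merges the history into one dict. `TrackedStore` and `merge_search_history` are hypothetical stand-ins, not the `esm_datastore` implementation; the merge simply collects the distinct values seen per key, which is a simplification, and nothing here handles `require_all_on`.

```python
from typing import Any, Optional


class TrackedStore:
    """Toy stand-in for a datastore that remembers every search made on it."""

    def __init__(self, history: Optional[list] = None) -> None:
        self.search_history: list[dict[str, Any]] = list(history or [])

    def search(self, **query: Any) -> "TrackedStore":
        # A real datastore would also filter its catalogue here; this sketch
        # only carries the accumulated history over to the returned object.
        return TrackedStore(self.search_history + [query])


def merge_search_history(history: list) -> dict:
    """Collect the distinct values used for each key across all recorded searches."""
    merged: dict[str, list] = {}
    for query in history:
        for key, value in query.items():
            values = value if isinstance(value, list) else [value]
            bucket = merged.setdefault(key, [])
            for item in values:
                if item not in bucket:
                    bucket.append(item)
    return merged


store = TrackedStore().search(variable_id="tos").search(table_id="Omon", variable_id="tos")
print(merge_search_history(store.search_history))
```

Returning a new object from `search` keeps the original store's history untouched, so chained searches each carry only their own lineage.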

I've added a cmorizer arg that I guess we'll need as we develop this further - as of right now I haven't got any idea how we might want to do that. Maybe @rhaegar325 might have some insight as to how this works for mopper?

I have no idea if there is anything resembling a standard CMORizer API - if not, we might need to create something resembling one so that there's a plug & play way of cmorizing stuff in this functionality.

bouweandela · 2025-05-14T14:22:11Z

Thanks for cc'ing me! I'm not sure if integrating ESMValCore into intake-esm is the right way around (I would personally integrate support for intake-esm into the esmvalcore.dataset.Dataset.find_files method), but if it works for you then go for it.

You may want to call the method to_esmvalcore instead of to_iris, as you're creating an esmvalcore.dataset.Dataset object rather than an iris.cube.Cube or iris.cube.CubeList.

There may be no need for a cmorizer argument, as CMORization is done when you call esmvalcore.dataset.Dataset.load.

> if there is anything resembling a standard CMORizer API

Not yet, but there are plans to develop something like that as part of the upcoming Horizon Europe proposals. It will probably look something like fix(ds: xarray.Dataset, identifier: str | None) -> xarray.Dataset where identifier could be the instance_id (e.g. CMIP6.CMIP.CSIRO.ACCESS-ESM1-5.historical.r1i1p1f1.Amon.pr.gn.v20191115) for CMIP data and something else uniquely identifying the dataset for other data, or if not provided it could be read from the dataset attributes (assuming that those are correct).
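To show what an identifier like the one above carries, here is a small sketch that splits a CMIP6 instance_id into its dot-separated components. The field names follow the CMIP6 data reference syntax; `parse_cmip6_instance_id` is a hypothetical helper and is not part of ESMValCore or of the proposed `fix` API.

```python
# Component order of a CMIP6 instance_id.
CMIP6_INSTANCE_ID_FIELDS = (
    "mip_era",
    "activity_id",
    "institution_id",
    "source_id",
    "experiment_id",
    "member_id",
    "table_id",
    "variable_id",
    "grid_label",
    "version",
)


def parse_cmip6_instance_id(instance_id: str) -> dict[str, str]:
    """Hypothetical helper: map each component of an instance_id to its field name."""
    parts = instance_id.split(".")
    if len(parts) != len(CMIP6_INSTANCE_ID_FIELDS):
        raise ValueError(
            f"Expected {len(CMIP6_INSTANCE_ID_FIELDS)} dot-separated parts, "
            f"got {len(parts)}: {instance_id!r}"
        )
    return dict(zip(CMIP6_INSTANCE_ID_FIELDS, parts))


facets = parse_cmip6_instance_id(
    "CMIP6.CMIP.CSIRO.ACCESS-ESM1-5.historical.r1i1p1f1.Amon.pr.gn.v20191115"
)
print(facets["source_id"], facets["variable_id"], facets["version"])
```

A CMORizer keyed on such an identifier could use these fields to decide which fixes apply to a given dataset.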

rbeucher · 2025-05-14T21:58:02Z

> Thanks for cc'ing me! I'm not sure if integrating ESMValCore into intake-esm is the right way around (I would personally integrate support for intake-esm into the esmvalcore.dataset.Dataset.find_files method), but if it works for you then go for it.

I'm OK with that, but I agree it should be a to_esmvalcore function.

charles-turner-1 · 2025-05-14T23:53:50Z

> Thanks for cc'ing me! I'm not sure if integrating ESMValCore into intake-esm is the right way around (I would personally integrate support for intake-esm into the esmvalcore.dataset.Dataset.find_files method), but if it works for you then go for it.

I'm planning on adding that too - the intake-esm branch/draft PR on the esmvalcore repo should already have some code that makes a start on that. Most of the functionality in this PR is implemented in the ESMValCore repo anyway.

This PR is mostly just low-hanging fruit for our users, who are already going to be used to accessing data through intake catalogues.

> if there is anything resembling a standard CMORizer API

> Not yet, but there are plans to develop something like that as part of the upcoming Horizon Europe proposals. It will probably look something like fix(ds: xarray.Dataset, identifier: str | None) -> xarray.Dataset where identifier could be the instance_id (e.g. CMIP6.CMIP.CSIRO.ACCESS-ESM1-5.historical.r1i1p1f1.Amon.pr.gn.v20191115) for CMIP data and something else uniquely identifying the dataset for other data, or if not provided it could be read from the dataset attributes (assuming that those are correct).

Cool, that's really handy to know - thanks!

charles-turner-1 added 4 commits May 6, 2025 07:47

Add optional imports following the polars optional import method (simplified) & tests for the changes to the __init__.py

c9d3a19

Move optional import stuff out to a separate module to avoid circular imports

0e82529

WIP

f77bad1

Basic working example

df46b2d

charles-turner-1 force-pushed the to-iris branch from 99e0726 to df46b2d Compare May 7, 2025 07:28

charles-turner-1 and others added 4 commits May 7, 2025 15:41

Fix broken test - needed mock

48dc74d

Merge branch 'main' into to-iris

0ba6afe

Fix another broken test (updated function signature in to_iris, test_to_iris_unvailable needed updating)

da7658e

Facets now read from esmvalcore intake configuration file instead of passed to `to_iris` call by user.

5961ec0

charles-turner-1 force-pushed the to-iris branch from 759c6d9 to 5961ec0 Compare May 9, 2025 07:34

WIP (has polars change broken require_all?)

43c6e34

charles-turner-1 mentioned this pull request May 12, 2025

Bug in documentation "Enforce search query criteria via require_all_on argument" #667

Closed

charles-turner-1 added a commit to ESMValGroup/ESMValCore that referenced this pull request May 12, 2025

Add _read_facets to intake configuration: see intake/intake-esm#717

59e4205

Search history memory on esm_datastores. Probably won't work in situations where `require_all_on` has been used

410a388

charles-turner-1 force-pushed the to-iris branch 2 times, most recently from ad82c60 to ee0dbec Compare May 13, 2025 00:37

charles-turner-1 and others added 2 commits May 13, 2025 08:41

Fixed bug where pyarrow conversions were causing string accessor to fail in search (#718)

93daea8

Merge branch 'main' into to-iris

df051d1

charles-turner-1 force-pushed the to-iris branch from ee0dbec to df051d1 Compare May 13, 2025 00:41

charles-turner-1 and others added 5 commits May 13, 2025 10:42

Merge branch 'main' into to-iris

065b29f

Rebase doing weird things (!)

24c6803

Track search history to build esmvalcore facets, rather than passing explicitly

4618138

Move _merge_search_history into esmvalcore

73f150e

Removed irrelevant change

08be40d

rename to_iris => to_esmvalcore

3b961d6

charles-turner-1 changed the title ~~Add optional esmvalcore dependency & `to_iris` method~~ Add optional esmvalcore dependency & `to_esmvalcore` method May 15, 2025

Merge branch 'main' into to-iris

f424ad6
