ilamb3 updates #548

Open

nocollier wants to merge 15 commits into main from ilamb-updates

Conversation

@nocollier (Contributor) commented Feb 19, 2026

Description

  • bumps the version of ilamb3 to v2026.2.19
  • removes the ohc-noaa comparison, as per the MBTT via Forrest
  • adds an evspsbl-pr metric for land, as per the MBTT; will need to put this file on the S3 bucket: ilamb/evspsbl/GLEAMv3.3a/et.nc sha1:5aaf73949af8c6b509ef16f684aa8efeccd983e2
  • uses obs4REF data where possible. Cannot use WOA data as its date range starts after CMIP6 historical ends. Cannot use RAPID as someone removed dimensional information (depth_bounds, latitude) that I need.
  • fixes model line plots being indistinguishable from the reference
  • all ilamb/iomb are running for me locally
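The SHA-1 quoted in the description can be verified after the file lands on the S3 bucket. A minimal sketch using Python's hashlib; the path in the commented usage line is the bucket key named above and is illustrative only:

```python
import hashlib

def sha1_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so a large NetCDF file never sits in memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        while chunk := handle.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

# Checksum quoted in the description above:
EXPECTED = "5aaf73949af8c6b509ef16f684aa8efeccd983e2"

# Illustrative usage (path is the S3 key named in the description):
# assert sha1_of_file("ilamb/evspsbl/GLEAMv3.3a/et.nc") == EXPECTED
```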

Checklist

Please confirm that this pull request has done the following:

  • Tests added
  • Documentation added (where applicable)
  • Changelog item added to changelog/

@nocollier (Contributor, Author) commented:

Hey @lewisjared, I did my best to update ilamb here, use obs4REF where I could, and do what the MBTT has asked. I am seeing errors on things I didn't touch (or didn't think I touched). Any help/guidance you can provide would be appreciated. Have a good one :D

@lewisjared (Contributor) commented:

The lockfile dependencies updated, which is causing issues. Most notably, we don't support pandas 3 yet (#499).

thetao-WOA2023-surface:
sources:
# TODO: Update to use the obs4REF equiv
# NOTE: obs4REF equiv starts beyond historical, cannot use for CMIP6
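The incompatibility flagged in the NOTE can be made mechanical: the CMIP6 historical experiment ends in December 2014, so a reference record that starts after that has no overlap at all. A small sketch of that check (the sample start dates are illustrative, not read from the actual datasets):

```python
from datetime import date

# CMIP6 historical experiment ends in December 2014.
HISTORICAL_END = date(2014, 12, 31)

def overlaps_historical(obs_start: date) -> bool:
    """A reference record is only comparable if it begins before historical ends."""
    return obs_start <= HISTORICAL_END

# Illustrative start dates only (not taken from the actual files):
assert overlaps_historical(date(2005, 1, 1))       # overlaps, usable
assert not overlaps_historical(date(2015, 1, 1))   # starts after historical, skip
```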
@Darwyn72 Can you please take a look at this and the other datasets? Is this due to using a newer version?

@Darwyn72 commented Feb 20, 2026:
@lewisjared this is the first time I've seen this repo, but if you are reading this data from the CEDA Archive, there is a newer version of this dataset prepared by Morgan Steckler at ORNL using ILAMB. It's in the CEDA Archive under NOAA-NCEI-OCL/mon/thetao/v20251024; the previous version was under v20250923. If you are not reading it from the CEDA Archive: this dataset only provides climatological means, so the start and end time are not applicable; the climatological mean is calculated over the following two time periods: 2005–2014 and 2015–2022.

I could send a copy of this CMOR-like dataset to you?

yes please

@lewisjared This is the latest list of ILAMB-prepared datasets. Morgan Steckler asks that any errors found by the REF with the data are logged here. I can't forward the data to you by email as the files are too large. For WOA2023, the two files for thetao and so are dated 2025-10-24. Thanks, Paul


Will this be the official location of these data? Will these be moved to CEDA?

I'm a bit wary as I'm pretty sure the REF isn't using all of the latest datasets you have curated. Is there a complete list of download URLs for all datasets that have been vetted?


@lewisjared these are the datasets I have put on the CEDA Archive, although this is not (yet) public. Alison Waterfall at CEDA is supposed to put these data onto ESGF-NG once it is ready, either in an obs4REF or obs4MIPs project, or some temporary folder for the REF.

[image: listing of datasets on the CEDA Archive]

Ranjini has an Airtable that mirrors Tables A1 and B1 in the REF paper for reference. This is a link to a cut-down version of the Master view. Maybe this is helpful?

- msftmz_to_rapid
sources:
# TODO: Update to use the obs4REF equiv
# obs4REF equiv does not work, changes `depth` to `olevel` and removed the depth bounds

@lewisjared I think the issue was that CMOR tables don't work with the dimension 'depth'. In the Omon.json table in obs4MIPs, the dimensions are called 'latitude, olevel, basin, time':
},
"msftmz":{
"cell_measures":"",
"cell_methods":"longitude: sum (comment: basin sum [along zig-zag grid path]) depth: sum time: mean",
"comment":"Overturning mass streamfunction arising from all advective mass transport processes, resolved and parameterized.",
"dimensions":"latitude olevel basin time",
"frequency":"mon",
"long_name":"Ocean Meridional Overturning Mass Streamfunction",
"modeling_realm":"ocean",
"ok_max_mean_abs":"",
"ok_min_mean_abs":"",
"out_name":"msftmz",
"positive":"",
"standard_name":"ocean_meridional_overturning_mass_streamfunction",
"type":"real",
"units":"kg s-1",
"valid_max":"",
"valid_min":""
},
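A quick way to confirm what dimensions a CMOR table expects is to read them straight out of the JSON. The fragment below reproduces only the relevant keys from the entry quoted above (CMOR tables store dimensions as a space-separated string):

```python
import json

# A trimmed fragment of the Omon.json entry quoted above.
OMON_FRAGMENT = """
{
  "msftmz": {
    "dimensions": "latitude olevel basin time",
    "units": "kg s-1",
    "standard_name": "ocean_meridional_overturning_mass_streamfunction"
  }
}
"""

table = json.loads(OMON_FRAGMENT)
dims = table["msftmz"]["dimensions"].split()  # space-separated in CMOR tables
assert dims == ["latitude", "olevel", "basin", "time"]  # note: no plain 'depth'
```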

@nocollier (Contributor, Author) commented Feb 20, 2026:

That is interesting because published CMIP6 data looks like:

	float msftmz(time, basin, lev, lat) ;
		msftmz:standard_name = "ocean_meridional_overturning_mass_streamfunction" ;

where lev includes bounds, which in this case are pretty important because the quantity has been integrated across those bounds. My dataset was changed to:

	double msftmz(time) ;
		msftmz:_FillValue = NaN ;
		msftmz:standard_name = "ocean_meridional_overturning_mass_streamfunction" ;
		msftmz:long_name = "Ocean Meridional Overturning Mass Streamfunction" ;
		msftmz:comment = "Overturning mass streamfunction arising from all advective mass transport processes, resolved and parameterized." ;
		msftmz:units = "Sv" ;

where my other dimensions are removed completely. An olevel appears in the dataset, set to 0 without bounds, so now we cannot read the interval over which the quantity was integrated. A lat also appears, but neither is associated with msftmz as coordinates; they are just present in the file.

What is infuriating is that the standardization process has dropped information and changed to a less descriptive format for the sake of standards. Nor does the dataset say who did this; my name appears as the contributor. It may adhere to standards, but it is useless to the REF without hacking my source code.

@nocollier (Contributor, Author) added:

I see that this new dataset does ID all the dimensions, they just aren't specified as coordinates:

msftmz:coordinates = "lat basin olevel"
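Per the CF conventions, that coordinates attribute is a space-separated list of auxiliary coordinate variable names; this is what a CF-aware reader uses to promote lat, basin, and olevel to coordinates of msftmz instead of leaving them as loose variables. A stdlib-only sketch of that parsing step (the attribute value is the one quoted above):

```python
def parse_cf_coordinates(attr: str) -> list[str]:
    """Split a CF 'coordinates' attribute into auxiliary coordinate names."""
    return attr.split()

# The attribute value quoted above:
aux = parse_cf_coordinates("lat basin olevel")
assert aux == ["lat", "basin", "olevel"]
# A CF-aware reader (xarray, for example) would promote these variables to
# coordinates of msftmz rather than leaving them as loose variables in the file.
```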


@nocollier @lewisjared I suspect I might have uploaded a version of the RAPID dataset to the CEDA Archive that I tried to process with CMOR, and this was the only way I could get it to work. I actually think there is a correct version that Morgan has already done using ILAMB that has the bounds etc. The .py script for it is on obs4MIPs.

@nocollier (Contributor, Author) commented:

The lockfile dependencies updated which is causing issues. Most notably we don't support pandas 3 yet #499

I didn't expect that, I was treating the lockfile as just information that is used to compute an environment more quickly. What should I do?

@lewisjared (Contributor) commented:

The lockfile dependencies updated which is causing issues. Most notably we don't support pandas 3 yet #499

I didn't expect that, I was treating the lockfile as just information that is used to compute an environment more quickly. What should I do?

The lockfile is the set of dependencies we run the test suite with.

I usually do something like this when I get conflicts in the lockfile:

git checkout origin/main uv.lock
uv lock

This resets the lockfile to main and then regenerates it with the changes in this branch. I don't try to fix the merge conflicts by hand.

I've added a commit to do that.

@nocollier (Contributor, Author) commented Feb 23, 2026

Thanks for the help @lewisjared. I have pushed another change which makes use of the obs4MIPs Data Requirements if a dictionary of keywords is given in the configuration. In the end I spent a while trying to figure out how to make mypy not complain about a few lines, and just pushed them here to see if you have a suggestion.

I was asked to use the lai dataset that has already been ingested into obs4MIPs, and I did change the configuration to use it. However, they have ingested a 5 km version, and it won't run on my machine without running out of memory. It should work (the others do), but I didn't get to look at the output.

This also means that it cannot pass regression tests. I was thinking of waiting until some of the data issues resolve before redoing all the regression tests, but can change that plan as you think best.

@lewisjared (Contributor) commented:

I'll fetch the lai data, see if I can run it, and check how much memory it requires. It might need special handling, as most other diagnostics don't require huge amounts of memory.

@lewisjared mentioned this pull request Feb 24, 2026
* origin/main: (80 commits)
  chore: Update comment
  chore: upgrade pins for ilamb
  fix: revert compat=override on open_mfdataset
  docs: add changelog for #565
  chore: Upgrade lockfile and fix some errors
  chore: add coverage
  chore: add default separator in alembic
  fix: time_coder warning
  chore: Pin to use tas
  fix(solver): preserve DataCatalog wrapper in apply_dataset_filters
  fix(tests): use to_frame() when accessing DataCatalog in solver tests
  docs: Changelog
  chore: run the finalise in threads
  chore: clean up
  chore: add fix changelog entry for PR #561
  feat(cli): add --dataset-filter option to datasets list command
  chore: add changelog entry for PR #561
  feat(solver): add --dataset-filter option to filter input datasets when solving
  chore: Support env variables for parser
  feat(solver): add optional --limit flag to solve command
  ...
codecov bot commented Feb 24, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

Flag       Coverage Δ
core       93.22% <100.00%> (ø)
providers  ?

Flags with carried forward coverage won't be shown.

Files with missing lines Coverage Δ
.../climate-ref-core/src/climate_ref_core/datasets.py 87.87% <100.00%> (ø)

... and 34 files with indirect coverage changes


@lewisjared (Contributor) commented:

@nocollier I merged main back into this branch and fixed the mypy errors. The new dataset is also uploaded.

This will require adding extra datasets to the data catalogs we use to test if the constraints are working. Currently we can only generate these from scratch.

I'll have a look tomorrow to see how we can unblock that.
