Conversation
…m Forrest (on behalf of the MBTT) on 17-Feb
… used in configure files
98b8428 to 2cf846b
Hey @lewisjared, I did my best to update ilamb here, using obs4REF where I could and doing what the MBTT has asked. I am seeing errors on things I didn't touch (or didn't think I touched). Any help/guidance you can provide would be appreciated. Have a good one :D
The lockfile dependencies were updated, which is causing issues. Most notably, we don't support pandas 3 yet (#499).
```yaml
thetao-WOA2023-surface:
  sources:
    # TODO: Update to use the obs4REF equiv
    # NOTE: obs4REF equiv starts beyond historical, cannot use for CMIP6
```
@Darwyn72 Can you please take a look at this and the other datasets? Is this due to using a newer version?
@lewisjared this is the first time I've seen this repo, but if you are reading this data from the CEDA Archive, there is a newer version of this dataset prepared by Morgan Steckler at ORNL using ILAMB. It's in the CEDA Archive under NOAA-NCEI-OCL/mon/thetao/v20251024 (the previous version was under v20250923). Note that this dataset only provides climatological means, so the start and end times are not applicable; the climatological mean is calculated over the following two time periods: 2005–2014 and 2015–2022.
If you are not reading it from the CEDA Archive, I could send you a copy of this CMOR-like dataset?
@lewisjared This is the latest list of ILAMB-prepared datasets. Morgan Steckler asks that any errors the REF finds with the data are logged here. I can't forward the data to you by email as the files are too large. For WOA2023, the two files for thetao and so are dated 2025-10-24. Thanks, Paul
Will this be the official location of these data? Will these be moved to CEDA?
I'm a bit wary, as I'm pretty sure the REF isn't using all of the latest datasets you have curated. Is there a complete list of download URLs for all datasets that have been vetted?
@lewisjared these are the datasets I have put on the CEDA Archive - although this is not (yet) public. Alison Waterfall at CEDA is supposed to put these data onto ESGF-NG once it is ready, either in an obs4REF or obs4MIPs project, or some temporary folder for the REF.

Ranjini has an Airtable that mirrors Tables A1 and B1 in the REF paper for reference. This is a link to a cut-down version of the Master view. Maybe this is helpful?
```yaml
- msftmz_to_rapid
  sources:
    # TODO: Update to use the obs4REF equiv
    # obs4REF equiv does not work; it changes `depth` to `olevel` and removes the depth bounds
```
@lewisjared I think the issue was that CMOR tables don't work with the dimension 'depth'. In the Omon.json table in obs4MIPs the dimensions are called 'latitude, olevel, basin, time':
```json
},
"msftmz": {
    "cell_measures": "",
    "cell_methods": "longitude: sum (comment: basin sum [along zig-zag grid path]) depth: sum time: mean",
    "comment": "Overturning mass streamfunction arising from all advective mass transport processes, resolved and parameterized.",
    "dimensions": "latitude olevel basin time",
    "frequency": "mon",
    "long_name": "Ocean Meridional Overturning Mass Streamfunction",
    "modeling_realm": "ocean",
    "ok_max_mean_abs": "",
    "ok_min_mean_abs": "",
    "out_name": "msftmz",
    "positive": "",
    "standard_name": "ocean_meridional_overturning_mass_streamfunction",
    "type": "real",
    "units": "kg s-1",
    "valid_max": "",
    "valid_min": ""
},
```
That is interesting, because published CMIP6 data looks like:
```
float msftmz(time, basin, lev, lat) ;
    msftmz:standard_name = "ocean_meridional_overturning_mass_streamfunction" ;
```
where lev includes bounds, which in this case are pretty important because the quantity has been integrated across those bounds. My dataset was changed to:
```
double msftmz(time) ;
    msftmz:_FillValue = NaN ;
    msftmz:standard_name = "ocean_meridional_overturning_mass_streamfunction" ;
    msftmz:long_name = "Ocean Meridional Overturning Mass Streamfunction" ;
    msftmz:comment = "Overturning mass streamfunction arising from all advective mass transport processes, resolved and parameterized." ;
    msftmz:units = "Sv" ;
```
where my other dimensions have even been removed completely. An olevel appears in the dataset, set to 0 without bounds, so we can no longer read the interval over which the quantity was integrated. A lat also appears, but neither is associated with msftmz as coordinates; they are just present in the file.
What is infuriating is that the standardization process has dropped information and changed to a less descriptive format for the sake of standards. Nor does the dataset say who did this; my name appears as the contributor. It may adhere to standards, but it is useless to the REF without hacking my source code.
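To make the point about the lost bounds concrete, here is a minimal numpy sketch (the depth values are made up for illustration): with level bounds present, the thickness of each layer the quantity was integrated over is recoverable; once the bounds are dropped, that interval is simply gone.

```python
import numpy as np

# Made-up depth bounds for three levels (metres); illustrative only.
lev_bnds = np.array([[0.0, 10.0], [10.0, 50.0], [50.0, 200.0]])

# With bounds, the interval each level was integrated over is recoverable:
thickness = lev_bnds[:, 1] - lev_bnds[:, 0]
print(thickness)  # [ 10.  40. 150.]

# Without bounds (e.g. a scalar olevel set to 0), that information is lost.
```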
I see that this new dataset does identify all the dimensions; they just aren't specified as coordinates:
```
msftmz:coordinates = "lat basin olevel"
```
@nocollier @lewisjared I suspect I might have uploaded a version of the Rapid dataset to the CEDA Archive that I tried to process with CMOR, and this was the only way I could get it to work. I actually think there is a correct version that Morgan has already produced using ILAMB that has the bounds etc. The .py script for it is on obs4MIPs.
I didn't expect that; I was treating the lockfile as just information used to compute an environment more quickly. What should I do?
The lockfile is the set of dependencies we run the test suite with. I usually do something like this when I get conflicts in the lockfile: reset the lockfile to main, then regenerate it with the changes in this branch. I don't try to fix the merge conflicts by hand. I've added a commit to do that.
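A hypothetical sketch of that "reset and regenerate" flow, demonstrated in a throwaway repo so it is self-contained. In a real checkout you would run only the last two steps; the `uv lock` step is an assumption that the project manages its lockfile with uv, so substitute your own tool's lock command if not.

```shell
# Self-contained demo: set up a tiny repo with a lockfile on main,
# mangle it on a branch, then reset it back to main's version.
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q -b main
git config user.email demo@example.com && git config user.name demo
echo "deps-from-main" > uv.lock
git add uv.lock && git commit -qm "main lockfile"
git checkout -qb my-branch
echo "conflicted-mess" > uv.lock   # pretend the merge left this mangled
git checkout main -- uv.lock       # step 1: reset the lockfile to main
# uv lock                          # step 2: regenerate from this branch's deps
cat uv.lock                        # prints: deps-from-main
```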
… requirement facets
Thanks for the help @lewisjared. I have pushed another change which makes use of obs4MIPs Data Requirements if a dictionary of keywords is given in the configure. In the end I spent a while trying to figure out how to make this work. I was asked to use the lai dataset that has already been ingested into obs4MIPs, and I did change the configure to use it. However, they have ingested a 5 km version and it won't run on my machine without running out of memory. It should work (the others do), but I didn't get to look at the output. This also means that it cannot pass the regression tests. I was thinking of waiting until some of the data issues resolve before redoing all the regression tests; I can change that plan as you think best.
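For context, a hypothetical sketch of what such a keyword dictionary in the configure might look like. The key names below are illustrative assumptions for this comment, not the actual configure schema:

```yaml
# Hypothetical shape only; key names are assumptions, not the real schema.
lai:
  sources:
    obs4mips:
      variable_id: lai
      frequency: mon
```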
I'll fetch the lai data, try running it, and see how much memory it requires. It might need special handling, as most other diagnostics don't require huge amounts of memory.
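If the 5 km lai dataset really is too big to hold in memory, one option (a sketch of the general technique, not the REF's actual code) is to stream the data in chunks and accumulate the climatology incrementally, so peak memory is bounded by a single chunk rather than the full time series:

```python
import numpy as np

def monthly_climatology_chunked(arr, chunk_months=12):
    """Per-month climatological mean of a (time, lat, lon) monthly array
    starting in January, accumulated chunk by chunk to bound memory."""
    sums = np.zeros((12,) + arr.shape[1:])
    counts = np.zeros(12, dtype=int)
    for start in range(0, arr.shape[0], chunk_months):
        # In practice each chunk would be read lazily from disk here.
        for i, field in enumerate(arr[start:start + chunk_months]):
            month = (start + i) % 12
            sums[month] += field
            counts[month] += 1
    return sums / counts[:, None, None]
```

With a lazy reader (e.g. dask-backed xarray) supplying each chunk, only one chunk plus the 12-slot accumulator is ever resident.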
```
* origin/main: (80 commits)
  chore: Update comment
  chore: upgrade pins for ilamb
  fix: revert compat=override on open_mfdataset
  docs: add changelog for #565
  chore: Upgrade lockfile and fix some errors
  chore: add coverage
  chore: add default separator in alembic
  fix: time_coder warning
  chore: Pin to use tas
  fix(solver): preserve DataCatalog wrapper in apply_dataset_filters
  fix(tests): use to_frame() when accessing DataCatalog in solver tests
  docs: Changelog
  chore: run the finalise in threads
  chore: clean up
  chore: add fix changelog entry for PR #561
  feat(cli): add --dataset-filter option to datasets list command
  chore: add changelog entry for PR #561
  feat(solver): add --dataset-filter option to filter input datasets when solving
  chore: Support env variables for parser
  feat(solver): add optional --limit flag to solve command
  ...
```
Codecov Report: ✅ All modified and coverable lines are covered by tests.
... and 34 files with indirect coverage changes
@nocollier I merged main back into this branch and fixed the mypy errors. The new dataset is also uploaded. This will require adding extra datasets to the data catalogs we use to test whether the constraints are working; currently we can only generate these from scratch. I'll have a look tomorrow to see how we can unblock that.
Description
- ilamb3 to v2026.2.19
- evspsbl-pr metric for land as per the MBTT; will need to put this file on the S3 bucket: ilamb/evspsbl/GLEAMv3.3a/et.nc sha1:5aaf73949af8c6b509ef16f684aa8efeccd983e2

Checklist
Please confirm that this pull request has done the following:
changelog/