pull request for conus404_data branch #194

Open · sethmcg wants to merge 118 commits into main from conus404_data

Conversation

sethmcg (Collaborator) commented May 10, 2025

There's a lot here: this replumbs the data pipeline to be more flexible and easily configurable.

The main things to check out:

- config/downscaling.yml
- credit/data_downscaling.py
- credit/datamap.py
- credit/transforms_data.py

All the rest is just updating other code to play nicely with the new pipeline (unless I've forgotten something).

Maybe start with the config, to see my new scheme that lets you use an arbitrary number of datasets and the variables in them however you want; you just indicate which variables are boundary, prognostic, and diagnostic.
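
To give a feel for the scheme, here's a hypothetical sketch of what such a config could look like; the dataset names, paths, variable names, and keys are all made up for illustration and may not match the actual layout of config/downscaling.yml:

```python
# Hypothetical sketch of the config scheme: each dataset lists its variables
# under the role it plays (boundary, prognostic, or diagnostic). All names
# and keys here are illustrative, not the real downscaling.yml layout.
import yaml

cfg = yaml.safe_load("""
datasets:
  era5:                        # hypothetical coarse-resolution input
    files: /data/era5/*.nc
    variables:
      boundary:   [LANDMASK, HGT]
      prognostic: [T2, U10, V10]
  conus404:                    # hypothetical high-resolution target
    files: /data/conus404/*.nc
    variables:
      prognostic: [T2]
      diagnostic: [PREC_ACC_NC]
""")

for name, ds in cfg["datasets"].items():
    for role, variables in ds["variables"].items():
        print(f"{name}: {role} -> {variables}")
```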

You can also define a set of transformations for each variable (including different parameters for each level of a 3D variable). What I've got currently is mostly very simple normalizations, but it's meant to be easy to extend.
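
To make the per-level idea concrete, here's a minimal sketch of a z-score transform whose parameters can be scalars (for a 2D variable) or one value per vertical level (for a 3D variable). The class name and call convention are illustrative, not the actual credit/transforms_data.py API:

```python
import numpy as np

class Normalize:
    """z-score normalization; mean/std are scalars for a 2D variable,
    or one value per vertical level for a 3D variable."""

    def __init__(self, mean, std):
        # Shape () for a 2D variable, (nlev,) for a 3D variable;
        # trailing axes are added so the params broadcast over (y, x).
        self.mean = np.asarray(mean, dtype=np.float32)[..., None, None]
        self.std = np.asarray(std, dtype=np.float32)[..., None, None]

    def __call__(self, x):
        return (x - self.mean) / self.std

# 2D variable: a single mean/std pair
t2m = Normalize(mean=288.0, std=15.0)
print(t2m(np.random.rand(64, 64)).shape)      # (64, 64)

# 3D variable: different parameters for each level
temp = Normalize(mean=[285.0, 270.0, 250.0], std=[12.0, 10.0, 8.0])
print(temp(np.random.rand(3, 64, 64)).shape)  # (3, 64, 64)
```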

Datamap objects pull data out of netCDF files; this is the code that gives the big speedup over xarray. It doesn't handle Zarr yet, but it should be easy to extend in that direction. The DownscalingDataset object has a collection of datamaps (and their associated transforms) and manages pulling data from all of them and transforming it into a tensor sample.
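
To sketch that division of labor, here's a deliberately simplified version of the two layers; the class and method names are stand-ins for illustration, not the actual credit/datamap.py or credit/data_downscaling.py code:

```python
import netCDF4
import numpy as np
import torch

class Datamap:
    """Reads variables straight from a netCDF file with the low-level
    netCDF4 library, skipping xarray's indexing overhead."""

    def __init__(self, path, variables):
        self.nc = netCDF4.Dataset(path)   # kept open for repeated reads
        self.variables = variables

    def read(self, t):
        """One time step of each variable as a float32 array."""
        return {v: np.asarray(self.nc[v][t], dtype=np.float32)
                for v in self.variables}

class DownscalingDataset:
    """Holds a collection of datamaps plus per-variable transforms and
    assembles their output into a tensor sample."""

    def __init__(self, datamaps, transforms):
        self.datamaps = datamaps
        self.transforms = transforms      # {variable name: callable}

    def __getitem__(self, t):
        sample = {}
        for dm in self.datamaps:
            for name, arr in dm.read(t).items():
                sample[name] = torch.from_numpy(self.transforms[name](arr))
        return sample
```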

I think it would be straightforward to translate existing models to use the new pipeline, and that having a stronger separation of concerns like this would enable us to clean up a whole lot of duplicate code.

I don't see how you'd use multiprocessing for downscaling the way it's used for forecasting, so I haven't used it. I may have missed related things as a result, so keep an eye out for that.

I can train a simple U-Net; I haven't yet updated a crossformer to use the new pipeline. I'm currently working on applications/rollout_downscaling.py, which is incomplete.

sethmcg added 30 commits July 3, 2024 16:44
…(minus xforms); partially through xforms implementation
sethmcg requested review from djgagne, jsschreck, and yingkaisha, and removed the review request for yingkaisha, on May 10, 2025 00:05
djgagne (Collaborator) commented Jul 18, 2025

Please update the tests/test_loss import to fix the unit test failure.
