pull request for conus404_data branch #194
Open
sethmcg wants to merge 118 commits into main from conus404_data
…rget{}} with hist/fore split
…(minus xforms); partially through xforms implementation
…r use in netcdf output
Please update tests/test_loss import to fix unit test failure.
There's a lot here: it replumbs the data pipeline to be more flexible and easily configurable.
The main things to check out:
All the rest of it is just updating other stuff to play nice with the new pipeline. (Unless I've forgotten something)
Maybe start with the config, to see my new scheme that lets you use an arbitrary number of datasets and the variables in them however you want; you just indicate which variables are boundary, prognostic, and diagnostic.
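To make the idea concrete, here is a rough sketch of what a config along those lines could look like and how the variables might be grouped by role. The layout, the key names (`datasets`, `variables`, `usage`), the dataset and variable names, and the `variables_by_usage` helper are all hypothetical, made up for illustration; the actual schema is whatever the config in this PR defines.

```python
# Hypothetical config layout: every key and name below is illustrative,
# not the schema used in this PR.
config = {
    "datasets": {
        "coarse_input": {
            "variables": {
                "T": {"usage": "boundary"},
                "Q": {"usage": "boundary"},
            },
        },
        "conus404": {
            "variables": {
                "T2": {"usage": "prognostic"},
                "PREC": {"usage": "diagnostic"},
            },
        },
    }
}

VALID_USAGES = ("boundary", "prognostic", "diagnostic")


def variables_by_usage(cfg):
    """Group (dataset, variable) pairs by their declared role."""
    groups = {usage: [] for usage in VALID_USAGES}
    for dset_name, dset in cfg["datasets"].items():
        for var_name, var in dset["variables"].items():
            usage = var["usage"]
            if usage not in groups:
                raise ValueError(
                    f"{dset_name}.{var_name}: unknown usage {usage!r}"
                )
            groups[usage].append((dset_name, var_name))
    return groups


groups = variables_by_usage(config)
```

Adding another dataset in this sketch is just another entry under `datasets`; nothing downstream needs to know how many there are.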
You can also define a set of transformations for each variable (including different parameters for each level of a 3D variable). What I've got currently is mostly very simple normalizations, but it's meant to be easy to extend.
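As an illustration of per-level parameters, the sketch below applies a simple z-score normalization with a different mean and standard deviation for each level of a 3D variable. The `ZScore` class, the `normalize_levels` helper, the level keys, and the numbers are assumptions for the example, not the transform classes or statistics from this PR.

```python
from dataclasses import dataclass


@dataclass
class ZScore:
    """Hypothetical normalization: (x - mean) / std, plus its inverse."""
    mean: float
    std: float

    def forward(self, x):
        return (x - self.mean) / self.std

    def inverse(self, y):
        return y * self.std + self.mean


# One transform per vertical level of a made-up 3D variable.
# Level keys and statistics are placeholders.
level_transforms = {
    500: ZScore(mean=250.0, std=10.0),
    850: ZScore(mean=280.0, std=5.0),
}


def normalize_levels(values_by_level, transforms):
    """Apply each level's own transform to that level's value."""
    return {
        level: transforms[level].forward(value)
        for level, value in values_by_level.items()
    }


sample = {500: 260.0, 850: 275.0}
normalized = normalize_levels(sample, level_transforms)
```

Keeping an `inverse` next to each `forward` is one way to map model output back to physical units when writing results out.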
Datamap objects pull data out of netcdf files; this is the code that gives the big speedup over xarray. It doesn't handle zarr, but it should be easy to extend that way. The DownscalingDataset object has a collection of datamaps (& their associated transforms), and manages pulling data from all of them and transforming it into a tensor sample.
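The division of labor described above might be sketched roughly as follows. This is a toy under stated assumptions: `InMemoryDatamap` and `SketchDataset` are hypothetical stand-ins, the data lives in plain Python lists rather than netCDF files, and a sample is returned as a dict of floats rather than a tensor. It only shows the shape of the idea, with one object per data source and one object that collects from all of them and applies the transforms.

```python
class InMemoryDatamap:
    """Stand-in for a datamap: returns one variable's value at a time index.

    A real datamap would read from a netCDF file instead of a dict.
    """

    def __init__(self, data):
        self.data = data  # {variable: [value_at_t0, value_at_t1, ...]}

    def __len__(self):
        return min(len(values) for values in self.data.values())

    def read(self, variable, index):
        return self.data[variable][index]


class SketchDataset:
    """Collects values from several datamaps and applies their transforms."""

    def __init__(self, datamaps, transforms):
        self.datamaps = datamaps      # {dataset_name: datamap}
        self.transforms = transforms  # {(dataset_name, variable): callable}

    def __len__(self):
        # Only indices available in every datamap can form a full sample.
        return min(len(dm) for dm in self.datamaps.values())

    def __getitem__(self, index):
        sample = {}
        for dset_name, dm in self.datamaps.items():
            for var in dm.data:
                # Variables with no registered transform pass through as-is.
                xform = self.transforms.get((dset_name, var), lambda x: x)
                sample[f"{dset_name}/{var}"] = xform(dm.read(var, index))
        return sample


datamaps = {
    "coarse": InMemoryDatamap({"T": [280.0, 290.0]}),
    "fine": InMemoryDatamap({"T2": [1.0, 2.0, 3.0]}),
}
transforms = {("coarse", "T"): lambda x: (x - 280.0) / 10.0}
dataset = SketchDataset(datamaps, transforms)
second_sample = dataset[1]
```

With this split, supporting another storage format would mean writing another datamap-like class with the same `read` interface, leaving the dataset class unchanged.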
I think it would be straightforward to translate existing models to use the new pipeline, and that having a stronger separation of concerns like this would enable us to clean up a whole lot of duplicate code.
I don't see how you'd use multiprocessing for downscaling in the same way that you do for forecasting, so I haven't used it. I may have missed related things as a result, so keep an eye out for that.
I can train a simple U-net; I haven't yet gotten a crossformer updated to use the new pipeline. I'm currently working on `applications/rollout_downscaling.py`, which is incomplete.