Releases: NCAR/miles-credit
v2025.2.0
Major Updates
- Documentation has been significantly updated.
- Rollouts on ERA5 and GFS have been significantly streamlined.
- Physics constraints as postblocks have been added.
- Vertical interpolation to pressure and height levels.
- Experimental ensemble capabilities.
- Many bug fixes and code cleanups.
What's Changed
- Addition of new multi-step dataset that allows batch size > 1 by @jsschreck in #135
- Multistep training with batch_size >=1 per GPU by @jsschreck in #139
- Fix train/valid split read in dataset by @kanz76 in #140
- Gradscaler for grad clipping fix by @kanz76 in #141
- Major update for the inference routine + QoL improvements by @yingkaisha in #134
- Added valid_forecast_len to the result_dict by @kanz76 in #143
- Refactored validate method in trainerERA5_multistep_grad_accum.py for new dataset by @kanz76 in #145
- Minor bugfix on credit.parser by @yingkaisha in #144
- Bug fix on the original multi-step dataset by @jsschreck in #146
- Minor bugfix on credit.data by @yingkaisha in #147
- bugfix: removed deprecated ERA5Dataset and Bridgescaler_dataset from train.py by @dkimpara in #148
- Ensemble capability by @dkimpara in #137
- Predict scripts support batch size >= 1 by @jsschreck in #142
- rollouts enabled for ensemble, cftime, skebs. train_universal enabled for partial model loading. debugger model for dev support by @dkimpara in #149
- Updates to grad clipping, seed, and checkpointing by @jsschreck in #152
- CREDIT physics for hybrid sigma-pressure level configurations by @yingkaisha in #129
- Fixed trainer classes to actually use DDP by @kanz76 in #153
- Added std to printer log when using ensemble > 1 by @jsschreck in #154
- Docs, requirements, and some code formatting by @djgagne in #155
- Fix rollout_*_batcher.py by @kanz76 in #156
- Fixed varnum_diag bug by @kanz76 in #159
- FSDP to regular model or model weights by @jsschreck in #157
- Fixes to interpolation to pressure levels by @djgagne in #158
- Small fixes so option none works on casper using python and not torchrun by @jsschreck in #161
- Cleaning up saving + multiprocessing in rollout_to_netcdf.py script by @jsschreck in #162
- Updating packaging support by @djgagne in #163
- Documentation improvement of example-v2025.2.0.yml by @yingkaisha in #164
- Renaming scripts by @jsschreck in #165
- Updates to data loader and solar by @djgagne in #169
- Add regridding tools + minor bugfix on the inference dataset by @yingkaisha in #166
- Updating config/ by @jsschreck in #167
- Pulled out garbage collection from rollout_metrics by @jsschreck in #172
- Fix unit testing and removed garbage collection by @djgagne in #171
- Documentation updates by @djgagne in #170
- configuration explanation to Read the Docs by @WillyChap in #168
- Detached and re-assigned y_pred to discard computational graph by @kanz76 in #174
- Small temporary update to pbs.py to fix newer cudnn settings on Derecho by @jsschreck in #175
- Realtime Rollout and Predict + Interpolation Updates by @djgagne in #176
- rollout bugfix + documentation by @yingkaisha in #177
- Camulator v01.00 by @WillyChap in #179
- bunch of small fixes, features, comments by @dkimpara in #180
- Updates to bred vector implementation and noisy WXFormer by @jsschreck in #178
- GFS initial conditions by @charlie-becker in #173
- documentation updates by @ggantos in #183
- Skebs v 1.0 by @dkimpara in #182
- Trainer documentation by @jsschreck in #181
- Parallel version of CRPS by @jsschreck in #186
- Documentation hackathon by @djgagne in #184
New Contributors
- @charlie-becker made their first contribution in #173
- @ggantos made their first contribution in #183
Full Changelog: v2024.1.0...v2025.2.0
v2024.1.0
Our first public release of MILES CREDIT.
What's Changed
- add helper scripts to gather global data and scaling params by @WillyChap in #1
- Updated dependencies and added scaler application by @djgagne in #2
- Initial update of multi-step training by @jsschreck in #3
- Adding PBC to crossformer model by @jsschreck in #6
- Path fixing by @jsschreck in #8
- reworked spectralLoss2D to be compatible with new models by @dkimpara in #7
- fixed spectralLoss2D when using no lat weights, also removed deprecated code by @dkimpara in #9
- Updating few bugs in model classes. Using units now in predict.py by @jsschreck in #11
- zarrify.py script update by @sethmcg in #4
- wrote PSDLoss, benchmarked values, added option fields to configs by @dkimpara in #14
- Some quick improvements of draw_forecast within predict.py by @yingkaisha in #16
- using os.path.expandvars in train.py, predict.py to enable generic save locs by @dkimpara in #17
- Update predict.py, visualization_tools.py, and config file options by @yingkaisha in #20
- Updated FSDP checkpointing by @jsschreck in #21
- polar and laplacian diffusion filter module class by @WillyChap in #23
- predict.py revamp - viz config options, async image generation, xarray creation, laplacian filtering, CPU-only runtime compatible by @dkimpara in #27
- Modifications of predict.py by @jsschreck in #28
- predict.py - reworked xr save format, using logger by @dkimpara in #29
- Fixing FSDP revert; adding multi-step trajectory trainer by @jsschreck in #31
- Quantile static by @WillyChap in #32
- Updates to xformer model class and xformer configurations by @jsschreck in #34
- Model base class by @dkimpara in #35
- FuXi bug fix + Colorbar adjustments for visualization_tools by @yingkaisha in #36
- TOA var and Quantile scaling by @WillyChap in #37
- Predict.py now supports FSDP by @jsschreck in #38
- Adding static inputs by @jsschreck in #39
- Conus404 data loader by @sethmcg in #22
- adding transform/inverse for bscaler data by @WillyChap in #40
- Addition of CONUS404 data, model, and training scripts by @jsschreck in #41
- Updated FSDP-related bugs by @jsschreck in #42
- edge case fix by @WillyChap in #44
- Graph residual transformer with sparse edge calculation by @djgagne in #26
- KE and spectrum visualization diagnostics + data conversions by @dkimpara in #43
- Adding the replay buffer code as example by @jsschreck in #45
- Adding Climate run capabilities by @WillyChap in #46
- Fuxi update by @jsschreck in #47
- Start of major refactoring of code base by @jsschreck in #48
- TOA data type conversion and input tensor shape comment by @kanz76 in #51
- FuXi model updates and data pipeline initial work by @yingkaisha in #52
- Added Swin transformer model + new rollout script by @jsschreck in #53
- The new data pipeline by @yingkaisha in #55
- Data pipeline update with minor bugfix and documentation by @yingkaisha in #58
- NetCDF metadata, docs, environment fixes by @djgagne in #56
- Conus404 transforms by @sethmcg in #54
- add 6hourly cached option by @WillyChap in #57
- Emergency bug fix for the new roll-out and metadata loader by @yingkaisha in #60
- update yaml to work by @WillyChap in #61
- Add 6 hourly run configs with the new data pipeline + minor bugfix by @yingkaisha in #62
- rollout_to_netcdf_new.py that works with the new data pipeline by @yingkaisha in #64
- Updates to multi-step training a la Brenowitz scheme by @jsschreck in #63
- Bugfix on rollout_to_netcdf_new.py + adding example.yml by @yingkaisha in #67
- Memory optimization for one_shot=True + minor corrections for the example.yml by @yingkaisha in #69
- Initialization of credit.trainers by @jsschreck in #65
- Updates to the multi-step training code by @jsschreck in #70
- new lat weighting by @WillyChap in #66
- Add dynamic_forcing_variables into the data workflow by @yingkaisha in #73
- Initial code for the Graph Residual Transformer + GRU model by @kanz76 in #72
- Added reload_epoch option to the trainers by @jsschreck in #76
- minor adjustments on credit.model + update configs by @yingkaisha in #74
- Bugfix on model checkpointing and epoch update by @yingkaisha in #77
- New solar processing and bridgescaler state scaler by @djgagne in #78
- Updating the swin transformer by @jsschreck in #80
- Adding parser and input data checks for config.yml + some minor fixes by @yingkaisha in #79
- Model check-pointing and early-stopping fixes for base_trainer by @yingkaisha in #81
- Bug fixes and few updates with variable weights by @jsschreck in #82
- Small bugfix on parsing one_shot=False by @yingkaisha in #88
- Updates to multi-step training by @jsschreck in #87
- Adding arXiv paper production run configs + documentation by @yingkaisha in #89
- [In Progress]: PBS Script and some minor improvements. by @negin513 in #90
- few typos by @negin513 in #92
- Fixes to new MPI launch; linting; reorganizing distributed by @jsschreck in #93
- add lev functionality by @WillyChap in #84
- Adding NVIDIA-makani scheme for multi-step training / rollout script for computing metrics on forecasts by @jsschreck in #91
- multistep configurations + optimized cached dataset workflow + old script clean-up by @yingkaisha in #94
- Updated learning rate scheduler and metrics handling by @jsschreck in #95
- Notebooks and new github actions workflow by @djgagne in #97
- Updated README with credit-derecho installation instructions added by @jsschreck in #99
- Updated MPI support rollout_metrics by @jsschreck in #100
- Emergency fix on rollout_to_netcdf.py and rollout_metrics by @yingkaisha in #104
- Bugfix on trainerERA5_multistep_grad_accum.py by @yingkaisha in #106
- Infrastructure for post-processing blocks (e.g. SKEBS, mass correction, laplacian filtering) by @dkimpara in #105
- fixing environment_[device].yml specs by @dkimpara in #108
- Major credit.postblock update + major credit.physics_core update + major bugfix + minor code optimization and cleaning by @yingkaisha in #86
- Bugfix on the use of diagnostic variables in trainerERA5_multistep_grad_accum.py by @yingkaisha in #111
- Fix NCLL_NET_GDR_LEVEL environment variable in credit.pbs.py by @kanz76 in #115
- PHB does not work. Revert to PBH by @kanz76 in #118
- credit.postblock major updates on GlobalMassFixer and GlobalEnergyFixer by @yingkaisha in https://github.com/NCAR/miles-credit/...