Commit 81f1993

Author: Weatherbench2 authors
Merge pull request #90 from google-research:first-update
PiperOrigin-RevId: 587019563
2 parents 14be14f + c5fc28f commit 81f1993

5 files changed: +24670 -15026 lines changed

docs/source/api.md

+24
@@ -21,6 +21,8 @@
 metrics.Bias
 metrics.SpatialBias
 metrics.ACC
+metrics.SEEPS
+metrics.SpatialSEEPS
 ```

 ### Probabilistic Metrics
@@ -30,13 +32,22 @@

 metrics.EnsembleMetric
 metrics.CRPS
+metrics.SpatialCRPS
 metrics.CRPSSpread
+metrics.SpatialCRPSSpread
 metrics.CRPSSkill
+metrics.SpatialCRPSSkill
 metrics.EnsembleStddev
+metrics.EnsembleVariance
+metrics.SpatialEnsembleVariance
 metrics.EnsembleMeanRMSE
+metrics.EnsembleMeanMSE
+metrics.SpatialEnsembleMeanMSE
 metrics.EnergyScore
 metrics.EnergyScoreSpread
 metrics.EnergyScoreSkill
+metrics.RankHistogram
+metrics.GaussianCRPS
 ```

 ## Config
@@ -63,6 +74,7 @@
 regions.SliceRegion
 regions.ExtraTropicalRegion
 regions.LandRegion
+regions.CombinedRegion
 ```

 ## Derived Variables
@@ -73,6 +85,18 @@

 derived_variables.DerivedVariable
 derived_variables.WindSpeed
+derived_variables.WindDivergence
+derived_variables.WindVorticity
+derived_variables.VerticalVelocity
+derived_variables.EddyKineticEnergy
+derived_variables.GeostrophicWindSpeed
+derived_variables.AgeostrophicWindSpeed
+derived_variables.UComponentOfAgeostrophicWind
+derived_variables.VComponentOfAgeostrophicWind
+derived_variables.LapseRate
+derived_variables.TotalColumnWater
+derived_variables.IntegratedWaterTransport
+derived_variables.RelativeHumidity
 derived_variables.PrecipitationAccumulation
 derived_variables.AggregatePrecipitationAccumulation
 derived_variables.ZonalEnergySpectrum

docs/source/command-line-scripts.md

+103 -59
@@ -19,7 +19,9 @@ usage: evaluate.py [-h]
 [--probabilistic_climatology_start_year PROBABILISTIC_CLIMATOLOGY_START_YEAR]
 [--probabilistic_climatology_end_year PROBABILISTIC_CLIMATOLOGY_END_YEAR]
 [--probabilistic_climatology_hour_interval PROBABILISTIC_CLIMATOLOGY_HOUR_INTERVAL]
-[--add_land_region]
+[--regions REGIONS]
+[--lsm_dataset LSM_DATASET]
+[--compute_seeps]
 [--eval_configs EVAL_CONFIGS]
 [--ensemble_dim ENSEMBLE_DIM]
 [--rename_variables RENAME_VARIABLES]
@@ -53,7 +55,9 @@ _Command options_:
 for probabilistic climatology
 * `--probabilistic_climatology_hour_interval`: Hour interval to compute
 probabilistic climatology. Default: 6
-* `--add_land_region`: Add land-only evaluation. `land_sea_mask` must be in observation dataset.
+* `--regions`: Comma delimited list of predefined regions to evaluate. "all" for all predefined regions.
+* `--lsm_dataset`: Dataset containing land-sea-mask at same resolution of datasets to be evaluated. Required if region with land-sea-mask is picked. If None, defaults to observation dataset.
+* `--compute_seeps`: Compute SEEPS for total_precipitation.
 * `--eval_configs`: Comma-separated list of evaluation configs to run. See details below. Default: `deterministic`
 * `--ensemble_dim`: Ensemble dimension name for ensemble metrics. Default: `number`.
 * `--rename_variables`: Dictionary of variable to rename to standard names. E.g. {"2t": "2m_temperature"}
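
The `--regions` option in the hunk above takes a comma-delimited list, with `"all"` selecting every predefined region. A minimal sketch of that selection logic follows. The function name and the region names are hypothetical and are not taken from `evaluate.py`, which may parse the flag differently.

```python
def parse_regions(flag_value, predefined):
    """Return the region names selected by a comma-delimited flag value."""
    # "all" is the documented shortcut for every predefined region.
    if flag_value.strip() == "all":
        return list(predefined)
    names = [name.strip() for name in flag_value.split(",") if name.strip()]
    unknown = [name for name in names if name not in predefined]
    if unknown:
        raise ValueError(f"Unknown regions: {unknown}")
    return names


# Hypothetical region names, for illustration only.
PREDEFINED = ["global", "tropics", "extra-tropics"]
selected = parse_regions("global, tropics", PREDEFINED)
everything = parse_regions("all", PREDEFINED)
```

Rejecting unknown names early is a design choice of this sketch; it surfaces typos before any data is loaded.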
@@ -70,59 +74,8 @@ _Command options_:
 * `--beam_runner`: Beam runner
 * `--fanout`: Beam CombineFn fanout. Might be required for large dataset. Default: `None`

-*Predefined evaluation configs*
+[Predefined evaluation configs](https://github.com/google-research/weatherbench2/blob/main/scripts/evaluate.py#L389)

-```
-deterministic_metrics = {
-  'rmse': RMSE(wind_vector_rmse=_wind_vector_rmse()),
-  'mse': MSE(),
-  'acc': ACC(climatology=climatology),
-}
-
-eval_configs = {
-  'deterministic': config.Eval(
-    metrics=deterministic_metrics,
-    against_analysis=False,
-    regions=regions,
-    derived_variables=derived_variables,
-    evaluate_persistence=EVALUATE_PERSISTENCE.value,
-    evaluate_climatology=EVALUATE_CLIMATOLOGY.value,
-  ),
-  'deterministic_spatial': config.Eval(
-    metrics={'bias': SpatialBias(), 'mse': SpatialMSE()},
-    against_analysis=False,
-    derived_variables=derived_variables,
-    evaluate_persistence=EVALUATE_PERSISTENCE.value,
-    evaluate_climatology=EVALUATE_CLIMATOLOGY.value,
-  ),
-  'deterministic_temporal': config.Eval(
-    metrics=deterministic_metrics,
-    against_analysis=False,
-    regions=regions,
-    derived_variables=derived_variables,
-    evaluate_persistence=EVALUATE_PERSISTENCE.value,
-    evaluate_climatology=EVALUATE_CLIMATOLOGY.value,
-    temporal_mean=False,
-  ),
-  'probabilistic': config.Eval(
-    metrics={
-      'crps': CRPS(ensemble_dim=ENSEMBLE_DIM.value),
-      'ensemble_mean_rmse': EnsembleMeanRMSE(
-        ensemble_dim=ENSEMBLE_DIM.value
-      ),
-      'ensemble_stddev': EnsembleStddev(
-        ensemble_dim=ENSEMBLE_DIM.value
-      ),
-    },
-    against_analysis=False,
-    derived_variables=derived_variables,
-    evaluate_probabilistic_climatology=EVALUATE_PROBABILISTIC_CLIMATOLOGY.value,
-    probabilistic_climatology_start_year=PROBABILISTIC_CLIMATOLOGY_START_YEAR.value,
-    probabilistic_climatology_end_year=PROBABILISTIC_CLIMATOLOGY_END_YEAR.value,
-    probabilistic_climatology_hour_interval=PROBABILISTIC_CLIMATOLOGY_HOUR_INTERVAL.value,
-  ),
-}
-```

 *Example*

@@ -149,6 +102,7 @@ This scripts computes a day-of-year, hour-of-day climatology with optional smoot
 usage: compute_climatology.py [-h]
 [--input_path INPUT_PATH]
 [--output_path OUTPUT_PATH]
+[--frequency FREQUENCY]
 [--hour_interval HOUR_INTERVAL]
 [--window_size WINDOW_SIZE]
 [--start_year START_YEAR]
@@ -160,14 +114,15 @@ usage: compute_climatology.py [-h]
 [--add_statistic_suffix]
 [--method METHOD]
 [--seeps_dry_threshold_mm SEEPS_DRY_THRESHOLD_MM]
-[--beam_runner BEAM_RUNNER]
+[--runner RUNNER]

 ```

 _Command options_:

 * `--input_path`: (required) Input Zarr path
 * `--output_path`: (required) Output Zarr path
+* `--frequency`: Frequency of the computed climatology. "hourly": Compute the climatology per day of year and hour of day. "daily": Compute the climatology per day of year.
 * `--hour_interval`: Which intervals to compute hourly climatology for. Default: `1`
 * `--window_size`: Window size in days to average over. Default: `61`
 * `--start_year`: Inclusive start year of climatology. Default: `1990`
@@ -179,7 +134,7 @@ _Command options_:
 * `--add_statistic_suffix`: Add suffix of statistic to variable name. Required for >1 statistic.
 * `--method`: Computation method to use. "explicit": Stack years first, apply rolling and then compute weighted statistic over (year, rolling_window). "fast": Compute statistic over day-of-year first and then apply weighted smoothing. Mathematically equivalent for mean but different for nonlinear statistics. Default: `explicit`
 * `--seeps_dry_threshold_mm`: Dict defining dry threshold for SEEPS quantile computation for each precipitation variable. In mm. Default: `"{'total_precipitation_24hr':0.25, 'total_precipitation_6hr':0.1}"`
-* `--beam_runner`: Beam runner. Use `DirectRunner` for local execution.
+* `--runner`: Beam runner. Use `DirectRunner` for local execution.
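
The two `--frequency` values of `compute_climatology.py` differ only in how samples are binned: "hourly" bins by day of year and hour of day, "daily" bins by day of year alone. The sketch below illustrates that binning with plain Python. It is an assumption-laden illustration, not the script's implementation: the `climatology` helper is hypothetical, and the `--window_size` smoothing and `--hour_interval` selection are left out entirely.

```python
from collections import defaultdict
from datetime import datetime


def climatology(samples, frequency):
    """Average (timestamp, value) samples per climatology bin."""
    if frequency not in ("hourly", "daily"):
        raise ValueError(f"Unsupported frequency: {frequency}")
    bins = defaultdict(list)
    for stamp, value in samples:
        day_of_year = stamp.timetuple().tm_yday
        # "hourly" keeps the hour of day in the key; "daily" drops it.
        key = (day_of_year, stamp.hour) if frequency == "hourly" else (day_of_year,)
        bins[key].append(value)
    return {key: sum(values) / len(values) for key, values in bins.items()}


# Two years of made-up samples on 1 January at 00:00 and 12:00.
samples = [
    (datetime(2019, 1, 1, 0), 1.0),
    (datetime(2019, 1, 1, 12), 3.0),
    (datetime(2020, 1, 1, 0), 5.0),
    (datetime(2020, 1, 1, 12), 7.0),
]
hourly = climatology(samples, "hourly")
daily = climatology(samples, "daily")
```

With these samples, the hourly result has one bin per hour of day, while the daily result collapses both hours into a single bin for the day.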

 *Example*

@@ -203,9 +158,11 @@ Computes derived variables, adds them to the original dataset and saves it as a
 usage: compute_derived_variables.py [-h]
 [--input_path INPUT_PATH]
 [--output_path OUTPUT_PATH]
-[--derived_variables DERIVED_VARIABLES]
+[--derived_variables DERIVED_VARIABLES]
+[--preexisting_variables_to_remove PREEXISTING_VARIABLES_TO_REMOVE]
 [--raw_tp_name RAW_TP_NAME]
 [--rename_raw_tp_name]
+[--rename_variables RENAME_VARIABLES]
 [--working_chunks WORKING_CHUNKS]
 [--rechunk_itemsize RECHUNK_ITEMSIZE]
 [--max_mem_gb MAX_MEM_GB]
@@ -216,9 +173,11 @@ _Command options_:

 * `--input_path`: (required) Input Zarr path
 * `--output_path`: (required) Output Zarr path
-* `--derived_variables`: (required) Comma delimited list of derived variables to compute. Default: `wind_speed,10m_wind_speed,total_precipitation_6hr,total_precipitation_24hr`
+* `--derived_variables`: Comma delimited list of derived variables to compute. By default, tries to compute all derived variables.
+* `--preexisting_variables_to_remove`: Comma delimited list of variables to remove from the source data, if they exist. This is useful to allow for overriding source dataset variables with derived variables of the same name.
 * `--raw_tp_name`: Raw name of total precipitation variables. Use "total_precipitation_6hr" for backwards compatibility.
 * `--rename_raw_tp_name`: Rename raw tp name to "total_precipitation".
+* `--rename_variables`: Dictionary of variable to rename to standard names. E.g. {"2t":"2m_temperature"}
 * `--working_chunks`: Chunk sizes overriding input chunks to use for computing aggregations e.g., "longitude=10,latitude=10". No need to add prediction_timedelta=-1, this is automatically added for aggregation variables. Default: `None`, i.e. input chunks
 * `--rechunk_itemsize`: Itemsize for rechunking. Default: `4`
 * `--max_mem_gb`: Max memory for rechunking in GB. Default: `1`
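
The new `--rename_variables` and `--preexisting_variables_to_remove` options both reshape the set of source variables before derived variables are computed. The sketch below shows one way the two flags could be applied to a plain dictionary of variables. Both helper names are hypothetical, and the order used here (remove first, then rename) is an assumption; the diff does not say how `compute_derived_variables.py` orders these steps.

```python
import ast


def parse_rename_variables(flag_value):
    """Parse a dictionary literal such as '{"2t": "2m_temperature"}'."""
    mapping = ast.literal_eval(flag_value)
    if not isinstance(mapping, dict):
        raise ValueError("--rename_variables must be a dictionary literal")
    return mapping


def prepare_variables(variables, rename, to_remove):
    """Drop listed variables if present, then rename the remaining ones."""
    kept = {name: data for name, data in variables.items() if name not in to_remove}
    return {rename.get(name, name): data for name, data in kept.items()}


# Made-up source variables, for illustration only.
source = {"2t": [280.0], "wind_speed": [5.0]}
rename = parse_rename_variables('{"2t": "2m_temperature"}')
prepared = prepare_variables(source, rename, to_remove=["wind_speed", "not_present"])
```

Names listed for removal that do not exist in the source are ignored, matching the "if they exist" wording of the option.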
@@ -250,6 +209,7 @@ usage: compute_zonal_energy_spectrum.py [-h]
 [--time_stop TIME_STOP]
 [--levels LEVELS]
 [--averaging_dims AVERAGING_DIMS]
+[--fanout FANOUT]
 [--runner RUNNER]
 ```

@@ -263,12 +223,13 @@ _Command options_:
 * `--time_stop`: ISO 8601 timestamp (inclusive) at which to stop evaluation. Default: `2020-12-31`
 * `--levels`: Comma delimited list of pressure levels to compute spectra on. If empty, compute on all levels of --input_path. Default: `500,700,850`
 * `--averaging_dims`: Comma delimited list of variables to average over. If empty, do not average. Default: `time`
+* `--fanout`: Beam CombineFn fanout. Might be required for large dataset.
 * `--runner`: Beam runner. Use `DirectRunner` for local execution.

 *Example*

 ```bash
-python compute_zonal_power_spectrum.py \
+python compute_zonal_energy_spectrum.py \
 --input_path=gs://weatherbench2/datasets/era5/1959-2022-6h-240x121_equiangular_with_poles_conservative.zarr \
 --output_path=PATH \
 --time_start=2020 \
@@ -284,6 +245,9 @@ To use the ensemble mean in deterministic evaluation, we first must compute the
 usage: compute_ensemble_mean.py [-h]
 [--input_path INPUT_PATH]
 [--output_path OUTPUT_PATH]
+[--time_dim TIME_DIM]
+[--time_start TIME_START]
+[--time_stop TIME_STOP]
 [--realization_name REALIZATION_NAME]
 [--runner RUNNER]
 ```
@@ -292,6 +256,9 @@ _Command options_:

 * `--input_path`: (required) Input Zarr path
 * `--output_path`: (required) Output Zarr path
+* `--time_dim`: Name for the time dimension to slice data on. Default: `time`
+* `--time_start`: ISO 8601 timestamp (inclusive) at which to start evaluation. Default: `2020-01-01'`
+* `--time_stop`: ISO 8601 timestamp (inclusive) at which to stop evaluation. Default: `2020-12-31`
 * `--realization_name`: Name of realization/member/number dimension. Default: `realization`
 * `--runner`: Beam runner. Use `DirectRunner` for local execution.

@@ -384,6 +351,83 @@ python regrid.py \
 --regridding_method=conservative
 ```

+(compute_averages)=
+## Compute averages
+Computes average over dimensions of a forecast dataset.
+
+```
+usage: compute_averages.py [-h]
+[--input_path INPUT_PATH]
+[--output_path OUTPUT_PATH]
+[--output_chunks OUTPUT_CHUNKS]
+[--time_dim TIME_DIM]
+[--time_start TIME_START]
+[--time_stop TIME_STOP]
+[--variables VARIABLES]
+[--fanout FANOUT]
+[--runner RUNNER]
+```
+
+_Command options_:
+
+* `--input_path`: (required) Input Zarr path
+* `--output_path`: (required) Output Zarr path
+* `--time_dim`: Name for the time dimension to slice data on. Default: `time`
+* `--time_start`: ISO 8601 timestamp (inclusive) at which to start evaluation. Default: `2020-01-01'`
+* `--time_stop`: ISO 8601 timestamp (inclusive) at which to stop evaluation. Default: `2020-12-31`
+* `--variables`: Comma delimited list of data variables to include in output. If empty, compute on all data_vars of --input_path.
+* `--fanout`: Beam CombineFn fanout. Might be required for large dataset.
+* `--runner`: Beam runner. Use `DirectRunner` for local execution.
+
+*Example*
+
+```bash
+python compute_averages.py \
+--input_path=gs://weatherbench2/datasets/era5/1959-2022-6h-64x32_equiangular_with_poles_conservative.zarr \
+--output_path=gs://$BUCKET/datasets/era5/$USER/temperature-vertical-profile.zarr \
+--runner=DataflowRunner \
+-- \
+--project=$PROJECT \
+--averaging_dims=time,longitude \
+--variables=temperature \
+--temp_location=gs://$BUCKET/tmp/ \
+--setup_file=./setup.py \
+--requirements_file=./scripts/dataflow-requirements.txt \
+--job_name=compute-vertical-profile-$USER
+```
+
+(resample_daily)=
+## Resample daily
+Computes average over dimensions of a forecast dataset.
+
+```
+usage: resample_daily.py [-h]
+[--input_path INPUT_PATH]
+[--output_path OUTPUT_PATH]
+[--method METHOD]
+[--period PERIOD]
+[--statistics STATISTICS]
+[--add_statistic_suffix]
+[--num_threads NUM_THREADS]
+[--start_year START_YEAR]
+[--end_year END_YEAR]
+[--working_chunks WORKING_CHUNKS]
+[--beam_runner BEAM_RUNNER]
+```
+
+_Command options_:
+
+* `--input_path`: (required) Input Zarr path
+* `--output_path`: (required) Output Zarr path
+* `--method`: resample or roll
+* `--period`: int + d or w
+* `--statistics`: Output resampled time statistics, from "mean", "min", or "max".
+* `--add_statistic_suffix`: Add suffix of statistic to variable name. Required for >1 statistic.
+* `--num_threads`: Number of chunks to load in parallel per worker.
+* `--start_year`: Start year (inclusive).
+* `--end_year`: End year (inclusive).
+* `--working_chunks`: Spatial chunk sizes to use during time downsampling, e.g., "longitude=10,latitude=10". They may not include "time".
+* `--beam_runner`: Beam runner. Use `DirectRunner` for local execution.
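
The `--period` and `--working_chunks` descriptions for `resample_daily.py` are terse, so a small sketch of how such values could be interpreted may help. It reads "int + d or w" as an integer followed by `d` (days) or `w` (weeks), which is an assumption, and both helper names are hypothetical rather than taken from the script.

```python
import re
from datetime import timedelta


def parse_period(period):
    """Convert a string such as '1d' or '2w' into a timedelta."""
    match = re.fullmatch(r"(\d+)([dw])", period)
    if match is None:
        raise ValueError(f"Expected an integer followed by 'd' or 'w', got {period!r}")
    count, unit = int(match.group(1)), match.group(2)
    return timedelta(days=count) if unit == "d" else timedelta(weeks=count)


def parse_working_chunks(chunks):
    """Convert 'longitude=10,latitude=10' into a dict of chunk sizes."""
    result = {}
    for item in chunks.split(","):
        dim, size = item.split("=")
        result[dim.strip()] = int(size)
    # The option text states that the chunks may not include "time".
    if "time" in result:
        raise ValueError('working chunks may not include "time"')
    return result


one_day = parse_period("1d")
two_weeks = parse_period("2w")
chunks = parse_working_chunks("longitude=10,latitude=10")
```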

 ## Expand climatology
