Description
Currently, anemoi-datasets doesn't recompute statistics when combining multiple datasets with concat (see https://anemoi.readthedocs.io/projects/datasets/en/latest/using/statistics.html#statistics and https://anemoi.readthedocs.io/projects/datasets/en/latest/using/combining.html#concat).
If possible, I would be interested in an option to recompute these statistics.
In particular, I ran into an issue that I believe is related to how statistics are handled.
I have several anemoi-datasets (subsets of CERRA) already on disk that I would like to assemble into a single dataset, with spans:
- 1985 - 1989
- 1990 - 1999
- 2000 - 2009
- 2010 - 2019
- 2020
- 2021
- 2022
- 2025
These datasets were generated from MARS using the following recipe (replacing XXXX and YYYY with the start and end years):
description: |
  Copernicus European Regional Reanalysis
name: cerra-rr-an-oper-se-al-ec-mars-5p5km-XXXX-YYYY-3h-v1
dates:
  end: YYYY-12-31T18:00:00
  frequency: 3h
  start: XXXX-01-01T00:00:00
mars_common: &mars_common
  class: rr
  expver: prod
  origin: se-al-ec
  stream: oper
accum_base: &accum_base
  <<: *mars_common
  levtype: sfc
  type: fc
  param: ["tp","ssrd","strd"]
input:
  join:
    - mars:
        <<: *mars_common
        levtype: sfc
        type: an
        param: [10si, 10wdir, 2t, 2r, msl, sp, tcc, tciwv, sr, orog, lsm]
    - mars:
        # Maximum 10 metre wind gust since previous post-processing (10fg): 49
        # Surface long-wave (thermal) radiation downwards (strd): 175
        # Surface net long-wave (thermal) radiation (std): 177
        # Surface net short-wave (solar) radiation (ssr): 176
        # Surface short-wave (solar) radiation downwards (ssrd): 169
        # Maximum temperature at 2 metres since previous post-processing (mx2t): 201
        # Minimum temperature at 2 metres since previous post-processing (mn2t): 202
        <<: *mars_common
        levtype: sfc
        type: fc
        step: 3
        param: [201, 202, 49]
    - mars:
        <<: *mars_common
        levtype: hl
        type: an
        levelist: 100
        param: [ws, wdir]
    - constants:
        template: ${input.join.0.mars}
        param:
          - cos_latitude
          - cos_longitude
          - sin_latitude
          - sin_longitude
          - cos_julian_day
          - cos_local_time
          - sin_julian_day
          - sin_local_time
          - insolation
    # Precipitation
    - concat:
        # VALID TIME: 21Z - Forecast: 12Z - step (9 - 6)
        - dates:
            start: XXXX-01-01T21:00:00
            end: YYYY-12-31T21:00:00
            frequency: 24h
          accumulations:
            <<: *accum_base
            time: [12]
            accumulation_period: [6, 9]
        # VALID TIME: 00Z - Forecast: 12Z previous day - step (12 - 9)
        - dates:
            start: XXXX-01-01T00:00:00
            end: YYYY-12-31T00:00:00
            frequency: 24h
          accumulations:
            <<: *accum_base
            time: [12]
            accumulation_period: [9, 12]
        # VALID TIME: 03Z - Forecast: 12Z previous day - step (15 - 12)
        - dates:
            start: XXXX-01-01T03:00:00
            end: YYYY-12-31T03:00:00
            frequency: 24h
          accumulations:
            <<: *accum_base
            time: [12]
            accumulation_period: [12, 15]
        # VALID TIME: 06Z - Forecast: 12Z previous day - step (18 - 15)
        - dates:
            start: XXXX-01-01T06:00:00
            end: YYYY-12-31T06:00:00
            frequency: 24h
          accumulations:
            <<: *accum_base
            time: [12]
            accumulation_period: [15, 18]
        # VALID TIME: 09Z - Forecast: 00Z - step (9 - 6)
        - dates:
            start: XXXX-01-01T09:00:00
            end: YYYY-12-31T09:00:00
            frequency: 24h
          accumulations:
            <<: *accum_base
            time: [0]
            accumulation_period: [6, 9]
        # VALID TIME: 12Z - Forecast: 00Z - step (12 - 9)
        - dates:
            start: XXXX-01-01T12:00:00
            end: YYYY-12-31T12:00:00
            frequency: 24h
          accumulations:
            <<: *accum_base
            time: [0]
            accumulation_period: [9, 12]
        # VALID TIME: 15Z - Forecast: 00Z - step (15 - 12)
        - dates:
            start: XXXX-01-01T15:00:00
            end: YYYY-12-31T15:00:00
            frequency: 24h
          accumulations:
            <<: *accum_base
            time: [0]
            accumulation_period: [12, 15]
        # VALID TIME: 18Z - Forecast: 00Z - step (18 - 15)
        - dates:
            start: XXXX-01-01T18:00:00
            end: YYYY-12-31T18:00:00
            frequency: 24h
          accumulations:
            <<: *accum_base
            time: [0]
            accumulation_period: [15, 18]

These datasets have no missing dates.
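Whether the individual spans line up end to end can be checked without anemoi-datasets. The following is only a minimal sketch: it assumes each dataset covers exactly the top-level `dates` block of the recipe (3-hourly, from XXXX-01-01T00:00:00 to YYYY-12-31T18:00:00), and `span_bounds` and `find_gaps` are hypothetical helper names, not part of any library.

```python
from datetime import datetime, timedelta

# Spans as listed in this issue: (first year, last year), inclusive.
SPANS = [(1985, 1989), (1990, 1999), (2000, 2009), (2010, 2019),
         (2020, 2020), (2021, 2021), (2022, 2022), (2025, 2025)]

# Assumption: the dataset frequency from the recipe's top-level `dates` block.
STEP = timedelta(hours=3)


def span_bounds(first_year, last_year):
    """First and last date of a span, following the recipe's start/end pattern."""
    return datetime(first_year, 1, 1, 0), datetime(last_year, 12, 31, 18)


def find_gaps(spans, step=STEP):
    """Return (end_of_previous, start_of_next) pairs that are not one step apart."""
    bounds = [span_bounds(first, last) for first, last in sorted(spans)]
    gaps = []
    for (_, prev_end), (next_start, _) in zip(bounds, bounds[1:]):
        if next_start - prev_end != step:
            gaps.append((prev_end, next_start))
    return gaps


for prev_end, next_start in find_gaps(SPANS):
    print(f"{prev_end.isoformat()} -> {next_start.isoformat()}")
```

Under these assumptions the check flags the jump from 2022 to 2025, and also every year boundary, because the recipe ends at 18:00 while the frequency is 3h. Whether either of these is related to the error below is not something this sketch can establish.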
But when combining them with concat, I get the following error when running anemoi-datasets finalize:
--- Finalising dataset ---
2025-12-03 16:01:24 INFO 🎬 Task finalise((),{}) starting
Traceback (most recent call last):
File "anemoi-datasets", line 10, in <module>
sys.exit(main())
^^^^^^
File "anemoi/datasets/__main__.py", line 33, in main
cli_main(__version__, __doc__, COMMANDS)
File "anemoi/utils/cli.py", line 266, in cli_main
cmd.run(args)
File "anemoi/datasets/commands/finalise.py", line 59, in run
task(step, options)
File "anemoi/datasets/commands/create.py", line 53, in task
result = c.run()
^^^^^^^
File "anemoi/datasets/create/__init__.py", line 1579, in run
t.run()
File "anemoi/datasets/create/__init__.py", line 1519, in run
stats = self.tmp_statistics.get_aggregated(dates, variables, self.allow_nans)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "anemoi/datasets/create/statistics/__init__.py", line 397, in get_aggregated
aggregator = StatAggregator(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "anemoi/datasets/create/statistics/__init__.py", line 452, in __init__
self._read()
File "anemoi/datasets/create/statistics/__init__.py", line 507, in _read
assert d in found, f"Statistics for date {d} not precomputed."
^^^^^^^^^^
AssertionError: Statistics for date 1989-12-01T00:00:00 not precomputed.
I interpret this error as anemoi-datasets trying to aggregate the statistics but failing, because they were precomputed for each individual dataset and not for the full date range.
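To illustrate what "recomputing" could mean in principle: if each dataset carried a count, a sum, a sum of squares, a minimum and a maximum per variable, statistics over the full date range could be derived by merging those summaries, without rereading the data. The sketch below only shows the arithmetic. The `Summary` class and its helpers are hypothetical, and nothing here is meant to describe how anemoi-datasets actually stores or aggregates its statistics.

```python
import math
from dataclasses import dataclass


@dataclass
class Summary:
    """Hypothetical per-dataset summary for one variable."""
    count: int
    total: float      # sum of the values
    total_sq: float   # sum of the squared values
    minimum: float
    maximum: float


def summarise(values):
    """Build a summary from raw values (stands in for one dataset on disk)."""
    return Summary(
        count=len(values),
        total=sum(values),
        total_sq=sum(v * v for v in values),
        minimum=min(values),
        maximum=max(values),
    )


def merge(a, b):
    """Combine two summaries; counts and sums add, extremes take min/max."""
    return Summary(
        count=a.count + b.count,
        total=a.total + b.total,
        total_sq=a.total_sq + b.total_sq,
        minimum=min(a.minimum, b.minimum),
        maximum=max(a.maximum, b.maximum),
    )


def mean(s):
    return s.total / s.count


def stdev(s):
    """Population standard deviation; clamp tiny negative rounding errors to 0."""
    variance = s.total_sq / s.count - mean(s) ** 2
    return math.sqrt(max(variance, 0.0))


# Two "datasets" merged give the same result as one pass over all values.
combined = merge(summarise([1.0, 2.0, 3.0]), summarise([4.0, 5.0]))
print(mean(combined), stdev(combined), combined.minimum, combined.maximum)
```

The sum-of-squares form is the simplest to merge, but it can lose precision when the mean is large relative to the spread, so a real implementation might prefer a more numerically stable update.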