I am trying to build an xarray-zarr cloud-based dataset from ERA5 for training a 1° model, and I am finding that the regridding step of the creation is extremely slow. My understanding is that the source .zarr lives in the cloud, and anemoi-datasets just wraps the steps for handling the data as it is loaded. My question is: is this the correct approach, and if so, is there a way to speed it up, e.g. by using GPUs, since regridding is essentially a matrix multiplication? Thank you!
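To illustrate what I mean by "matrix multiplication": as far as I understand it, a linear regrid amounts to applying a fixed sparse weight matrix to each flattened field, which is exactly the kind of operation a GPU could accelerate. Below is a minimal sketch of that idea; the weights are random placeholders rather than real bilinear weights, and none of this is anemoi-datasets code.

```python
# Illustrative only: linear regridding expressed as a sparse matrix multiply.
# The weights below are random placeholders; a real regridder would fill them
# with bilinear interpolation weights from the 0.25 deg grid to the O96 grid.
import numpy as np
import scipy.sparse as sp

n_in = 721 * 1440   # points on the global 0.25 deg lat/lon grid
n_out = 40_320      # points on the O96 octahedral reduced Gaussian grid

# Each output point depends on 4 input points (bilinear stencil).
rows = np.repeat(np.arange(n_out), 4)
cols = np.random.randint(0, n_in, size=4 * n_out)
vals = np.full(4 * n_out, 0.25, dtype="float32")
weights = sp.csr_matrix((vals, (rows, cols)), shape=(n_out, n_in))

field_in = np.random.rand(n_in).astype("float32")  # one flattened 2D field
field_out = weights @ field_in                      # the regrid itself: one sparse matmul
print(field_out.shape)  # (40320,)
```

If the weights were computed once and cached, the same sparse multiply could in principle run on a GPU (e.g. via torch sparse tensors) instead of being done per time step on the CPU.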
As an MWE, the regridding step of the following is expected to take about 3 hours to process a single month of data:

```
anemoi-datasets create recipe.yaml gcp_era5.zarr
```

where `recipe.yaml` is the following anemoi-datasets configuration:
```yaml
dates:
  start: 2020-01-01T00:00
  end: 2020-01-31T23:00
  frequency: 6h
input:
  join:
    - pipe:
        - xarray-zarr:
            url: gs://gcp-public-data-arco-era5/ar/1959-2022-full_37-6h-0p25deg_derived.zarr/
            param:
              - 2m_temperature
        - rename:
            param:
              2m_temperature: 2t
        - regrid:
            method: linear
            in_grid: [0.25, 0.25]
            out_grid: O96
    - pipe:
        - xarray-zarr:
            url: gs://gcp-public-data-arco-era5/ar/1959-2022-full_37-6h-0p25deg_derived.zarr/
            param:
              - temperature
        - rename:
            param:
              temperature: t
            level:
              - 1000
              - 850
              - 500
        - regrid:
            method: linear
            in_grid: [0.25, 0.25]
            out_grid: O96
    - forcings:
        template: ${input.join.0.pipe}
        param:
          - cos_latitude
          - cos_longitude
          - sin_latitude
          - sin_longitude
          - cos_julian_day
          - cos_local_time
          - sin_julian_day
          - sin_local_time
          - insolation
```
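For context on where the time might be going, here is a small timing sketch I can run (my own diagnostic, not part of the recipe) that just downloads one month of 2m_temperature from the same ARCO ERA5 store, to separate the cost of reading from GCS from the cost of the regrid step itself. It assumes gcsfs, zarr and xarray are installed and that anonymous access to the bucket works:

```python
# Rough diagnostic (assumes gcsfs, zarr, xarray; anonymous GCS access): time how long
# it takes just to pull one month of 2m_temperature from the ARCO ERA5 store, so the
# download cost can be compared against the ~3 h the full create run spends regridding.
import time

import gcsfs
import xarray as xr

fs = gcsfs.GCSFileSystem(token="anon")
store = fs.get_mapper(
    "gs://gcp-public-data-arco-era5/ar/1959-2022-full_37-6h-0p25deg_derived.zarr/"
)
ds = xr.open_zarr(store)

t0 = time.time()
t2m = ds["2m_temperature"].sel(time=slice("2020-01-01", "2020-01-31")).load()
print(t2m.shape, f"downloaded in {time.time() - t0:.1f}s")
```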