anemoi-dataset creation with regridding extremely slow

I am trying to build an xarray-zarr cloud-based dataset from ERA5 for training a 1° model, and I am finding that the regridding step of the creation is extremely slow. My understanding is that the anemoi dataset `.zarr` lives in the cloud, so anemoi-dataset just wraps the steps for handling the data as it is loaded. My question is: is this the correct approach, and if so, is there a way to speed it up, e.g., by using GPUs, since regridding is a matrix multiplication? Thank you!

As an MWE, the regridding step of the following is expected to take 3 hours to process a single month of data:
```bash
anemoi-datasets create recipe.yaml gcp_era5.zarr
```
where `recipe.yaml` is the following anemoi-dataset configuration: 
```yaml
dates:
  start: 2020-01-01T00:00
  end: 2020-01-31T23:00
  frequency: 6h

input:
  join:
    - pipe:
      - xarray-zarr:
          url: gs://gcp-public-data-arco-era5/ar/1959-2022-full_37-6h-0p25deg_derived.zarr/
          param:
            - 2m_temperature
      - rename:
          param:
            2m_temperature: 2t
      - regrid:
          method: linear
          in_grid: [0.25, 0.25]
          out_grid: O96
    - pipe:
      - xarray-zarr:
          url: gs://gcp-public-data-arco-era5/ar/1959-2022-full_37-6h-0p25deg_derived.zarr/
          param:
            - temperature
      - rename:
          param:
            temperature: t
            level:
              - 1000
              - 850
              - 500
      - regrid:
          method: linear
          in_grid: [0.25, 0.25]
          out_grid: O96

    - forcings:
        template: ${input.join.0.pipe}
        param:
          - cos_latitude
          - cos_longitude
          - sin_latitude
          - sin_longitude
          - cos_julian_day
          - cos_local_time
          - sin_julian_day
          - sin_local_time
          - insolation
```
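
For reference, the "regridding is matrix multiplication" idea mentioned above can be sketched as follows. This is only an illustration under simplifying assumptions, not how anemoi-datasets implements `regrid`: it uses a hypothetical helper `linear_weights`, a 1-D latitude axis, and a regular 1° target instead of the O96 reduced Gaussian grid. It builds the interpolation weights once and then applies them to a whole batch of fields with a single matrix product.

```python
import numpy as np


def linear_weights(src, dst):
    """Return a (len(dst), len(src)) matrix of 1-D linear interpolation weights.

    Hypothetical helper for illustration. Assumes `src` is strictly
    increasing and every value of `dst` lies within [src[0], src[-1]].
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    # Left neighbour of each target point, clipped so left + 1 stays in range.
    left = np.clip(np.searchsorted(src, dst, side="right") - 1, 0, len(src) - 2)
    frac = (dst - src[left]) / (src[left + 1] - src[left])
    weights = np.zeros((len(dst), len(src)))
    rows = np.arange(len(dst))
    weights[rows, left] = 1.0 - frac   # weight of the left neighbour
    weights[rows, left + 1] = frac     # weight of the right neighbour
    return weights


src_lat = np.arange(-90.0, 90.25, 0.25)  # 0.25 degree source latitudes
dst_lat = np.arange(-90.0, 91.0, 1.0)    # 1 degree target latitudes (assumed target)
weights = linear_weights(src_lat, dst_lat)

# Synthetic batch standing in for several time steps of one variable.
rng = np.random.default_rng(0)
fields = rng.standard_normal((8, len(src_lat)))

# One matrix product regrids every time step in the batch.
regridded = fields @ weights.T
```

Each row of `weights` has at most two non-zero entries here, so in practice such a matrix would normally be stored in a sparse format; the dense array is used only to keep the sketch dependent on NumPy alone.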
