
anemoi-datasets creation with regridding is extremely slow #515

@Julians42

I am trying to build a dataset from the cloud-hosted ERA5 Zarr store (using the `xarray-zarr` source) for training a 1° model, and I am finding that the regridding step of the dataset creation is extremely slow. My understanding is that the source Zarr store lives in the cloud and that anemoi-datasets simply applies the processing steps (renaming, regridding, etc.) to the data as it is loaded. Is this the correct approach? If so, is there a way to speed it up, for example by using GPUs, since regridding comes down to matrix multiplications? Thank you!
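On the point that regridding is essentially a matrix multiplication: for linear (bilinear) interpolation, the weights depend only on the source and target grids, so they can be computed once and then applied to every field as one sparse matrix product. The sketch below illustrates this with NumPy/SciPy. It is not the implementation behind the anemoi-datasets `regrid` filter, its bilinear stencil may differ from what `method: linear` does there, and the helpers `octahedral_grid` and `bilinear_weights` are hypothetical. It assumes the 0.25° source grid runs from 90°N to 90°S and eastwards from 0°E (721 × 1440 points, flattened row by row).

```python
import numpy as np
import scipy.sparse as sp


def octahedral_grid(n):
    """Return (lat, lon) in degrees for the points of an octahedral reduced
    Gaussian grid O<n> (hypothetical helper, for illustration only)."""
    # Gaussian latitudes: arcsin of the 2n Gauss-Legendre nodes, sorted north to south.
    nodes, _ = np.polynomial.legendre.leggauss(2 * n)
    lats = np.sort(np.degrees(np.arcsin(nodes)))[::-1]
    # Row i (counted from each pole) has 20 + 4*i equally spaced longitudes.
    counts = 20 + 4 * np.arange(n)
    counts = np.concatenate([counts, counts[::-1]])
    lat_pts = np.repeat(lats, counts)
    lon_pts = np.concatenate([np.arange(c) * 360.0 / c for c in counts])
    return lat_pts, lon_pts


def bilinear_weights(dst_lat, dst_lon, res=0.25):
    """Sparse (n_dst, n_src) bilinear-interpolation matrix from an assumed global
    regular lat/lon grid (90N -> 90S, 0E eastwards, flattened row by row)."""
    nlat = int(round(180 / res)) + 1
    nlon = int(round(360 / res))
    # Fractional row/column position of every target point in the source grid.
    r = (90.0 - dst_lat) / res
    c = (dst_lon % 360.0) / res
    r0 = np.clip(np.floor(r).astype(int), 0, nlat - 2)
    c_floor = np.floor(c)
    c0 = c_floor.astype(int) % nlon
    c1 = (c0 + 1) % nlon  # wrap around at 360E
    fr = r - r0
    fc = c - c_floor
    # Four source neighbours and their weights for each target point.
    rows = np.repeat(np.arange(dst_lat.size), 4)
    cols = np.stack([r0 * nlon + c0, r0 * nlon + c1,
                     (r0 + 1) * nlon + c0, (r0 + 1) * nlon + c1], axis=1).ravel()
    vals = np.stack([(1 - fr) * (1 - fc), (1 - fr) * fc,
                     fr * (1 - fc), fr * fc], axis=1).ravel()
    return sp.csr_matrix((vals, (rows, cols)), shape=(dst_lat.size, nlat * nlon))


# Build the weights once for 0.25 deg -> O96 ...
lat, lon = octahedral_grid(96)
W = bilinear_weights(lat, lon)
# ... then regrid many fields with a single sparse product.
# Random stand-in data: 4 fields stacked as columns, shape (721*1440, 4).
fields = np.random.default_rng(0).standard_normal((721 * 1440, 4))
out = W @ fields
print(W.shape, out.shape)  # (40320, 1038240) (40320, 4)
```

Because the weight matrix is built once, the per-field cost is only the sparse product (four multiply-adds per target point). The same product could in principle be run on a GPU with a sparse-matrix library, but whether that helps depends on where the time is actually spent in the pipeline (for example, reading from the cloud store versus computing or applying weights), which this sketch does not measure.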

As a minimal working example (MWE), the regridding step of the following command is expected to take 3 hours to process a single month of data:

```shell
anemoi-datasets create recipe.yaml gcp_era5.zarr
```

where `recipe.yaml` is the following anemoi-datasets recipe:

```yaml
dates:
  start: 2020-01-01T00:00
  end: 2020-01-31T23:00
  frequency: 6h

input:
  join:
    - pipe:
      - xarray-zarr:
          url: gs://gcp-public-data-arco-era5/ar/1959-2022-full_37-6h-0p25deg_derived.zarr/
          param:
            - 2m_temperature
      - rename:
          param:
            2m_temperature: 2t
      - regrid:
          method: linear
          in_grid: [0.25, 0.25]
          out_grid: O96
    - pipe:
      - xarray-zarr:
          url: gs://gcp-public-data-arco-era5/ar/1959-2022-full_37-6h-0p25deg_derived.zarr/
          param:
            - temperature
          level:
            - 1000
            - 850
            - 500
      - rename:
          param:
            temperature: t
      - regrid:
          method: linear
          in_grid: [0.25, 0.25]
          out_grid: O96

    - forcings:
        template: ${input.join.0.pipe}
        param:
          - cos_latitude
          - cos_longitude
          - sin_latitude
          - sin_longitude
          - cos_julian_day
          - cos_local_time
          - sin_julian_day
          - sin_local_time
          - insolation
```
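For scale, here is a back-of-the-envelope count of the regridding work in this recipe. It is a sketch that assumes the `t` source yields only the three listed pressure levels, that each field is regridded independently, and that the 3 hour estimate covers only these fields; none of this is confirmed by the logs.

```python
from datetime import datetime, timedelta

# Numbers taken from the recipe; the level count assumes only the three
# listed pressure levels (1000, 850, 500 hPa) are loaded.
start = datetime(2020, 1, 1, 0)
end = datetime(2020, 1, 31, 23)
step = timedelta(hours=6)

n_dates = (end - start) // step + 1        # six-hourly dates in the window
fields_per_date = 1 + 3                    # 2t, plus t on 3 levels (forcings not counted)
n_fields = n_dates * fields_per_date

src_points = 721 * 1440                    # global 0.25 deg lat/lon grid
dst_points = 4 * 96 * (96 + 9)             # octahedral O96 grid: 4N(N + 9) points

observed_seconds = 3 * 3600                # the ~3 h estimate reported above
seconds_per_field = observed_seconds / n_fields

print(n_dates, n_fields, src_points, dst_points, round(seconds_per_field, 1))
# prints: 124 496 1038240 40320 21.8
```

Roughly 22 s to interpolate one ~1M-point field onto ~40k points is far longer than a single precomputed sparse product should need, which suggests the time is dominated by something other than the interpolation arithmetic itself (for example, recomputing weights or reading data), although this calculation alone cannot tell which. If more pressure levels were actually loaded, the number of fields would be higher and the implied time per field correspondingly lower.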
