feature: Using an xarray coordinate dimension as string for station_id, timestamps for time #2

Open
@jmp75

Description

Is your feature request related to a problem? Please describe.

Loading an STF netCDF file from disk via xarray, as is and without further processing, leads to the following in-memory dataset:

(image: screenshot of the in-memory xarray dataset as loaded)

These days, such a dataset would arguably be designed with string coordinates wherever possible. In particular, retrieving the data for a given station can be done with the statement rain.sel(station_id="123456A").

A similar selection on the dataset as loaded, rain.sel(station_id=123456), fails with KeyError: "'station_id' is not a valid dimension or coordinate".

The same observation applies to the time dimension, which should use a date/time representation, perhaps even more so than station_id.
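To illustrate the kind of selection this request is after, here is a minimal sketch on made-up data. The station identifiers, dates, and values are hypothetical and are not taken from any STF file; only the xarray calls (DataArray, sel) are standard.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical toy data: 3 stations, 4 daily time steps.
station_ids = ["123456A", "234567B", "345678C"]
times = pd.date_range("2020-01-01", periods=4, freq="D")

rain = xr.DataArray(
    np.arange(12, dtype=float).reshape(3, 4),
    dims=("station_id", "time"),
    coords={"station_id": station_ids, "time": times},
    name="rain",
)

# Label-based selection by station identifier.
one_station = rain.sel(station_id="123456A")

# Label-based slicing by date; both ends of the slice are inclusive.
window = rain.sel(station_id="123456A", time=slice("2020-01-02", "2020-01-03"))
```

With string and datetime coordinates, neither selection depends on knowing the integer position of a station or a time step.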

Note also that having coordinates that start at one (i.e. Fortran indexing) rather than zero (C/Python) is not inconsequential in practice; this is notoriously fertile ground for "off by one" bugs when slicing data for use in modelling.

Describe the solution you'd like

The in-memory representation of STF via xarray has the following coordinates:

  • time (pandas.Timestamp or np.datetime64)
  • station_id (str)
  • ens_member (int) (note: probably can only be int in Python anyway, likely 64 bits; int32 retained on disk)
  • lead_time (int)
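One possible way to get from the as-loaded layout to the coordinates listed above is sketched below. It rests on several assumptions that are not confirmed by this issue: that the loaded dataset has a "station" dimension with a one-based integer index, an integer station_id variable along that dimension, and a one-based integer "time" index, and that the caller supplies the start date and time step. The function name with_label_coords and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-in for a dataset as loaded from disk (assumed layout).
raw = xr.Dataset(
    {
        "rain": (("time", "station"), np.arange(6, dtype=float).reshape(3, 2)),
        "station_id": (("station",), np.array([123456, 234567], dtype=np.int32)),
    },
    coords={
        "time": np.arange(1, 4, dtype=np.int32),     # one-based index
        "station": np.arange(1, 3, dtype=np.int32),  # one-based index
    },
)


def with_label_coords(ds, start, freq):
    """Return a copy of ds indexed by string station_id and timestamps.

    start and freq are assumed to be known by the caller; a real reader
    would derive them from the file's own metadata.
    """
    times = pd.date_range(start, periods=ds.sizes["time"], freq=freq)
    ids = [str(v) for v in ds["station_id"].values]
    # Drop the integer identifier variable and the one-based station index,
    # then reuse the freed name "station_id" for the dimension itself.
    out = ds.drop_vars(["station_id", "station"])
    out = out.rename_dims({"station": "station_id"})
    return out.assign_coords(station_id=ids, time=times)


fixed = with_label_coords(raw, "2020-01-01", "D")
value = fixed["rain"].sel(station_id="123456", time=pd.Timestamp("2020-01-02"))
```

In this sketch the data variables are left untouched; only the coordinates are replaced.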

Additional context

Possible downsides:

  • What is the performance downside, if any is noticeable, of converting the in-memory data for writing to disk with "to_netcdf"?
  • This would likely make using xarray "lazy-loading" capabilities not possible anymore. Does it really matter "these days"?
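To make the write-to-disk question more concrete, the reverse conversion could look something like the sketch below. It is only an illustration under assumptions: the on-disk layout (a "station" dimension, an int32 station_id variable, one-based integer indices) is assumed rather than taken from the STF specification, station identifiers are assumed to be numeric strings, and the function name to_disk_layout is hypothetical. It also discards the time origin, which a real writer would need to record in the file.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical in-memory dataset using the proposed label coordinates.
labelled = xr.Dataset(
    {"rain": (("time", "station_id"), np.zeros((3, 2)))},
    coords={
        "time": pd.date_range("2020-01-01", periods=3, freq="D"),
        "station_id": ["123456", "234567"],
    },
)


def to_disk_layout(ds):
    """Convert label coordinates back to the assumed integer on-disk layout."""
    # Assumes every identifier is a numeric string that fits in int32.
    ids = np.array([int(s) for s in ds["station_id"].values], dtype=np.int32)
    n_station = ds.sizes["station_id"]
    n_time = ds.sizes["time"]
    out = ds.drop_vars(["station_id", "time"])
    out = out.rename_dims({"station_id": "station"})
    out = out.assign_coords(
        station=np.arange(1, n_station + 1, dtype=np.int32),
        time=np.arange(1, n_time + 1, dtype=np.int32),
    )
    # Store the identifiers as an int32 data variable along "station".
    out["station_id"] = (("station",), ids)
    return out


on_disk = to_disk_layout(labelled)
```

The sketch only rebuilds coordinate arrays; whether the overhead is negligible for real STF files would need to be measured.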

Other considerations:

  • "RAM windowed" time series for very large data that does not fit in RAM. As far as I know, xarray backed by netCDF does not offer native capabilities for this. One would need to use zarr as an xarray backend, or perhaps other options (time series DBs?)
