
Validate API results for grid cell area weighted zonal statistics #251

Closed
@j08lue


Context

The VEDA data services APIs include a service to calculate zonal statistics (e.g. average, median, standard deviation, min, max) over (subsets of) a raster / gridded dataset.

Some of these datasets span large geographic areas and use a native projection in which individual data points represent different areas (or volumes). A common case is global datasets of climate variables on a regular lat/lon grid, where grid cells shrink toward the poles.

Some of the statistics we calculate are sensitive to the represented area. Simply averaging data points despite their varying grid cell / pixel area gives a result that over-represents the data points with small area, which on lat/lon grids are those at higher latitudes. For example, for a global sea surface temperature field whose values decrease toward the poles, the plain, unweighted average would be lower than the accurate, area-weighted one.
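To illustrate the size of this bias, here is a minimal sketch using a made-up, purely zonal SST profile (the `toy_sst` function and all other names are hypothetical, not part of the API). Because the toy field does not vary with longitude, averaging over latitude rows is equivalent to averaging over the full global grid. On a regular lat/lon grid, the exact cell area is proportional to the cosine of the cell-center latitude, so `cos(lat)` weights give the area-weighted mean.

```python
import math


def lat_centers(n_lat):
    """Cell-center latitudes (degrees) of a regular global grid with n_lat rows."""
    step = 180.0 / n_lat
    return [-90.0 + step * (i + 0.5) for i in range(n_lat)]


def toy_sst(lat_deg):
    """Hypothetical zonal SST profile in degrees C: about 28 at the equator, -2 at the poles."""
    return -2.0 + 30.0 * math.cos(math.radians(lat_deg))


def unweighted_mean(values):
    """Plain average: every grid row counts equally, regardless of its area."""
    return sum(values) / len(values)


def cos_lat_weighted_mean(lats_deg, values):
    """Average weighted by cos(latitude), which is proportional to cell area on a regular grid."""
    weights = [math.cos(math.radians(lat)) for lat in lats_deg]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


lats = lat_centers(180)  # 1-degree rows
sst = [toy_sst(lat) for lat in lats]
print(f"unweighted: {unweighted_mean(sst):.2f}  area-weighted: {cos_lat_weighted_mean(lats, sst):.2f}")
# prints: unweighted: 17.10  area-weighted: 21.56
```

The unweighted mean comes out several degrees too cold because the many small polar cells count as much as the large tropical ones.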

Method

To mitigate this inaccuracy, we implemented a method that reprojects the source data to an equal-area projection before calculating the statistics. For a lat/lon grid, we could instead have calculated the grid cell / pixel area weights directly, since the formula is simple. However, because the API handles data in arbitrary source projections, we chose the reprojection approach, which should work for any source projection.
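For validation, the "simple formula" for a regular lat/lon grid can serve as an independent reference against the reprojection-based results. A minimal sketch, assuming a spherical Earth with mean radius 6,371 km (an ellipsoidal model would shift the areas slightly): the area of a cell between latitudes φ_s and φ_n with longitude width Δλ is R² · Δλ · (sin φ_n − sin φ_s). The function names are hypothetical and this is not the API's code path, which reprojects instead.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; assumes a spherical Earth


def cell_area_m2(lat_south_deg, lat_north_deg, dlon_deg, radius=EARTH_RADIUS_M):
    """Area of one lat/lon cell on a sphere: R^2 * dlon * (sin(lat_north) - sin(lat_south))."""
    return radius**2 * math.radians(dlon_deg) * (
        math.sin(math.radians(lat_north_deg)) - math.sin(math.radians(lat_south_deg))
    )


def row_areas_m2(res_deg):
    """Cell area for each row of a global regular grid (south to north); cells in a row are equal."""
    n_lat = round(180 / res_deg)
    return [cell_area_m2(-90 + i * res_deg, -90 + (i + 1) * res_deg, res_deg) for i in range(n_lat)]


areas = row_areas_m2(1.0)
n_lon = 360
grid_total = sum(a * n_lon for a in areas)
sphere_total = 4 * math.pi * EARTH_RADIUS_M**2
print(f"grid/sphere area ratio: {grid_total / sphere_total:.6f}")  # prints 1.000000
print(f"60N cell / equator cell: {cell_area_m2(60, 61, 1) / cell_area_m2(0, 1, 1):.3f}")
```

The cell areas sum to the area of the sphere, and a 1° cell at 60°N covers roughly half the area of one at the equator, which is the imbalance an unweighted average ignores.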

In the API code, the processing is essentially what is described in #209 (comment). I am happy to give more hints on where to find the exact code, if needed.

The ask

We need to test the robustness of the API implementation and build confidence that it is ready for a production release.

  1. What are the standard cases we expect (e.g. continental-scale averages over a global lat/lon dataset)? Do we get accurate results for these?
  2. What are edge cases where the method could fail, and how does it perform in them?
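Candidate edge cases include areas of interest that cross the antimeridian, polar caps, and polygons smaller than a grid cell. A minimal sketch of an independent reference for rectangular boxes on a global regular lat/lon grid follows, using exact spherical cell-area weights. All names (`box_weighted_mean`, `relative_error`) are hypothetical. The sketch selects cells by their center and ignores partial-cell coverage, a simplification that may differ from how the API treats cell fractions, so comparisons against API output would need a tolerance.

```python
import math


def box_weighted_mean(field, res_deg, west, south, east, north):
    """Area-weighted mean of a global regular lat/lon grid over a lon/lat box.

    field[i][j]: row i counts north from -90, column j counts east from -180.
    A box with west > east is treated as crossing the antimeridian.
    Cells are selected by their center (no partial-cell fractions), a simplification.
    """
    total = 0.0
    weight_sum = 0.0
    for i, row in enumerate(field):
        lat_s = -90 + i * res_deg
        lat_n = lat_s + res_deg
        if not south <= (lat_s + lat_n) / 2 <= north:
            continue
        # Relative cell area; the constant factor R^2 * dlon cancels in the ratio.
        w = math.sin(math.radians(lat_n)) - math.sin(math.radians(lat_s))
        for j, value in enumerate(row):
            lon_c = -180 + (j + 0.5) * res_deg
            if west <= east:
                inside = west <= lon_c <= east
            else:  # antimeridian crossing, e.g. west=170, east=-170
                inside = lon_c >= west or lon_c <= east
            if inside:
                total += w * value
                weight_sum += w
    if weight_sum == 0:
        raise ValueError("box contains no cell centers")
    return total / weight_sum


def relative_error(api_value, reference):
    """Relative deviation of an API result from the reference value."""
    return abs(api_value - reference) / abs(reference)


# Usage: +1 east of the prime meridian, -1 west of it. A box from 170E to 170W
# holds 10 columns of each sign, so its mean should be (numerically) zero.
res = 1.0
field = [[1.0 if -180 + (j + 0.5) * res > 0 else -1.0 for j in range(360)] for _ in range(180)]
across_antimeridian = box_weighted_mean(field, res, 170, -60, -170, 60)
```

A box with the wrong sign convention (treated as spanning 170W to 170E the long way round) would instead average 340 columns, so this simple field is a quick check that antimeridian handling is correct.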

Approach

Whatever works: we have a live API for testing that has access to all datasets in the GHG Center and VEDA STAC catalogs.

The results should be documented in an executable Jupyter Notebook, for future reference.

A notebook that runs a validation for a standard case is already in the VEDA docs and can be downloaded from the docs repo. You can launch this notebook directly in the GHG Center JupyterLab with this link: https://hub.ghg.center/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2FNASA-IMPACT%2Fveda-docs%2F&urlpath=lab%2Ftree%2F%2Fnotebooks%2Ftutorials%2Fzonal-statistics-validation.ipynb&branch=main

Acceptance criteria

  • One or more independent, qualified specialists validated the method and shared their findings
