Validate API results for grid cell area weighted zonal statistics #251
Comments
I started working on this here: https://github.com/NASA-IMPACT/veda-docs/tree/ab/zonal-stats-validation. I would like to add a few datasets to test from the original VEDA, perhaps from the boreal data. Since there are already some differences, it may be necessary to dig deeper into the internals of each method. I am also interested in a comparison with exactextract.
I started to look at datasets in VEDA which are not in EPSG:4326. The notebook as-is requires the 4326 projection. I looked into clipping the dataset in its own projection (such as 3857, by reprojecting the GeoJSON, which is in 4326, to the same projection as the original dataset) and using rad2deg as a way to get the grid-cell weights for different conformal projections. I'm not sure about the results (see the branch above, notebook veda-docs/notebooks/tutorials/veda-zonal-statistics-validation.ipynb) but will take another look tomorrow. Any pointers are welcome.
I have been comparing the results of the rioxarray clip method, titiler, rio_tiler, and the methods @j08lue built against simply calculating an unweighted mean, in this notebook: https://github.com/NASA-IMPACT/veda-docs/blob/ab/zonal-stats-validation/notebooks/tutorials/different-stats-methods.ipynb. To assess some edge cases, I included two datasets with a CRS other than 4326 (3857 and 3395). I have also included a notebook to inspect the CRS of datasets in the VEDA STAC: https://github.com/NASA-IMPACT/veda-docs/blob/ab/zonal-stats-validation/notebooks/tutorials/get-epsgs.ipynb. At the bottom, you can see the counts: most datasets are 4326, and 17 are 3395, a projection used for all of the datasets associated with Houston. I have a few questions about the results, so I would appreciate it if @j08lue and @vincentsarago took a look at the notebook, and then we could meet to assess whether any of it is useful. I still think finding another expert in raster calculations would be helpful to assess what other testing or documentation might be useful.
@j08lue @vincentsarago thanks for meeting today so late! As I mentioned on the call, I am going to pause investigating some of the open questions I have in those notebooks, but it sounds like @vincentsarago is willing to investigate why the coverage array is all 1s when both the dataset and the GeoJSON bounding box are in the 4326 projection and the bounding box only partially covers some of the data grid cells. @vincentsarago, I will send you the file that goes along with that notebook.
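For reference, this is the kind of coverage array one would expect from a partial overlap. The sketch below is pure numpy on a hypothetical unit grid, not rio-tiler's actual implementation: the coverage of a pixel is the fraction of its area inside the box, so boundary pixels should get values strictly between 0 and 1 rather than all 1s.

```python
import numpy as np

def coverage(x_edges, y_edges, box):
    """Fraction of each pixel's area covered by an axis-aligned box.
    x_edges / y_edges are pixel-boundary coordinates (length n+1)."""
    xmin, ymin, xmax, ymax = box
    # Per-axis overlap length of each pixel with the box, clipped at 0.
    ox = np.clip(np.minimum(x_edges[1:], xmax)
                 - np.maximum(x_edges[:-1], xmin), 0.0, None)
    oy = np.clip(np.minimum(y_edges[1:], ymax)
                 - np.maximum(y_edges[:-1], ymin), 0.0, None)
    # Fractional coverage per axis, combined into a 2-D array.
    return np.outer(oy / np.diff(y_edges), ox / np.diff(x_edges))

edges = np.arange(0.0, 5.0)  # a 4x4 grid of unit pixels
cov = coverage(edges, edges, (0.5, 0.5, 3.5, 3.5))

assert cov[1, 1] == 1.0   # interior pixel: fully covered
assert cov[0, 1] == 0.5   # edge pixel: half covered
assert cov[0, 0] == 0.25  # corner pixel: quarter covered
```

If the API returns all 1s for a geometry like this, the partial-pixel weighting is effectively being skipped.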
Status update:
Another status update:
Statistics calculations were fixed. We successfully completed a comparison between a known method for calculating area-weighted statistics (for datasets in geographic coordinates) and the latest rio-tiler, showing that rio-tiler produces results that agree well enough with the known method.

Here is a gist with a notebook doing the comparison: https://gist.github.com/j08lue/eb65d3d816878e9bcc53376e42c0bae3#file-zonal-statistics-validation-south-america-total-fluxes-ipynb

There is a little gotcha in my implementation of the rio-tiler code: I use the area calculated from geographic coordinates to convert the rio-tiler-calculated averages (averages are all we get) to total fluxes, so they are comparable and in familiar units. I also tried changing the geographic-coordinates formula to calculate averages instead; the result was the same: rio-tiler's result is slightly lower (0.1%).

I'll move forward and implement this method in the GHG Center Exploration & Analysis interface, and then we can repeat the test there. What remains to be done is to repeat the same benchmark test in the UI, to validate the end-user result.
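For context, the mean-to-total-flux conversion described above amounts to multiplying the returned mean by the region's geographic area. A minimal sketch with made-up numbers (the flux value and bounding box below are hypothetical, not taken from the gist):

```python
import numpy as np

R = 6_371_000.0  # mean Earth radius in meters (spherical assumption)

def box_area(lat_s, lat_n, lon_w, lon_e):
    """Spherical area (m^2) of a lat/lon bounding box."""
    return R**2 * np.radians(lon_e - lon_w) * (
        np.sin(np.radians(lat_n)) - np.sin(np.radians(lat_s)))

# Hypothetical area-weighted mean flux in kg m^-2 s^-1, as rio-tiler
# would return it (rio-tiler gives us averages, not totals).
mean_flux = 2.5e-9

# Rough South America bounding box (illustrative only).
area = box_area(-55.0, 12.0, -82.0, -34.0)

# Total flux in kg s^-1: mean flux times the region's area.
total_flux = mean_flux * area
assert total_flux > 0
assert area < 4 * np.pi * R**2  # sanity check: smaller than the whole Earth
```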
Context
The VEDA data services APIs include a service to calculate zonal statistics (e.g. average, median, standard deviation, min, max) over (subsets of) a raster / gridded dataset.
Some of these datasets span large geographic areas and use projections in which grid cells / pixels represent varying real-world areas. A common case is global datasets of climate variables on a regular lat/lon grid.
Some of the statistics we calculate are sensitive to the represented area: simply averaging data points despite their varying grid-cell / pixel areas gives a result that over-represents the data points with small areas, which on lat/lon grids are those at higher latitudes. For example, for a field of global sea surface temperatures that are lower towards the poles, the plain, unweighted average would be lower than the accurate, weighted one.
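The effect is easy to demonstrate with a small numpy sketch on idealized data (not an actual VEDA dataset): on a regular lat/lon grid, grid-cell area is proportional to cos(latitude), so that factor serves as the weight.

```python
import numpy as np

# Idealized zonal-mean sea surface temperature on a regular 1-degree
# lat/lon grid: warm at the equator, colder towards the poles.
lats = np.arange(-89.5, 90.0, 1.0)       # grid-cell center latitudes
sst = 30.0 * np.cos(np.radians(lats))    # hypothetical values, degrees C

# Plain mean treats every grid cell equally.
plain_mean = sst.mean()

# Area-aware mean weights each row by cos(latitude).
weights = np.cos(np.radians(lats))
weighted_mean = np.average(sst, weights=weights)

# The unweighted mean comes out lower because it over-represents
# the small, cold polar cells.
assert plain_mean < weighted_mean
```

With these idealized values the difference is several degrees, which is why the API needs the weighted calculation.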
Method
To mitigate this inaccuracy, we implemented a method that reprojects the source data to an equal-area projection before calculating the statistics. For a lat/lon grid, we could also have calculated the grid-cell / pixel area weights directly, since the formula is simple. However, because the API handles data in arbitrary source projections, we chose the reprojection approach, which should work for any source projection.
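For reference, the "simple formula" alternative mentioned above can be sketched as follows. This is a hedged illustration, not the API's actual code: the exact area of a lat/lon cell on a sphere is R² · Δλ · (sin φ_north − sin φ_south).

```python
import numpy as np

R = 6_371_000.0  # mean Earth radius in meters (spherical assumption)

def cell_area(lat_south, lat_north, dlon_deg):
    """Exact area (m^2) of a lat/lon grid cell on a sphere:
    A = R^2 * dlon * (sin(lat_north) - sin(lat_south))."""
    dlon = np.radians(dlon_deg)
    sin_n = np.sin(np.radians(lat_north))
    sin_s = np.sin(np.radians(lat_south))
    return R**2 * dlon * (sin_n - sin_s)

# A 1x1-degree cell shrinks dramatically towards the poles.
equator_cell = cell_area(0.0, 1.0, 1.0)
polar_cell = cell_area(89.0, 90.0, 1.0)
assert equator_cell > 100 * polar_cell

# Sanity check: summing all cells recovers the sphere's surface area.
total = 360 * sum(cell_area(lat, lat + 1, 1.0) for lat in range(-90, 90))
assert np.isclose(total, 4 * np.pi * R**2)
```

Normalizing these areas gives the same weights as the cos(latitude) shortcut, but only for lat/lon grids, which is why the reprojection approach was preferred.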
In the API code, what happens is basically this: #209 (comment). I am happy to give more pointers to the exact code, if need be.
The ask
We need to test the robustness of the API implementation and build confidence that it is ready for a production release.
Approach
Whatever works: we have a live API for testing that has access to all the datasets in the GHG Center and VEDA STAC catalogs.
The results should be documented in an executable Jupyter Notebook, for future reference.
A notebook that runs a validation for a standard case is already in the VEDA docs and can be downloaded from the docs repo. You can launch this notebook directly in the GHG Center JupyterLab with this link: https://hub.ghg.center/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2FNASA-IMPACT%2Fveda-docs%2F&urlpath=lab%2Ftree%2F%2Fnotebooks%2Ftutorials%2Fzonal-statistics-validation.ipynb&branch=main
Acceptance criteria