Nearest neighbor lookup using H3 at a 'fixed' resolution

I'm wondering whether there are more efficient alternatives to (K-D / Ball / R) trees for cases where the (lat, lon) data points to be indexed are not strictly evenly spaced, but where the distance between direct neighbors is roughly the same across the whole dataset. (Is this the case for NEMO and/or FESOM2 model grids?)

There are some examples [here](https://nbviewer.jupyter.org/github/uber/h3-py-notebooks/blob/master/notebooks/urban_analytics.ipynb) and [here](https://stackoverflow.com/questions/57050286/how-to-find-the-locations-indices-whose-lat-long-co-ordinates-are-stored-in-geo) on performing spatial search using the [H3](https://h3geo.org/) library.

Here, the basic idea would be:

1. Choose a fixed H3 resolution `res`

- Here is the [table of all available resolutions](https://h3geo.org/docs/core-library/restable).
- This has to be chosen carefully, depending on the average "resolution" (typical spacing) of the grid points to be indexed. It affects both query performance and the memory needed to store the pre-computed index.

2. Build the index:

- Compute the H3 index of each grid point at the given `res`. This is quite efficient and could be easily done in parallel using Dask (for `80_000_000` points it takes <10 seconds using all the cores on my Intel i7 laptop).
- Build a hash table that maps each computed H3 index value back to the positional indices of the original data points. I'm not sure about this part, though: would it be efficient? The table could potentially be huge. I guess numba's support for `dict` would be useful here? How could we leverage Dask for this?

3. Nearest neighbor query:

- Compute the H3 index of each query point at the given `res`.
- Retrieve all candidates using the hash-table computed above. We could iterate over neighboring H3 cells using [kRing](https://h3geo.org/docs/api/traversal) until at least one candidate is found (or until we reach a given tolerance). Unfortunately, there's not yet a vectorized implementation of `kRing` in h3's Python bindings.
- If multiple candidates are found, use a brute-force approach (or any other smarter approach) for selecting the nearest neighbor among those candidates.
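
To make steps 2 and 3 more concrete, here is a minimal pure-Python sketch of the hash table and the ring-expansion query. All function names are hypothetical. So that it runs without any dependency, it uses a regular lat/lon grid (`square_cell` / `square_disk`) as a stand-in for H3; with h3-py one would presumably plug in its point-to-cell and k-ring functions instead (`geo_to_h3` / `k_ring` in v3, `latlng_to_cell` / `grid_disk` in v4). The stand-in ignores longitude wrap-around and the poles, and searching `extra_rings` beyond the first hit is only a heuristic, not a guarantee of finding the true nearest neighbor.

```python
import math
from collections import defaultdict


def square_cell(lat, lon, size_deg=1.0):
    """Stand-in for an H3 point-to-cell call: bin into a regular lat/lon grid."""
    return (math.floor(lat / size_deg), math.floor(lon / size_deg))


def square_disk(cell, k):
    """Stand-in for H3's k-ring: every cell within k steps of `cell`."""
    i, j = cell
    return [(i + di, j + dj) for di in range(-k, k + 1) for dj in range(-k, k + 1)]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km (mean Earth radius of 6371 km assumed)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    a = math.sin((p2 - p1) / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def build_index(points, cell_of=square_cell):
    """Step 2: map each cell id to the positional indices of the points inside it."""
    index = defaultdict(list)
    for pos, (lat, lon) in enumerate(points):
        index[cell_of(lat, lon)].append(pos)
    return index


def nearest(index, points, lat, lon, cell_of=square_cell, disk=square_disk,
            max_k=3, extra_rings=1):
    """Step 3: grow the disk until candidates appear, then brute-force among them.

    Returns the positional index of the selected point, or None if nothing
    is found within `max_k` rings (the "tolerance").
    """
    center = cell_of(lat, lon)
    hit_k = None
    candidates = []
    for k in range(max_k + 1):
        candidates = [pos for cell in disk(center, k) for pos in index.get(cell, ())]
        if candidates and hit_k is None:
            hit_k = k  # first disk size that contains at least one point
        if hit_k is not None and k >= hit_k + extra_rings:
            break
    if not candidates:
        return None
    return min(candidates, key=lambda pos: haversine_km(lat, lon, *points[pos]))
```

For example, `index = build_index(points)` followed by `nearest(index, points, lat, lon)` returns a position into `points`. The plain `dict` of lists is exactly the part I'm unsure about for very large grids; a sorted array of cell ids searched with binary search might be a more compact alternative.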

Whether the query is efficient or not will depend on `res`. Ideally, there should be only a handful of candidates in the direct H3 cell vicinity (`kRing` with `k=1`) for each query point.
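
As a rough starting point for picking `res`, one could compare the area "owned" by each grid point with the mean H3 cell area. The sketch below relies only on the fact that H3 has `2 + 120 * 7**res` cells at resolution `res`; the function names, the `points_per_cell` knob and the Earth radius value are assumptions for illustration, and real cell areas vary somewhat around the mean, so the result should be checked against the actual grid.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius, assumed for this estimate


def h3_cell_count(res):
    """Total number of H3 cells at a given resolution (0 to 15)."""
    return 2 + 120 * 7 ** res


def mean_cell_area_km2(res):
    """Earth's surface area divided evenly among all cells at `res`."""
    return 4 * math.pi * EARTH_RADIUS_KM ** 2 / h3_cell_count(res)


def suggest_resolution(spacing_km, points_per_cell=1.0):
    """Finest resolution whose mean cell area still covers about
    `points_per_cell` grid points spaced `spacing_km` apart."""
    target_area = points_per_cell * spacing_km ** 2
    best = 0
    for res in range(16):
        if mean_cell_area_km2(res) >= target_area:
            best = res
    return best
```

Raising `points_per_cell` gives a coarser resolution, hence a smaller table but more candidates to brute-force per query; lowering it does the opposite and makes empty cells (and thus larger rings) more likely.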

