Conversation

@erikvansebille
Member

Meant to compare the spatial hashing (Parcels-code/Parcels#2132) against the old (vectorized) search

@fluidnumerics-joe
Contributor

Just posting a baseline bulk runtime estimate on my system for this benchmark with parcels v4-dev at Parcels-code/Parcels@add8158.

Runtime (41s)

(parcels) [joe@claymation parcels-benchmarks]$ python MOi-Curvilinear/benchmark_index_search.py 
Running 1 particles with parcels v4
Execution time: 41 seconds

System specs

lstopo

(parcels) [joe@claymation parcels-benchmarks]$ lstopo
Machine (15GB total)
  Package L#0
    NUMANode L#0 (P#0 15GB)
    L3 L#0 (12MB)
      L2 L#0 (1280KB) + L1d L#0 (48KB) + L1i L#0 (32KB) + Core L#0
        PU L#0 (P#0)
        PU L#1 (P#1)
      L2 L#1 (1280KB) + L1d L#1 (48KB) + L1i L#1 (32KB) + Core L#1
        PU L#2 (P#2)
        PU L#3 (P#3)
      L2 L#2 (2048KB)
        L1d L#2 (32KB) + L1i L#2 (64KB) + Core L#2 + PU L#4 (P#4)
        L1d L#3 (32KB) + L1i L#3 (64KB) + Core L#3 + PU L#5 (P#5)
        L1d L#4 (32KB) + L1i L#4 (64KB) + Core L#4 + PU L#6 (P#6)
        L1d L#5 (32KB) + L1i L#5 (64KB) + Core L#5 + PU L#7 (P#7)
      L2 L#3 (2048KB)
        L1d L#6 (32KB) + L1i L#6 (64KB) + Core L#6 + PU L#8 (P#8)
        L1d L#7 (32KB) + L1i L#7 (64KB) + Core L#7 + PU L#9 (P#9)
        L1d L#8 (32KB) + L1i L#8 (64KB) + Core L#8 + PU L#10 (P#10)
        L1d L#9 (32KB) + L1i L#9 (64KB) + Core L#9 + PU L#11 (P#11)
  HostBridge
    PCI 00:02.0 (VGA)
    PCIBridge
      PCI 01:00.0 (NVMExp)
        Block(Disk) "nvme0n1"
    PCI 00:14.3 (Network)
      Net "wlp0s20f3"
    PCIBridge
      PCI 2e:00.0 (NVMExp)
        Block(Disk) "nvme1n1"

OS details

$ cat /etc/os-release 
NAME="Rocky Linux"
VERSION="9.6 (Blue Onyx)"
ID="rocky"

tuned profile

(parcels) [joe@claymation parcels-benchmarks]$ tuned-adm active
Current active profile: balanced

@fluidnumerics-joe
Contributor

fluidnumerics-joe commented Aug 15, 2025

By doubling the hash grid cell size (computed in parcels.spatialhash.SpatialHash._hash_cell_size), I can cut the runtime in half:

(parcels) [joe@claymation parcels-benchmarks]$ python MOi-Curvilinear/benchmark_index_search.py
Running 1 particles with parcels v4
Execution time: 21 seconds

Nearly all of the runtime savings here comes from the initialization. This result makes sense: increasing the hash cell size reduces the inner-loop size in the hash table initialization, giving fewer .append calls on the list-of-lists hash table.
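
For intuition, here is a minimal, hypothetical sketch of that construction pattern (build_hash_table, quad_bboxes and cell_size are illustrative names, not the Parcels API): each grid quad is appended to every hash cell its bounding box overlaps, so larger hash cells mean fewer (cell, quad) pairs and fewer append calls.

import numpy as np

def build_hash_table(quad_bboxes, cell_size):
    """quad_bboxes: (n_quads, 4) array of [xmin, xmax, ymin, ymax] per grid quad."""
    table = {}  # (i, j) hash cell -> list of quad indices
    for q, (xmin, xmax, ymin, ymax) in enumerate(quad_bboxes):
        i0, i1 = int(xmin // cell_size), int(xmax // cell_size)
        j0, j1 = int(ymin // cell_size), int(ymax // cell_size)
        # These inner loops shrink as cell_size grows: fewer appends per quad.
        for i in range(i0, i1 + 1):
            for j in range(j0, j1 + 1):
                table.setdefault((i, j), []).append(q)
    return table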

@fluidnumerics-joe
Contributor

fluidnumerics-joe commented Aug 15, 2025

Here are a few results from playing with the hash grid cell size factor (the parameter that multiplies the median quad area) for the single-particle MOi benchmark. I've also split apart the timing for the hash grid initialization and the actual search; the timing pattern is sketched after the numbers below.

Running 1 particles with parcels v4
  Spatial hash factor: 0.5
Spatial hash construction time: 40.19 seconds
Search time: 0.00 seconds

  Spatial hash factor: 1.0
Spatial hash construction time: 22.69 seconds
Search time: 0.00 seconds

  Spatial hash factor: 1.5
Spatial hash construction time: 16.79 seconds
Search time: 0.00 seconds

  Spatial hash factor: 2.0
Spatial hash construction time: 13.68 seconds
Search time: 0.00 seconds

  Spatial hash factor: 2.5
Spatial hash construction time: 12.34 seconds
Search time: 0.00 seconds

  Spatial hash factor: 3.0
Spatial hash construction time: 11.79 seconds
Search time: 0.00 seconds

  Spatial hash factor: 3.5
Spatial hash construction time: 11.43 seconds
Search time: 0.00 seconds

  Spatial hash factor: 4.0
Spatial hash construction time: 11.60 seconds
Search time: 0.01 seconds
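
The split above can be reproduced with a simple pattern along these lines (a hypothetical sketch; timed is not part of the benchmark script, and the two stand-in callables take the place of the grid/spatial-hash construction and the fieldset.U.grid.search call):

import time

def timed(label, fn, *args, **kwargs):
    """Run fn once and print its wall-clock time in the format used above."""
    t0 = time.perf_counter()
    result = fn(*args, **kwargs)
    print(f"{label}: {time.perf_counter() - t0:.2f} seconds")
    return result

# Stand-ins for the real calls: grid construction and grid.search(...).
grid = timed("Spatial hash construction time", lambda: object())
_ = timed("Search time", lambda: None)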

@erikvansebille - tbh, I'm not sure how you're getting runtimes for multiple particles; it must not be on v4-dev. I'm crashing during the search as soon as I go to more than 1 particle in this benchmark

Traceback (most recent call last):
  File "/home/joe/Projects/Geomar-Utrecht/parcels-benchmarks/MOi-Curvilinear/benchmark_index_search.py", line 123, in <module>
    main()
  File "/home/joe/Projects/Geomar-Utrecht/parcels-benchmarks/MOi-Curvilinear/benchmark_index_search.py", line 119, in main
    run_benchmark(args.memory)
  File "/home/joe/Projects/Geomar-Utrecht/parcels-benchmarks/MOi-Curvilinear/benchmark_index_search.py", line 95, in run_benchmark
    fieldset.U.grid.search(0, lat, lon)
  File "/home/joe/miniconda3/envs/parcels/lib/python3.12/site-packages/parcels/xgrid.py", line 302, in search
    yi, eta, xi, xsi = _search_indices_curvilinear_2d(self, y, x, yi, xi)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/joe/miniconda3/envs/parcels/lib/python3.12/site-packages/parcels/_index_search.py", line 316, in _search_indices_curvilinear_2d
    if abs(aa) < 1e-12:  # Rectilinear cell, or quasi

Edit: my current workaround is to use the _search_indices_curvilinear_2d from your vectorized branch.

@erikvansebille
Member Author

Thanks for this analysis, Joe!

@erikvansebille - tbh, I'm not sure how you're getting runtimes for multiple particles; it must not be on v4-dev. I'm crashing during the search as soon as I go to more than 1 particle in this benchmark

Indeed, I do all the simulations on the vectorized kernel branch from Parcels-code/Parcels#2122; I hope to merge that branch into v4-dev any day now, once @VeckoTheGecko has done a final review. So it is indeed best to already do all performance benchmarking on that branch.

On Friday, I started work on the VectorField interpolation (Parcels-code/Parcels#2152) and found that I got significant speedups by loading the curvilinear lon and lat arrays (Parcels-code/Parcels@2da436a) for the NemoCurvilinear_data test (Parcels-code/Parcels@1bd7e0c). Because curvilinear lon and lat are not xarray dimensions, they seem not to be loaded by default, which triggered a separate load on each call to xgrid.lon and xgrid.lat.
Is this something that would help speed up the curvilinear search in general?
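
In a nutshell, the idea amounts to something like this sketch (illustrative only; the file and variable names are placeholders, not the actual commit or the Parcels API): the 2D curvilinear lon/lat are coordinate variables rather than dimensions, so xarray keeps them lazy, and loading them once avoids a fresh read on every access.

import xarray as xr

ds = xr.open_dataset("curvilinear_grid.nc")      # placeholder file name
lon2d = ds["nav_lon"].load().values              # one eager read ...
lat2d = ds["nav_lat"].load().values              # ... then reuse the NumPy arrays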

@fluidnumerics-joe
Contributor

[Charts: Search Time vs Hash Grid Scaling Factor; Construction Time vs Hash Grid Scaling Factor]

I've done a small sweep to see how changing the hash grid cell size influences construction and search time. In general, increasing the hash grid cell size (increasing the "hash grid factor") reduces the initialization cost, but it can increase search times. For the search times, I still want to dig in to see how much time is spent in the spatial hash query versus the rest of the search.
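
To make the trade-off concrete, continuing the hypothetical list-of-lists sketch from earlier: a query only has to look up one hash cell, but the candidate list it returns grows with the hash cell size, so each search has more point-in-quad tests to do.

def candidate_quads(table, x, y, cell_size):
    """Quad indices that still need a point-in-quad test for the point (x, y)."""
    cell = (int(x // cell_size), int(y // cell_size))
    return table.get(cell, [])   # this candidate list grows as cell_size grows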

@fluidnumerics-joe
Contributor

On Friday, I started work on the VectorField interpolation (Parcels-code/Parcels#2152) and found that I got significant speedups by loading the curvilinear lon and lat arrays (Parcels-code/Parcels@2da436a) for the NemoCurvilinear_data test (Parcels-code/Parcels@1bd7e0c). Because curvilinear lon and lat are not xarray dimensions, they seem not to be loaded by default, which triggered a separate load on each call to xgrid.lon and xgrid.lat.
Is this something that would help speed up the curvilinear search in general?

Potentially. I'll pull profiles this morning :)

@fluidnumerics-joe
Contributor

I looked at another potential implementation of the hash table initialization that reduces the inner-loop operations and eliminates the append operations, by moving to a compressed sparse row (CSR) storage format for the hash table. Initial results are not too good compared to what we originally had. I'm still giving this some thought.
[Chart: initial results for the CSR variant compared with the original implementation]
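
For concreteness, a CSR-style hash table keeps one flat array of quad indices plus an offsets array. The sketch below (illustrative only, not the code behind the chart, and simplified to one hash cell per quad) shows an append-free construction and the query slice.

import numpy as np

def build_csr_hash(cell_ids, n_cells):
    """cell_ids[q] is the hash cell of quad q (one cell per quad, for simplicity)."""
    counts = np.bincount(cell_ids, minlength=n_cells)   # occupancy per hash cell
    offsets = np.concatenate(([0], np.cumsum(counts)))
    entries = np.argsort(cell_ids, kind="stable")       # quad indices grouped by cell
    return offsets, entries

# Quads in hash cell k, with no Python lists involved:
# entries[offsets[k]:offsets[k+1]]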

@fluidnumerics-joe
Contributor

I also found a bug in how we treat wrap-arounds at the antimeridian. It creates a few rows in the hash table that are quite long, which could well be associated with the long construction and search times. I'm working on a fix now.

@fluidnumerics-joe
Contributor

While resolving the issues related to crossing the antimeridian, I've tracked down a few more problematic cells in the MOi dataset; there are some nearly collinear elements near the north pole that give us some additional headaches. The bounding box and hash cell indices are given below.

[4803 4803] [10080 10080]
x : -16.45905303955078, 163.24029541015625
y : 89.95170593261719, 89.9555892944336

To avoid these kinds of issues, grids with the spherical type are going to have to be converted to Cartesian coordinates (on a unit sphere).
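
Concretely, that conversion is the standard lon/lat to 3D unit-sphere transform, sketched below (illustrative only, not the eventual Parcels code); in 3D Cartesian coordinates there is no antimeridian jump and near-polar quads are no longer degenerate.

import numpy as np

def lonlat_to_unit_xyz(lon_deg, lat_deg):
    """Map longitude/latitude in degrees to (x, y, z) on the unit sphere."""
    lon = np.deg2rad(np.asarray(lon_deg))
    lat = np.deg2rad(np.asarray(lat_deg))
    return np.stack(
        [np.cos(lat) * np.cos(lon), np.cos(lat) * np.sin(lon), np.sin(lat)],
        axis=-1,
    )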

@erikvansebille
Member Author

Thanks for digging deeper, Joe!

To avoid these kinds of issues, grids with the spherical type are going to have to be converted to Cartesian coordinates (on a unit sphere).

What do you mean by that? I don't think I understand. Anything I can help with?

@erikvansebille
Member Author

I've drafted a PR with a (failing) unit test at Parcels-code/Parcels#2153 that you could use to debug the spatial hash function. It uses the NEMO_Curvilinear dataset, which is a bit smaller than the MOi one used here, but has the same issue that the antimeridian lies in the domain; see the plot below.

[Image: plot of the NEMO_Curvilinear domain, showing the antimeridian in the domain]

Does this help you with debugging?

@erikvansebille
Member Author

I just did the full benchmarking for the Morton encoding in Parcels-code/Parcels#2158. It really is a very impressive speedup compared to the spatial hashing that you introduced in Parcels-code/Parcels#2132, @fluidnumerics-joe!

See the runtime below:
[Screenshot 2025-08-25: runtime vs number of particles]

On my MacBook, the runtime is < 1 second up to 500k particles, and the initialisation is only 1 second too. For 1M particles, the runtime is 2 seconds, which is a factor of 300(!) faster than the 604 seconds for the spatial hash.
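
As a refresher on what the encoding does (a generic illustration, not the code in Parcels-code/Parcels#2158): a 2D Morton/Z-order key interleaves the bits of the two cell indices, so sorting by the key keeps spatially nearby cells nearby in memory.

def _part1by1(v: int) -> int:
    """Spread the lower 16 bits of v so a zero bit sits between each original bit."""
    v &= 0x0000FFFF
    v = (v | (v << 8)) & 0x00FF00FF
    v = (v | (v << 4)) & 0x0F0F0F0F
    v = (v | (v << 2)) & 0x33333333
    v = (v | (v << 1)) & 0x55555555
    return v

def morton2d(i: int, j: int) -> int:
    """Interleave the bits of cell indices (i, j) into a single Z-order key."""
    return _part1by1(i) | (_part1by1(j) << 1)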

The performance worsens for the 5M-particle case (which takes 668 seconds), probably because of memory issues. See the peak memory use below. The memory for 5M particles is >6 GB, which may have led to some swapping to disk on my laptop.

[Screenshot 2025-08-25: peak memory use vs number of particles]

Note also that the memory load for the Morton encoding is in general 5 times higher than for the spatial hashing. Any initial idea why the memory footprint is so much larger, @fluidnumerics-joe? Reducing it is perhaps something to explore further down the line.

@erikvansebille
Member Author

Here is an update of the search_indices timings with the new Morton implementation in Parcels-code/Parcels#2177 (cyan lines):

[Screenshot 2025-09-09: search_indices timings, including the new Morton implementation (cyan lines)]

The initialisation is a bit slower (6 seconds on my MacBook), but the factor 10 speed-up for 5M particles is very nice (from 11 minutes to 50 seconds)

The peak memory use is a lot higher again, though...

[Screenshot 2025-09-09: peak memory use]

@fluidnumerics-joe
Contributor

I'll see if there's room to trim back the memory usage. There are a few spots where uint32 would suffice but uint64 is being used. This seems to be the price we pay for vectorization, though: getting rid of those for loops required masks and quick lookup tables. But perhaps there may be room for reuse? I'll take another pass.
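
As a rough illustration of the dtype savings being considered (the array name and size are just an example, using the 5M-particle case from the benchmark above): downcasting an index array from uint64 to uint32 halves its footprint, as long as every index stays below 2**32.

import numpy as np

n = 5_000_000                              # 5M particles, as in the benchmark above
keys64 = np.zeros(n, dtype=np.uint64)
keys32 = keys64.astype(np.uint32)          # safe while every index is < 2**32
print(keys64.nbytes / 1e6, "MB ->", keys32.nbytes / 1e6, "MB")   # 40.0 MB -> 20.0 MB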
