@erikvansebille commented Jul 21, 2025

This PR adds a benchmark test for the CopernicusMarineService data (specifically the SMOC product). The aim is to explore the scaling of the interpolation / field-evaluation loop.

The script in this PR was run with either Parcels-main (for the v3 benchmarking) or with Parcels-code/Parcels#2098 (for the v4 benchmarking).

The good news is that for 1 particle, v4 is faster than v3. However, the (more important) not-so-good news is that the scaling for v4 is much worse than for v3; see the figure below.
[Figure: Screenshot 2025-07-21 at 17:05:16 — v3 vs v4 runtime versus number of particles]

  1. For these small particle counts (≤50), the v3 timing is constant at approximately 2 minutes, independent of JIT (blue) or Scipy (red) mode; this suggests that (almost) all compute time goes into reading the FieldSet.

For the current v4, however, the runtime depends strongly on the interpolation method.

  1. Using the PureXarrayInterp method defined in this PR (simply `return field.data.interp(time=t, lon=x, lat=y).values[0]`) gives the slowest runtime for npart=50 (more than 30 minutes; yellow line).

  2. Using the explicit BiRectiLinear method defined in this PR approximately halves the runtime (green line).

  3. However, using the NoFieldAccess method defined in this PR (simply `return 0`, without any xarray access to the field data but still with index searching) is blazingly fast: only 2 seconds for npart=50 (the hardly visible orange line at the bottom of the figure).
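For reference, an explicit bi-rectilinear interpolation on an in-memory array can be sketched in pure numpy as below. This is a minimal illustration of the technique, not the actual BiRectiLinear code in this PR; the function name and signature are hypothetical:

```python
import numpy as np

def birectilinear_interp(field, lons, lats, x, y):
    """Bilinear interpolation on a rectilinear grid (toy stand-in,
    not the BiRectiLinear method from this PR).

    field: 2D array of shape (len(lats), len(lons))
    lons, lats: 1D monotonically increasing coordinate arrays
    x, y: query longitude / latitude
    """
    # Locate the grid cell containing (x, y), clamped to valid cells
    i = int(np.clip(np.searchsorted(lons, x, side="right") - 1, 0, lons.size - 2))
    j = int(np.clip(np.searchsorted(lats, y, side="right") - 1, 0, lats.size - 2))
    # Fractional position inside the cell
    xsi = (x - lons[i]) / (lons[i + 1] - lons[i])
    eta = (y - lats[j]) / (lats[j + 1] - lats[j])
    # Weighted sum of the four surrounding corner values
    return ((1 - xsi) * (1 - eta) * field[j, i]
            + xsi * (1 - eta) * field[j, i + 1]
            + xsi * eta * field[j + 1, i + 1]
            + (1 - xsi) * eta * field[j + 1, i])
```

Because this avoids building an xarray indexing/interpolation call per particle, it is plausible that this is where the factor-of-two speedup over the pure-xarray variant comes from.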

So, this suggests that the bottleneck is the access to `field.data` in the Interpolators. Our current access pattern (one particle at a time) is likely just not efficient. We'll need to consider how to make this access pattern smarter...
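A toy numpy comparison illustrates why per-particle access is costly even for plain in-memory arrays (with xarray/dask-backed data the per-call overhead is far larger still, since each access also pays indexing and graph-evaluation costs):

```python
import time
import numpy as np

rng = np.random.default_rng(0)
field = rng.random((500, 500))
jj = rng.integers(0, 500, size=50_000)  # per-particle grid indices
ii = rng.integers(0, 500, size=50_000)

# One particle at a time: 50k separate indexing calls
t0 = time.perf_counter()
per_particle = np.array([field[j, i] for j, i in zip(jj, ii)])
t_loop = time.perf_counter() - t0

# All particles at once: a single fancy-indexing call
t0 = time.perf_counter()
vectorized = field[jj, ii]
t_vec = time.perf_counter() - t0

assert np.array_equal(per_particle, vectorized)
print(f"loop: {t_loop:.4f}s, vectorized: {t_vec:.4f}s")
```

The two give identical results, but the vectorized call amortizes all per-call overhead over the whole particle set.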

@erikvansebille erikvansebille marked this pull request as draft July 22, 2025 07:38
@erikvansebille commented Jul 22, 2025

Here's some further benchmarking for the CopernicusMarine test, which shows really good results. All of these are with the 'pragmatic `computeTimeChunk`' that I implemented in Parcels-code/Parcels#2098.
[Figure: Screenshot 2025-07-22 at 17:58:52 — runtime versus npart for the computeTimeChunk variants]

The v3-JIT (blue line) is still independent of npart, but v3-Scipy (red) can now be seen to increase with npart.

I could not run the original v4-dev for these large values of npart; extrapolating the figure above, this would have taken hours to days...

However, by implementing a very simple `computeTimeChunk` method (i.e. calling `field.data.compute()` on the two time slices needed; see Parcels-code/Parcels#2098 for the implementation), we do get reasonable scaling (yellow).
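The idea can be sketched as below: keep only the two time slices bracketing the current time fully loaded in memory, and reload them only when the simulation crosses into the next time interval. This is a toy stand-in with plain numpy arrays; the class and method names are hypothetical, and Parcels-code/Parcels#2098 has the actual implementation:

```python
import numpy as np

class TimeChunkCache:
    """Toy sketch of a 'computeTimeChunk' strategy (hypothetical class)."""

    def __init__(self, data, times):
        self.data = data    # lazily-loaded array, shape (ntime, ny, nx)
        self.times = times  # 1D array of time values
        self.ti = None      # time index of the currently cached chunk
        self.chunk = None   # the two in-memory time slices

    def get_chunk(self, t):
        # Find the time index whose interval [times[ti], times[ti+1]) contains t
        ti = int(np.searchsorted(self.times, t, side="right")) - 1
        if ti != self.ti:
            # With a dask/xarray backend this is where one would call
            # field.data.isel(time=slice(ti, ti + 2)).compute()
            self.chunk = np.asarray(self.data[ti:ti + 2])
            self.ti = ti
        return self.chunk

# Minimal usage
times = np.array([0.0, 1.0, 2.0, 3.0])
data = np.arange(24.0).reshape(4, 2, 3)
cache = TimeChunkCache(data, times)
chunk = cache.get_chunk(t=0.5)  # loads time slices 0 and 1 into memory
```

All particle evaluations within one time interval then hit the cached in-memory slices instead of triggering lazy loads.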

Interestingly, the performance improves considerably (and, coincidentally(?), exactly matches v3-Scipy) when we also include the 'time-as-float' change (green; Parcels-code/Parcels#2090).
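The gist of the time-as-float idea can be sketched as follows (a hedged illustration only; see Parcels-code/Parcels#2090 for the actual change): convert the datetime64 time axis to plain float seconds once, up front, so that the per-particle time weight inside the kernel is cheap float arithmetic rather than repeated datetime64 handling:

```python
import numpy as np

# Time axis as datetime64, as stored in the dataset
times = np.array(["2025-01-01T00:00", "2025-01-01T03:00"], dtype="datetime64[s]")

# Convert once to float seconds since the first timestep...
times_f = (times - times[0]) / np.timedelta64(1, "s")

# ...so the per-particle time interpolation weight is plain float
# arithmetic, with no datetime64 conversions inside the kernel loop
t = 1.5 * 3600.0  # particle time in seconds since the first timestep
w = (t - times_f[0]) / (times_f[1] - times_f[0])
print(w)  # 0.5
```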

For reference, I also plotted the performance with no compute at all (so no calls to the field data): the dashed orange line.

@erikvansebille

Here's an update on the benchmarking for the Copernicus Marine (A-grid) dataset. It turns out that the behaviour of v4 is quite different for particle sets larger than the 10k particles in the plot above. Note that this new plot has logarithmic axes, to better show the large differences.

[Figure: Screenshot 2025-08-06 at 11:56:28 — log-log runtime scaling for v3 and v4 variants up to large npart]

Both v4-dev (yellow line above) and v4-timechunk (Parcels-code/Parcels#2098; orange line above) become excessively slow for larger particle counts; even slower than v3-Scipy (grey line above).

The good news is that up to ~10k particles, the new v4 vectorized kernel (PR Parcels-code/Parcels#2122; green line above) is very fast at ~40 seconds; only slightly slower than v3-JIT (black line above) at ~15 seconds, and always much faster than v3-Scipy!

But beyond approximately 50k particles, this implementation also becomes very slow, with much poorer scaling than v3-JIT. What is going on there remains to be explored, but it could be xarray/dask internals (this benchmark uses dask for the fieldset).

@erikvansebille

Below is an analysis of the peak memory use of the different PRs (as requested by @fluidnumerics-joe).

[Figure: Screenshot 2025-08-06 at 17:20:54 — peak memory use versus npart for the different PRs]

As expected, the vectorized kernel (Parcels-code/Parcels#2122; green line) has a much smaller memory footprint for low numbers of particles than v3, because both JIT (black line) and Scipy (grey line) need to load two time steps of the FieldSet into memory.

For larger particle sets, the memory of the vectorized kernel does grow faster, because many temporary variables are now arrays (instead of floats). But even for 1M particles, the vectorized kernel has a peak memory load of only 2 GB; so memory does not seem to be the bottleneck for the deteriorating performance (see #2 (comment) above) beyond 10k particles.
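For anyone wanting to reproduce such a measurement, the peak memory of a single vectorized kernel step can be traced with the stdlib `tracemalloc` (a minimal sketch; the benchmark itself may have used a different profiler, and `kernel_step` here is a hypothetical stand-in that just shows how array temporaries scale with npart):

```python
import tracemalloc
import numpy as np

def kernel_step(u, v, lon, lat, dt):
    # Vectorized Euler step: each temporary is a full array of size npart
    return lon + u * dt, lat + v * dt

npart = 100_000
lon = np.zeros(npart)
lat = np.zeros(npart)
u = np.ones(npart)
v = np.ones(npart)

tracemalloc.start()
lon, lat = kernel_step(u, v, lon, lat, dt=60.0)
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
print(f"peak during kernel step: {peak / 1e6:.1f} MB")
```

Each float64 temporary is 8 bytes per particle, so the array temporaries alone account for a peak that grows linearly with npart, consistent with the figure above.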
