
Conversation

@erikvansebille
Member

One way to improve performance in Parcels is to 'vectorize' the kernels: i.e. not to make kernels loop over individual particles, but to have them act on all particles at once. This PR explores the performance of that approach.
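For illustration, a minimal sketch of the idea (the function names and signatures here are hypothetical, not the actual Parcels API):

```python
import numpy as np

# Per-particle kernel (v3 style): one Python-level call per particle
def advect_particle(lon, lat, u, v, dt):
    return lon + u * dt, lat + v * dt

# Vectorized kernel (v4 style): one call acts on arrays of all particles
def advect_all(lons, lats, us, vs, dt):
    # lons, lats, us, vs are 1-D arrays of length n_particles, so the
    # arithmetic runs in compiled numpy code instead of a Python loop
    return lons + us * dt, lats + vs * dt

n = 1_000_000
lons, lats = np.random.rand(n), np.random.rand(n)
us, vs = np.ones(n), np.zeros(n)

# v3 style would be a million Python-level calls:
# for i in range(n):
#     lons[i], lats[i] = advect_particle(lons[i], lats[i], us[i], vs[i], dt=60.0)

# v4 style: a single vectorized call
lons, lats = advect_all(lons, lats, us, vs, dt=60.0)
```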

@erikvansebille
Member Author

So a very quick first assessment of performance is below. JIT in v3 takes 8 seconds for 1 million particles in this simulation, and the implementation of vectorized kernels in Parcels-code/Parcels#2122 takes approximately 10 times as long. That's not bad, I'd say!

[Screenshot 2025-07-29 at 13 25 42]

Note that the 'custom kernel' (red line, code below) is already much faster than the Parcels implementation, showing that there may be room for improvement!
https://github.com/OceanParcels/parcels-benchmarks/blob/44c28b1114368dce06586b0b5dcfad2a84573b37/benchmark_vectorized_kernels.py#L36-L73
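The linked file has the actual custom kernel; purely as a hypothetical sketch of why a hand-rolled kernel can win, something like the following bypasses the generic field-evaluation machinery with raw numpy index arithmetic (none of these names are from the benchmark):

```python
import numpy as np

def custom_euler_step(lons, lats, U, V, dx, dy, dt):
    # Nearest-neighbour field lookup via plain integer index arithmetic,
    # where a generic implementation would go through interpolation and
    # indexing layers for every evaluation
    j = np.clip((lats / dy).astype(np.intp), 0, U.shape[0] - 1)
    i = np.clip((lons / dx).astype(np.intp), 0, U.shape[1] - 1)
    # fancy indexing evaluates the field at all particle positions at once
    return lons + U[j, i] * dt, lats + V[j, i] * dt
```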

@erikvansebille
Member Author

erikvansebille commented Aug 6, 2025

Here's an update of the scaling for the simple flow with vectorized kernels; to be seen in conjunction with https://github.com//pull/2#issuecomment-3158958884
[Screenshot 2025-08-06 at 17 26 36]

The v4 vectorized kernel (green line, Parcels-code/Parcels#2122) is quite a bit slower than v3-JIT (black line), but much faster than v3-Scipy (grey dashed line). We can get a bit of speedup by using direct numpy indexing instead of xarray.isel() (cyan line), but that doesn't work for dask (as in #2), so it wouldn't be a general solution.
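To make the trade-off concrete, a small sketch (illustrative data, not the benchmark setup): xarray's pointwise indexing works unchanged on dask-backed arrays, while the direct numpy route needs the data fully in memory:

```python
import numpy as np
import xarray as xr

U = xr.DataArray(np.random.rand(200, 300), dims=("lat", "lon"))
j = np.random.randint(0, 200, size=100_000)
i = np.random.randint(0, 300, size=100_000)

# xarray route: pointwise indexing, also works if U wraps a lazy dask
# array, but carries per-call overhead for alignment and index handling
u_xr = U.isel(lat=xr.DataArray(j), lon=xr.DataArray(i)).values

# direct numpy route: much cheaper, but U.values materializes the whole
# array in memory, which defeats the purpose of a chunked dask backend
u_np = U.values[j, i]

assert np.allclose(u_xr, u_np)
```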

@erikvansebille
Member Author

As per #2 (comment), below is also the peak memory use for the idealised flow field.

[Screenshot 2025-08-06 at 17 26 36]

Since this flow field is a simple, stationary 2D flow, the memory footprint per particle is very small in all cases. The memory footprint for the vectorized kernel (green line) is almost five times as large as for v3-JIT (black line), but even for 2M particles it's only ~1 GB.
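For reference, one generic way to get such peak-memory numbers (the benchmarks presumably use their own instrumentation; the workload below is purely a stand-in): Python's tracemalloc tracks the peak of traced allocations, and recent numpy versions report array buffers to it:

```python
import tracemalloc
import numpy as np

tracemalloc.start()

# Stand-in workload: a few float64 arrays for 2M particles (~16 MB each)
n = 2_000_000
lons, lats = np.random.rand(n), np.random.rand(n)
for _ in range(10):
    lons = lons + 0.01 * np.random.rand(n)  # temporaries briefly raise the peak

current, peak = tracemalloc.get_traced_memory()
print(f"peak traced memory: {peak / 1e9:.2f} GB")
tracemalloc.stop()
```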

@erikvansebille
Member Author

I've looked a bit deeper into the memory use of the vectorized kernels, and found something quite interesting (and good news!)
[Screenshot 2025-08-07 at 13 59 37]

The diagram above shows, for a 100k-particle run in v3-JIT and v4 vectorized kernels (Parcels-code/Parcels#2122), the runtime in red and memory consumption in blue. As expected (and also shown in the posts above), vectorized kernels are both a bit slower and have a larger memory footprint.

But the memory footprint does not increase much when more complicated kernels are used(!). The difference between the built-in AdvectionEE and AdvectionRK4 kernels is only 10 MB in this case (and, as expected, AdvectionRK4 is four times slower because it does four times more field evaluations).
More surprisingly(!), a custom 'thin' AdvectionRK4 kernel, in which temporary variables are reused, does not have a huge impact on peak memory either. I guess the Python garbage collector is really smart?
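For concreteness, the kind of rewrite meant by a 'thin' kernel (a hypothetical sketch with a stand-in field evaluation, not the benchmark code):

```python
import numpy as np

def eval_uv(lons, lats, t):
    """Hypothetical stand-in for a field evaluation returning u, v arrays."""
    return np.cos(lats), np.sin(lons)

def rk4_standard(lons, lats, t, dt):
    # keeps all four stage velocities alive at once: 8 temporary arrays
    u1, v1 = eval_uv(lons, lats, t)
    u2, v2 = eval_uv(lons + 0.5 * u1 * dt, lats + 0.5 * v1 * dt, t + 0.5 * dt)
    u3, v3 = eval_uv(lons + 0.5 * u2 * dt, lats + 0.5 * v2 * dt, t + 0.5 * dt)
    u4, v4 = eval_uv(lons + u3 * dt, lats + v3 * dt, t + dt)
    return (lons + (u1 + 2 * u2 + 2 * u3 + u4) / 6 * dt,
            lats + (v1 + 2 * v2 + 2 * v3 + v4) / 6 * dt)

def rk4_thin(lons, lats, t, dt):
    # accumulates the weighted sum in place, so only one stage velocity
    # pair (u, v) plus the accumulators (du, dv) are alive at any time
    u, v = eval_uv(lons, lats, t)
    du, dv = u.copy(), v.copy()
    u, v = eval_uv(lons + 0.5 * u * dt, lats + 0.5 * v * dt, t + 0.5 * dt)
    du += 2 * u; dv += 2 * v
    u, v = eval_uv(lons + 0.5 * u * dt, lats + 0.5 * v * dt, t + 0.5 * dt)
    du += 2 * u; dv += 2 * v
    u, v = eval_uv(lons + u * dt, lats + v * dt, t + dt)
    du += u; dv += v
    return lons + du / 6 * dt, lats + dv / 6 * dt
```

One plausible factor: CPython reclaims arrays by reference counting as soon as the last name pointing at them is rebound, so the expression temporaries in the standard version never pile up; the two versions only differ in how many stage velocities are simultaneously alive, which is a modest fraction of the peak.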

Curious to hear what you think about this, @fluidnumerics-joe and @VeckoTheGecko!

@fluidnumerics-joe
Contributor

I would've thought the garbage collector cleaned up local variables when they went out of scope (e.g. outside the AdvectionRK4 kernel). This is indeed quite interesting. Is the runtime here the total simulation runtime or the accumulated runtime inside the advection kernel?
