Benchmark vectorized kernels #4
Conversation
So a very quick first assessment of performance is below. In this simulation, JIT in v3 takes 8 seconds for 1 million particles, and the vectorised-kernel implementation in Parcels-code/Parcels#2122 takes approximately 10 times as long. That's not bad, I'd say!
Note that the 'custom kernel' (red line, code below) is already much faster than the Parcels implementation, which suggests there is still room for improvement!
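For readers following along, a minimal sketch of what such a custom vectorized kernel could look like is below. This is purely illustrative (the function and sampler names are hypothetical, not the actual red-line code or the Parcels API): the key idea is that one NumPy expression advances all particles at once.

```python
import numpy as np

def advect_ee_vectorized(lon, lat, u, v, dt):
    """Hypothetical vectorized Euler step acting on all particles at once.

    `u` and `v` are callables sampling the velocity field at the particle
    positions (illustrative names, not the actual Parcels interface).
    """
    return lon + u(lon, lat) * dt, lat + v(lon, lat) * dt

# Example with a simple solid-body-rotation field
u = lambda x, y: -y
v = lambda x, y: x
lon = np.zeros(1_000_000)
lat = np.ones(1_000_000)
lon2, lat2 = advect_ee_vectorized(lon, lat, u, v, dt=0.1)
```

The whole particle set is advanced with two array operations, so the Python interpreter overhead is paid once per step instead of once per particle.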
Here's an update of the scaling for the simple flow with vectorized kernels; to be seen in conjunction with https://github.com//pull/2#issuecomment-3158958884. The vectorized kernel in v4 (green line, Parcels-code/Parcels#2122) is quite a bit slower than v3-JIT (black line), but much faster than v3-Scipy (grey dashed line). We can get a bit of speedup by unit direct
As per #2 (comment), below is also the peak memory use for the idealised flow field.
Since this flow field is a simple, stationary 2D flow, the memory footprint per particle is very small in all cases. Here, the memory footprint for the vectorized kernel (green line) is almost five times as large as for v3-JIT (black line); but even for 2M particles, it's only ~1 GB.
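To put that ~1 GB at 2M particles in perspective, a quick back-of-the-envelope sketch (the particle-state layout below is a hypothetical assumption, not the actual Parcels data structure) shows that the bare particle state is only tens of MB, so most of the footprint must come from temporaries and interpolation workspace:

```python
import numpy as np

n = 2_000_000
# Hypothetical particle state: three float64 arrays (lon, lat, time)
state = {name: np.zeros(n) for name in ("lon", "lat", "time")}
state_bytes = sum(a.nbytes for a in state.values())
print(state_bytes / 1e6)  # 48.0 (MB): far below the ~1000 MB observed
```

Under this assumption, ~1 GB / 2M particles is roughly 500 bytes per particle, i.e. on the order of 60 float64 values each, suggesting the temporaries dominate the bare state by an order of magnitude.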
I've looked a bit deeper into the memory use of the vectorized kernels, and found something quite interesting (and good news!). The diagram above shows, for a 100k-particle run in v3-JIT and v4 vectorized kernels (Parcels-code/Parcels#2122), the runtime in red and the memory consumption in blue. As expected (and as shown in the posts above), vectorized kernels are both a bit slower and have a larger memory footprint. But the memory footprint does not increase much when more complicated kernels are used(!). The difference between the built-in AdvectionEE and AdvectionRK4 kernels is only 10 MB in this case (and, as expected, AdvectionRK4 is four times slower because it does four times more field evaluations). Curious to hear what you think about this, @fluidnumerics-joe and @VeckoTheGecko!
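To make the "four times more field evaluations" point concrete, here is a hedged sketch of a vectorized RK4 step (illustrative only; function names and the sampler interface are assumptions, not the Parcels implementation). Each of the four RK4 stages samples the field at a new set of positions, but each stage only needs a couple of temporary arrays, which would explain why the memory difference versus Euler stays small:

```python
import numpy as np

def advect_rk4_vectorized(lon, lat, u, v, dt):
    """Hypothetical vectorized RK4 step: four field evaluations per step."""
    u1 = u(lon, lat)
    v1 = v(lon, lat)                                        # field eval 1
    u2 = u(lon + 0.5*dt*u1, lat + 0.5*dt*v1)
    v2 = v(lon + 0.5*dt*u1, lat + 0.5*dt*v1)                # field eval 2
    u3 = u(lon + 0.5*dt*u2, lat + 0.5*dt*v2)
    v3 = v(lon + 0.5*dt*u2, lat + 0.5*dt*v2)                # field eval 3
    u4 = u(lon + dt*u3, lat + dt*v3)
    v4 = v(lon + dt*u3, lat + dt*v3)                        # field eval 4
    lon_new = lon + dt/6.0 * (u1 + 2*u2 + 2*u3 + u4)
    lat_new = lat + dt/6.0 * (v1 + 2*v2 + 2*v3 + v4)
    return lon_new, lat_new

# Sanity check with a constant field, where RK4 is exact
ones = lambda x, y: np.ones_like(x)
twos = lambda x, y: 2.0 * np.ones_like(x)
lon_new, lat_new = advect_rk4_vectorized(np.zeros(10), np.zeros(10),
                                         ones, twos, dt=0.5)
```

The temporaries (`u1`…`v4` and the shifted-position arrays) are a fixed, small number of extra arrays regardless of kernel complexity, consistent with the ~10 MB EE-vs-RK4 gap at 100k particles.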
I would've thought the garbage collector cleaned up local variables when they go out of scope (e.g. outside the AdvectionRK4 kernel). This is indeed quite interesting. Is the runtime here the total simulation runtime or the accumulated runtime inside the advection kernel?




One way to improve performance in Parcels is to 'vectorize' the kernels: i.e. not to make kernels loop over particles, but to have them act on the entire particle set at once. This PR explores the performance of that approach.
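The contrast between the two approaches can be sketched as follows (a minimal illustration with an assumed solid-body-rotation field, not the actual benchmark flow or the Parcels kernel signatures):

```python
import numpy as np

# Illustrative stationary 2D velocity field (solid-body rotation)
U = lambda x, y: -y
V = lambda x, y: x

def step_looped(lon, lat, dt):
    """Per-particle kernel (v3-Scipy style): Python overhead on every particle."""
    for i in range(lon.size):
        lon[i], lat[i] = (lon[i] + U(lon[i], lat[i]) * dt,
                          lat[i] + V(lon[i], lat[i]) * dt)
    return lon, lat

def step_vectorized(lon, lat, dt):
    """Vectorized kernel: one array operation over the entire particle set."""
    return lon + U(lon, lat) * dt, lat + V(lon, lat) * dt

# Both produce the same trajectories; only the per-step cost differs
lon = np.linspace(0.0, 1.0, 1000)
lat = np.linspace(1.0, 2.0, 1000)
loop_lon, loop_lat = step_looped(lon.copy(), lat.copy(), 0.1)
vec_lon, vec_lat = step_vectorized(lon, lat, 0.1)
```

The looped version pays interpreter and function-call overhead once per particle per step; the vectorized version pays it once per step, which is where the speedup over v3-Scipy in the plots above comes from.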