
Conversation

@VeckoTheGecko (Contributor) commented Nov 13, 2024

Todo

  • remove example benchmarks
  • Add integration test benchmarks (See asv_bench/benchmarks/benchmarks_integration.py for an example.)
    • 2D advection
    • Argo float example
    • tutorial_nemo_curvilinear.ipynb (See asv_bench/benchmarks/benchmarks_particle_execution.py for an example.)
  • Add more detailed timing benchmarks
    • time the execution of 1000 particles for 1 time step
    • time the execution of 1000 particles for 100 time steps

Where applicable, split benchmarks into different phases. We could, e.g., use a .setup() method for the fieldset creation and a .time_execute() method for the particle execution (a minimal sketch is shown below).
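
A minimal sketch of such a phased benchmark, assuming the parcels v3 API (FieldSet.from_data, ParticleSet, JITParticle, AdvectionRK4); the zero-velocity field, particle positions, and step counts are placeholders rather than settled benchmark parameters:

```python
# Sketch of a phased asv benchmark (assumed parcels v3 API). asv excludes
# setup() from the reported timings, so only time_execute() is measured.
import numpy as np

import parcels


class ParticleExecutionBenchmark:
    # asv runs the timed method once per parameter value, covering the
    # "1000 particles for 1 / 100 time steps" items above.
    params = [1, 100]
    param_names = ["n_timesteps"]

    def setup(self, n_timesteps):
        # Untimed phase: build a trivial zero-velocity fieldset and 1000 particles.
        self.fieldset = parcels.FieldSet.from_data(
            {"U": np.zeros((2, 2)), "V": np.zeros((2, 2))},
            {"lon": [0.0, 1.0], "lat": [0.0, 1.0]},
        )
        self.pset = parcels.ParticleSet(
            fieldset=self.fieldset,
            pclass=parcels.JITParticle,
            lon=np.linspace(0.1, 0.9, 1000),
            lat=np.linspace(0.1, 0.9, 1000),
        )

    def time_execute(self, n_timesteps):
        # Timed phase: advect the particles for the requested number of steps.
        self.pset.execute(parcels.AdvectionRK4, runtime=60 * n_timesteps, dt=60)
```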


This PR introduces benchmarking infrastructure to the project via asv. Benchmarks can be run on a pull request by adding the run-benchmarks label to it. Two environments will be created in the CI runner with the prior and proposed changes, and both suites of benchmarks will be run and compared against each other.

Note that this PR only has example benchmarks for the time being, until we can discuss benchmarks of interest.

Running the benchmarks in CI is only one aspect of the benchmarking (i.e., it covers only core parcels functionality). Using asv, we can create different suites of benchmarks (e.g., one for CI and one for heavier simulations). The benefit of using asv is everything else that comes with it out of the box, including:

  • being able to run benchmarks easily across several commits and visualise them in a dashboard
  • easy creation and management of benchmarking environments
  • profiling support to pinpoint where slowdowns occur
  • community support

Changes:

  • asv configuration (conf file, benchmarks folder, and CI workflow)
  • asv documentation (available via the community page in the maintainer section)

I have done some testing of the PR label workflow in VeckoTheGecko#10. We can only test this for PRs in OceanParcels/parcels once it is in master.


Related to #1712

@VeckoTheGecko (Contributor, Author)

@erikvansebille On the topic of performance, are you also experiencing import parcels occasionally taking something like 10s to run?

@erikvansebille (Member)

@erikvansebille On the topic of performance, are you also experiencing import parcels occasionally taking something like 10s to run?

Yes, I also experience this slow import sometimes. Not sure why...

@VeckoTheGecko (Contributor, Author)

Setting to draft until we have some actual benchmarks that we can include in this.

@VeckoTheGecko (Contributor, Author)

From the meeting:
We can use tutorial_nemo_curvilinear.ipynb as well

@erikvansebille (Member)

The Argo tutorial at https://docs.oceanparcels.org/en/latest/examples/tutorial_Argofloats.html is also quite a nice simulation for benchmarking, as it has a 'complex' kernel. It took approximately 20s to run on v3.1 in JIT mode, and now takes 50s on my local computer in Scipy mode.

@danliba or @willirath, could you add the Argo tutorial to the benchmark stack?
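
For reference, the JIT/Scipy comparison could be expressed with asv's params mechanism. A minimal sketch, assuming the parcels v3 API and using a placeholder zero-velocity fieldset with a plain advection kernel rather than the actual Argo kernel and tutorial data:

```python
# Sketch of a mode-parameterised benchmark (assumed parcels v3 API); asv
# reports the JIT and Scipy timings side by side.
import numpy as np

import parcels


class ModeComparisonBenchmark:
    params = ["jit", "scipy"]
    param_names = ["mode"]

    def setup(self, mode):
        # Untimed: choose the particle class for this mode and build a
        # placeholder fieldset and particle set.
        pclass = parcels.JITParticle if mode == "jit" else parcels.ScipyParticle
        self.fieldset = parcels.FieldSet.from_data(
            {"U": np.zeros((2, 2)), "V": np.zeros((2, 2))},
            {"lon": [0.0, 1.0], "lat": [0.0, 1.0]},
        )
        self.pset = parcels.ParticleSet(
            fieldset=self.fieldset,
            pclass=pclass,
            lon=np.linspace(0.1, 0.9, 100),
            lat=np.linspace(0.1, 0.9, 100),
        )

    def time_execute(self, mode):
        # Timed: the same execution for both modes; only the particle class differs.
        self.pset.execute(parcels.AdvectionRK4, runtime=3600, dt=60)
```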

@willirath added the v3 label on Feb 12, 2025
@willirath (Collaborator) left a comment:

@danliba I've left a few comments.

@erikvansebille (Member)

Can we also use some of #1963?

@JamiePringle (Collaborator)

If you wish to use parts of #1963 and don't already have a Copernicus ocean account with the right access, email me and I can set up a quick way for you to download the circulation files.

@willirath (Collaborator) commented Jul 6, 2025

So I've spent some time with this over the weekend. Here are a few insights:

  • ASV implicitly assumes that benchmarks don't change. This is most obvious from the fact that asv run will always discover benchmarks from $PWD/benchmarks/, with whatever is present in that directory at the time ASV is invoked. Hence, adapting semantically identical benchmarks to a changing API has to happen in the .setup() method of the benchmark suites (see the sketch after this list).

  • ASV's shinier features (accumulation and publication of benchmark results over long time spans and across multiple versions) have never really been widely adopted. All the scipy and pydata projects I've checked (dask, numpy, asv-runner) have ASV benchmarks defined, but their auto-generated, published reports have been outdated for years. Also, none of the example benchmark results in the ASV docs have seen an update in at least 4 years (see their astropy example, their numpy example, the other numpy example and their scipy example).

  • ASV has a way of compiling relative benchmark results for different commits in a single run, using asv continuous. This is what's done in the GitHub workflow from an earlier commit and is what xarray uses for detecting performance degradations introduced by PRs. That workflow is, however, rarely used: it was requested in only 93 PRs out of more than 2000 PRs in the same time range.

  • The env management in ASV appears to be in a phase of restructuring; in particular, the (faster) Mamba-based envs only work when enforcing fairly old versions of Mamba. It looks as if the ASV devs have given up on Mamba in favour of moving towards rattler support, which, however, is also nowhere near fully implemented. As a result, the only env management that worked smoothly with Parcels' compiler dependencies and did not produce thousands of lines of warnings and errors was "conda".
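
To illustrate the first point: since the benchmark code is always loaded from the working tree, the .setup() methods have to guard against API differences between the benchmarked commits. A minimal sketch (the hasattr check is only an example, not an anticipated v4 layout):

```python
# Sketch of an API-guarding setup(); asv skips a benchmark (reported as
# "n/a") when its setup() raises NotImplementedError.
import numpy as np

import parcels


class ApiRobustBenchmark:
    def setup(self):
        if not hasattr(parcels, "FieldSet"):
            # The benchmarked commit exposes a different API; skip this
            # benchmark instead of failing the whole asv run.
            raise NotImplementedError("FieldSet API not available on this commit")

    def time_fieldset_creation(self):
        # Timed body: build a trivial zero-velocity fieldset via the v3 API.
        parcels.FieldSet.from_data(
            {"U": np.zeros((2, 2)), "V": np.zeros((2, 2))},
            {"lon": [0.0, 1.0], "lat": [0.0, 1.0]},
        )
```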

My recommendation is to forget about anything along the lines of "automation", "continuous", etc.

ASV provides a great way of defining benchmarks, running them against a bunch of similar well-behaved (i.e. no API changes or dependency changes) commits and then comparing the results with a rather narrow scope along the time / version axis. We should focus on this core functionality and leave the step of running and interpreting the benchmarks to the individual dev for now.

@VeckoTheGecko (Contributor, Author)

My recommendation is to forget about anything along the lines of "automation", "continuous", etc.

ASV provides a great way of defining benchmarks, running them against a bunch of similar well-behaved (i.e. no API changes or dependency changes) commits and then comparing the results with a rather narrow scope along the time / version axis. We should focus on this core functionality and leave the step of running and interpreting the benchmarks to the individual dev for now.

Yes, I think this sounds good. I was talking with an xarray maintainer, and he mentioned that ASV isn't really used much in CI: the intermittency of GitHub runners adds quite a bit of noise, which means that unless a regression degrades performance by a large factor, it won't be obvious in the output.

ASV provides a great way of defining benchmarks, running them against a bunch of similar well-behaved (i.e. no API changes or dependency changes) commits and then comparing the results with a rather narrow scope along the time / version axis.

I think so as well. Defining benchmarks for v3, porting them to v4, and then working with them in v4 will give devs a targeted lens on performance (even if it's just local, which is perfectly fine).

@VeckoTheGecko (Contributor, Author)

Another note: as part of this PR, can we remove the dependence on the parcels/tools/timer.py::Timer class (which is currently used in a couple of examples for simple benchmarking)?

Ideally, once this is merged we can remove the Timer class from the codebase completely.
