
Conversation

@VeckoTheGecko (Contributor) commented Nov 13, 2024

Todo

  • remove example benchmarks
  • Add integration test benchmarks (See asv_bench/benchmarks/benchmarks_integration.py for an example.)
    • 2D advection
    • Argo float example
    • tutorial_nemo_curvilinear.ipynb (See asv_bench/benchmarks/benchmarks_particle_execution.py for an example.)
  • Add more detailed timing benchmarks
    • time the execution of 1000 particles for 1 time step
    • time the execution of 1000 particles for 100 time steps

Where applicable, split benchmarks into different phases. We could, e.g., use a .setup() method for the fieldset creation and a .time_execute() method for the particle execution (a minimal sketch is shown below).
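
A minimal sketch of such a phased benchmark, assuming the parcels v3 API (FieldSet.from_data, ParticleSet, JITParticle, AdvectionRK4); the zero-velocity field, particle positions, and step counts are placeholders rather than settled benchmark parameters:

```python
# Sketch of a phased asv benchmark (assumed parcels v3 API). asv excludes
# setup() from the reported timings, so only time_execute() is measured.
import numpy as np

import parcels


class ParticleExecutionBenchmark:
    # asv runs the timed method once per parameter value, covering the
    # "1000 particles for 1 / 100 time steps" items above.
    params = [1, 100]
    param_names = ["n_timesteps"]

    def setup(self, n_timesteps):
        # Untimed phase: build a trivial zero-velocity fieldset and 1000 particles.
        self.fieldset = parcels.FieldSet.from_data(
            {"U": np.zeros((2, 2)), "V": np.zeros((2, 2))},
            {"lon": [0.0, 1.0], "lat": [0.0, 1.0]},
        )
        self.pset = parcels.ParticleSet(
            fieldset=self.fieldset,
            pclass=parcels.JITParticle,
            lon=np.linspace(0.1, 0.9, 1000),
            lat=np.linspace(0.1, 0.9, 1000),
        )

    def time_execute(self, n_timesteps):
        # Timed phase: advect the particles for the requested number of steps.
        self.pset.execute(parcels.AdvectionRK4, runtime=60 * n_timesteps, dt=60)
```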


This PR introduces benchmarking infrastructure to the project via asv. Benchmarks can be run on a pull request by adding the run-benchmarks label to it. Two environments will be created in the CI runner with the prior and proposed changes, and both suites of benchmarks will be run and compared against each other.

Note that this PR only has example benchmarks for the time being, until we can discuss benchmarks of interest.

Running the benchmarks in CI is only one aspect of the benchmarking (i.e., it covers only core parcels functionality). Using asv, we can create different suites of benchmarks (e.g., one for CI and one for heavier simulations). The benefit of using asv is everything else that comes with it out of the box, including:

  • being able to run benchmarks easily across several commits and visualise them in a dashboard
  • easy creation and management of benchmarking environments
  • profiling support to pinpoint where slowdowns occur
  • community support

Changes:

  • asv configuration (conf file, benchmarks folder, and CI workflow)
  • asv documentation (available via the community page in the maintainer section)

I have done some testing of the PR label workflow in VeckoTheGecko#10. We can only test this for PRs in OceanParcels/parcels once it is in master.


Related to #1712

@VeckoTheGecko (Contributor, Author)

@erikvansebille On the topic of performance, are you also experiencing import parcels occasionally taking something like 10s to run?

@erikvansebille (Member)

@erikvansebille On the topic of performance, are you also experiencing import parcels occasionally taking something like 10s to run?

Yes, I also experience this slow import sometimes. Not sure why...

@VeckoTheGecko (Contributor, Author)

Setting to draft until we have some actual benchmarks that we can include in this.

@VeckoTheGecko (Contributor, Author)

From the meeting:
We can use tutorial_nemo_curvilinear.ipynb as well

@erikvansebille (Member)

The Argo tutorial at https://docs.oceanparcels.org/en/latest/examples/tutorial_Argofloats.html is also quite a nice simulation for benchmarking, as it has a 'complex' kernel. It took approximately 20s to run on v3.1 in JIT mode, and now takes 50s on my local computer in Scipy mode.

@danliba or @willirath, could you add the Argo tutorial to the benchmark stack?
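
For reference, the JIT/Scipy comparison could be expressed with asv's params mechanism. A minimal sketch, assuming the parcels v3 API and using a placeholder zero-velocity fieldset with a plain advection kernel rather than the actual Argo kernel and tutorial data:

```python
# Sketch of a mode-parameterised benchmark (assumed parcels v3 API); asv
# reports the JIT and Scipy timings side by side.
import numpy as np

import parcels


class ModeComparisonBenchmark:
    params = ["jit", "scipy"]
    param_names = ["mode"]

    def setup(self, mode):
        # Untimed: choose the particle class for this mode and build a
        # placeholder fieldset and particle set.
        pclass = parcels.JITParticle if mode == "jit" else parcels.ScipyParticle
        self.fieldset = parcels.FieldSet.from_data(
            {"U": np.zeros((2, 2)), "V": np.zeros((2, 2))},
            {"lon": [0.0, 1.0], "lat": [0.0, 1.0]},
        )
        self.pset = parcels.ParticleSet(
            fieldset=self.fieldset,
            pclass=pclass,
            lon=np.linspace(0.1, 0.9, 100),
            lat=np.linspace(0.1, 0.9, 100),
        )

    def time_execute(self, mode):
        # Timed: the same execution for both modes; only the particle class differs.
        self.pset.execute(parcels.AdvectionRK4, runtime=3600, dt=60)
```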

@willirath added the v3 label on Feb 12, 2025
@willirath (Collaborator) left a comment:

@danliba I've left a few comments.

@erikvansebille (Member)

Can we also use some of #1963?

@JamiePringle (Collaborator)

If you wish to use parts of #1963 and don't already have a Copernicus ocean account with the right access, email me and I can set up a quick way for you to download the circulation files.

@willirath (Collaborator) commented Jul 6, 2025

So I've spent some time with this over the weekend. Here are a few insights:

  • ASV implicitly assumes that benchmarks don't change. This is most obvious from the fact that asv run will always discover benchmarks from $PWD/benchmarks/, with whatever is present in that directory at the time ASV is invoked. Hence, adapting semantically identical benchmarks to a changing API has to happen in the .setup() method of the benchmark suites (see the sketch after this list).

  • ASV's shinier features (accumulation and publication of benchmark results over long time spans and across multiple versions) have never really been widely adopted. All the scipy and pydata projects I've checked (dask, numpy, asv-runner) have ASV benchmarks defined, but their auto-generated, published reports have been outdated for years. Also, none of the example benchmark results in the ASV docs have seen an update in at least 4 years (see their astropy example, their numpy example, the other numpy example and their scipy example).

  • ASV has a way of compiling relative benchmark results for different commits in a single run, using asv continuous. This is what's done in the GitHub workflow from an earlier commit and is what xarray uses for detecting performance degradations introduced by PRs. That workflow is, however, rarely used: it was requested in only 93 PRs out of more than 2000 PRs in the same time range.

  • The env management in ASV appears to be in a phase of restructuring; in particular, the (faster) Mamba-based envs only work when enforcing fairly old versions of Mamba. It looks as if the ASV devs have given up on Mamba in favour of moving towards rattler support, which, however, is also nowhere near fully implemented. As a result, the only env management that worked smoothly with Parcels' compiler dependencies and did not produce thousands of lines of warnings and errors was "conda".
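
To illustrate the first point: since the benchmark code is always loaded from the working tree, the .setup() methods have to guard against API differences between the benchmarked commits. A minimal sketch (the hasattr check is only an example, not an anticipated v4 layout):

```python
# Sketch of an API-guarding setup(); asv skips a benchmark (reported as
# "n/a") when its setup() raises NotImplementedError.
import numpy as np

import parcels


class ApiRobustBenchmark:
    def setup(self):
        if not hasattr(parcels, "FieldSet"):
            # The benchmarked commit exposes a different API; skip this
            # benchmark instead of failing the whole asv run.
            raise NotImplementedError("FieldSet API not available on this commit")

    def time_fieldset_creation(self):
        # Timed body: build a trivial zero-velocity fieldset via the v3 API.
        parcels.FieldSet.from_data(
            {"U": np.zeros((2, 2)), "V": np.zeros((2, 2))},
            {"lon": [0.0, 1.0], "lat": [0.0, 1.0]},
        )
```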

My recommendation is to forget about anything along the lines of "automation", "continuous", etc.

ASV provides a great way of defining benchmarks, running them against a bunch of similar well-behaved (i.e. no API changes or dependency changes) commits and then comparing the results with a rather narrow scope along the time / version axis. We should focus on this core functionality and leave the step of running and interpreting the benchmarks to the individual dev for now.

@VeckoTheGecko (Contributor, Author)

My recommendation is to forget about anything along the lines of "automation", "continuous", etc.

ASV provides a great way of defining benchmarks, running them against a bunch of similar well-behaved (i.e. no API changes or dependency changes) commits and then comparing the results with a rather narrow scope along the time / version axis. We should focus on this core functionality and leave the step of running and interpreting the benchmarks to the individual dev for now.

Yes, I think this sounds good. I was talking with an xarray maintainer, and he mentioned that ASV isn't really used much in CI: the intermittency of GitHub runners adds quite a bit of noise, which means that unless a regression degrades performance by a large factor, it won't be obvious in the output.

ASV provides a great way of defining benchmarks, running them against a bunch of similar well-behaved (i.e. no API changes or dependency changes) commits and then comparing the results with a rather narrow scope along the time / version axis.

I think so as well. Defining benchmarks for v3, porting them to v4, and then working with them in v4 will give devs a targeted lens on performance (even if it's just local, which is perfectly fine).

@VeckoTheGecko (Contributor, Author)

Another note: as part of this PR, can we remove the dependence on the parcels/tools/timer.py::Timer class (which is currently used in a couple of examples for simple benchmarking)?

Ideally, once this is merged we can remove the Timer class from the codebase completely.
