Choosing a performance profiling framework #769

berombau · 2024-11-07T10:35:32Z

Is your feature request related to a problem? Please describe.
Currently, performance benchmarks are run locally in an IDE or with a script. It would be nice to have a certain documented way to do this (#763) to make sharing and reproducing these benchmarks easier. This will also help towards more advanced things like regressing testing.

Describe the solution you'd like
airspeed velocity (asv) seems to be the best pick and has documentation and existing examples. Caching and loading small and large datasets to make a meaningful analysis is probably the next step.

choosing a framework
caching synthetic blobs datasets of various sizes for quick testing
adding some small quick tests like loading the SpatialData library and reading and writing a dataset

Describe alternatives you've considered
This discussion have been done for many other projects. Here are the most popular frameworks:

asv
- similar to the scikit-image, dask and napari-spatialdata setup
- mature and supports time, memory and other metrics
- report is static website
- seems to have active maintenance
pytest-benchmark
- easy and nice step up from pytest
- support for codspeed
- does not support memory or IO
pytest-monitor
conbench
- advanced and multi-language support (including R)
- report is not static website
coild/benchmarks
- nice discussion on benchmark frameworks (Choose benchmarking tool coiled/benchmarks#186)
- advanced and works with Dask and HPC
- uses https://github.com/gjoseph92/dask-pyspy
- custom fixture approach seems like a learning curve
- meant for internal coiled use

Not immediately a priority, but it would be nice to know the direction to work toward. Thoughts on asv, @LucaMarconato @Czaki? I tried it out and it seems suitable.

Czaki · 2024-11-07T10:51:31Z

The main problem of benchmarks is reproducibility.
The benchmark output is really fragile to the machine that is used to run it.
This is why in napari we use asv, and based on it, I make a PR to napari-spatialdata. Then asv runs on two commits and compares it, making it independent of changes of machine.

So if benchmarks are traced using external service (like pytest-benchmark + codespeed), there may be multiple false positive/negative reports because of changes in machine used for tests.

The obvious workaround for this is to have the own separate machine only for benchmark running that will have disabled automated updates. And each time there is an update of something on the machine (ex. NumPy version, Python version) there is also recalculation of performance results for historical measure points.

berombau added this to Basel Hackathon Nov 2024 Nov 7, 2024

berombau linked a pull request Nov 13, 2024 that will close this issue

[WIP] add asv benchmark code #784

Open

berombau moved this to 👀 In review in Basel Hackathon Nov 2024 Nov 14, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Choosing a performance profiling framework #769

Choosing a performance profiling framework #769

berombau commented Nov 7, 2024

Czaki commented Nov 7, 2024

Choosing a performance profiling framework #769

Choosing a performance profiling framework #769

Comments

berombau commented Nov 7, 2024

Czaki commented Nov 7, 2024