
Benchmarks with variance #9

Open · wants to merge 4 commits into main
Conversation

@max-hoffman (Contributor) commented Aug 5, 2022

This is a short benchmark runner that loops over 1) a list of Dolt versions (committish) and 2) a list of scripts to run, and outputs the mean and variance of each test's latency. I think the variance metrics can be useful for showing which tests are noisy and for giving faster PR feedback. I intentionally undershot the event count to expose which tests have high variance; tests that quickly run enough events to get a realistic variance terminate early. I also think a test runner that is not embedded in a specific version of Dolt is valuable. Next steps would be plugging this into a CI trigger, and maybe adding MySQL after.
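A rough sketch of the runner's loop structure, written in Go for illustration (the runScript helper and its placeholder latencies are hypothetical stand-ins for building Dolt at each committish and driving sysbench):

```go
package main

import "fmt"

// stats holds the latency summary for one script at one Dolt version.
type stats struct{ mean, variance float64 }

// runScript is a hypothetical stand-in for building dolt at the given
// committish, running one sysbench script against it, and returning the
// observed per-event latencies.
func runScript(version, script string) []float64 {
	return []float64{15.2, 14.9, 15.4, 15.1} // placeholder latencies
}

// summarize computes the sample mean and unbiased sample variance.
func summarize(xs []float64) stats {
	var s stats
	for _, x := range xs {
		s.mean += x
	}
	s.mean /= float64(len(xs))
	for _, x := range xs {
		d := x - s.mean
		s.variance += d * d
	}
	s.variance /= float64(len(xs) - 1)
	return s
}

func main() {
	versions := []string{"main", "v0.40.21", "v0.40.20"}
	scripts := []string{"covering_index_scan", "oltp_point_select"}
	for _, v := range versions {
		fmt.Printf("summary for %s\n", v)
		for _, sc := range scripts {
			st := summarize(runScript(v, sc))
			fmt.Printf("| %-20s | %.2f | %.2f |\n", sc, st.mean, st.variance)
		}
	}
}
```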

Sample read-script output for main, v0.40.21, and v0.40.20; each run takes ~2 minutes:

summary for main at /var/folders/h_/n5qdj2tj3n741n128t7v2d7h0000gn/T/tmp.5USkhudY/summary/main.csv
| script               | mean          | var                 |
|----------------------|---------------|---------------------|
| covering_index_scan  | 15.10451      | 1.1138508014719468  |
| groupby_scan         | 16.24212      | 1.5890066286613307  |
| index_join_scan      | 17.61924      | 0.6317204651827049  |
| index_scan           | 156.601046875 | 7.391887008220157   |
| oltp_point_select    | 0.38645       | 0.09210223454422917 |
| oltp_read_only       | 8.27044       | 0.6200027446851226  |
| select_random_points | 1.048035      | 0.4325851631666992  |
| select_random_ranges | 1.402725      | 0.2969347412780954  |
| table_scan           | 150.4124      | 7.624369498604871   |


summary for v0.40.21 at /var/folders/h_/n5qdj2tj3n741n128t7v2d7h0000gn/T/tmp.O0fqhYcT/summary/v0.40.21.csv
| script               | mean                | var                 |
|----------------------|---------------------|---------------------|
| covering_index_scan  | 14.246475           | 0.5857827811642662  |
| groupby_scan         | 15.6504             | 0.9181732957372618  |
| index_join_scan      | 18.87815            | 2.4337875991732405  |
| index_scan           | 162.60002162162164  | 8.893199692533006   |
| oltp_point_select    | 0.41635500000000003 | 0.0904226201021108  |
| oltp_read_only       | 8.550575            | 0.9403220149683675  |
| select_random_points | 0.78283             | 0.19947786391215133 |
| select_random_ranges | 1.321045            | 0.26617484519781426 |
| table_scan           | 157.79160209424083  | 11.554515638977     |


summary for v0.40.20 at /var/folders/h_/n5qdj2tj3n741n128t7v2d7h0000gn/T/tmp.O0fqhYcT/summary/v0.40.20.csv
| script               | mean               | var                 |
|----------------------|--------------------|---------------------|
| covering_index_scan  | 15.91268           | 1.5481837527442621  |
| groupby_scan         | 18.21874           | 2.5152333183281708  |
| index_join_scan      | 19.023905          | 2.7848944885825904  |
| index_scan           | 164.4929617486339  | 38.78926194056976   |
| oltp_point_select    | 0.38974            | 0.10950419280194722 |
| oltp_read_only       | 8.752              | 1.3798065919380798  |
| select_random_points | 0.839795           | 0.23001831470668266 |
| select_random_ranges | 1.57079            | 0.4907979820618458  |
| table_scan           | 156.92905729166665 | 10.185672335364126  |

@max-hoffman changed the title from "Benchmark with variance" to "Benchmarks with variance" Aug 5, 2022
@timsehn commented Aug 5, 2022

Can I get 1 or 2 decimals of precision on the output? Cool idea.
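For reference, a minimal Go snippet showing the kind of formatting change that would satisfy this, assuming the runner renders the summary rows itself (the column widths and sample values are arbitrary):

```go
package main

import "fmt"

func main() {
	// Render a summary row with two decimal places instead of full float64 precision.
	script, mean, variance := "covering_index_scan", 15.10451, 1.1138508014719468
	fmt.Printf("| %-20s | %10.2f | %10.2f |\n", script, mean, variance)
}
```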

@max-hoffman marked this pull request as ready for review August 8, 2022 16:38
@coffeegoddd (Contributor) left a comment


Some questions:

  1. Are these variance results something we publish or that end users care about? Or is this more of a dev tool you'll be using yourself?
  2. How often do you expect the variances to change between runs?
  3. At what variance number do I take action? And if the variance is high, what do I do?
  4. Is this supposed to run on GitHub Actions hosted runners, or is it destined for the Kubernetes performance-benchmarking workers?
  5. When will this get run? Is it triggered by Dolt release events, or by pull request comments?

I don't think this repository is the right spot for this script; this repository is just for sysbench Lua scripts.

If this script is supposed to fit into our current performance-benchmarking pipeline... the best (easiest) way to do that is to add a -variance mode to our Go benchmarking tool that produces the same results this script does. There will still be some infrastructure overhead with that approach, though.
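A minimal sketch of what such a -variance mode might look like, assuming the tool uses Go's standard flag package; the flag name, sample data, and helper are illustrative, not the tool's actual API:

```go
package main

import (
	"flag"
	"fmt"
)

// meanAndVariance returns the sample mean and unbiased sample variance.
func meanAndVariance(xs []float64) (mean, variance float64) {
	for _, x := range xs {
		mean += x
	}
	mean /= float64(len(xs))
	for _, x := range xs {
		d := x - mean
		variance += d * d
	}
	variance /= float64(len(xs) - 1)
	return mean, variance
}

func main() {
	// Hypothetical flag gating the per-script variance report.
	withVar := flag.Bool("variance", false, "report latency variance alongside the mean")
	flag.Parse()

	// Stand-in for latencies the tool would collect from sysbench runs.
	samples := map[string][]float64{
		"oltp_point_select": {0.38, 0.41, 0.36, 0.44, 0.39},
	}
	for script, xs := range samples {
		mean, v := meanAndVariance(xs)
		if *withVar {
			fmt.Printf("%s: mean=%.2f var=%.2f\n", script, mean, v)
		} else {
			fmt.Printf("%s: mean=%.2f\n", script, mean)
		}
	}
}
```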

If the results produced by this script need to be sent via email or posted to a pull request, CSV is not the desired final format; the results should also be available in HTML and Markdown.

If this is just a dev tool you only need to run periodically, you can check this script into the Dolt repo and run it on a dev desktop when you need to. It doesn't need to be integrated into the performance-benchmarking pipeline, and you can email the results to whoever needs them.

@max-hoffman (Contributor, Author)

Re variance questions: I think the question "are there other statistics that help us understand Dolt performance?" is useful generally. Only practical feedback will tell whether variance, quantiles, confidence intervals, or anything else is actually useful. The variance gives us two things: 1) a new piece of information, the spread of the event latency distribution, and 2) a way to short-circuit tests that run many events per second (low-variance tests don't need the full 2 minutes to produce an accurate mean).
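A sketch of that short-circuit using Welford's online algorithm with a relative-standard-error cutoff; the thresholds and the simulated latencies are illustrative, not values from this PR:

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
)

// welford accumulates a running mean and variance (Welford's online algorithm).
type welford struct {
	n    int
	mean float64
	m2   float64
}

func (w *welford) add(x float64) {
	w.n++
	d := x - w.mean
	w.mean += d / float64(w.n)
	w.m2 += d * (x - w.mean)
}

func (w *welford) variance() float64 { return w.m2 / float64(w.n-1) }

func main() {
	const (
		minEvents = 100       // illustrative floor before trusting the estimate
		maxEvents = 1_000_000 // hard cap standing in for the 2-minute budget
		relSE     = 0.01      // stop once the SE of the mean is within 1% of it
	)
	var w welford
	for i := 0; i < maxEvents; i++ {
		// Stand-in for one sysbench event latency, in ms.
		w.add(15.0 + rand.NormFloat64())
		if w.n >= minEvents {
			se := math.Sqrt(w.variance() / float64(w.n))
			if se/w.mean < relSE {
				break // low-variance test: terminate early
			}
		}
	}
	fmt.Printf("events=%d mean=%.2f var=%.2f\n", w.n, w.mean, w.variance())
}
```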

Re usage: I will start using this personally for developing perf changes, and only in that sandbox. The main motivation is that it will make me better at my job by removing the perf pipeline as a PR bottleneck. I'm not adding it to CI or to releases/nightly. As long as I'm the only one using this and I'm responsible for benchmarks, I'd like to keep the scripts, helpers, and docs next to each other.

A related hypothesis is that this script, run as a GitHub Actions runner, could do what the existing K8s sysbench runner does in the same amount of time as the sysbench utility tester job (~15 minutes). Two factors would decide that: 1) is this actually useful in helping me ship changes faster? and 2) do the results diverge from those of the existing benchmark runner (in particular due to hardware/network differences)? I can answer (1) by doing my regular work, which should answer (2) after a few releases. I'd consider the refactor a low-priority, opportunistic change that everyone who looks at perf benchmarks would want to weigh in on.

Re user-facing: We should only show additional stats to users if we think they are high-signal.

@coffeegoddd (Contributor)

Roger. Yea, just put this script in a different repo with your benchmarking tools, or check it into dolt.
