Benchmarks with variance #9
base: main
Conversation
Can I get 1 or 2 decimals of precision on the output? Cool idea.
Some questions:
- Are these variance results something that we publish or that end users care about? Or is this more like a dev tool you'll be using most often?
- How often do you expect the variances to change between runs?
- At what variance number do I take action? And if the variance is high, what do I do?
- Is this supposed to run on GitHub Actions Hosted Runners? Or is this destined for Kubernetes performance-benchmarking workers?
- When will this get run? Is this supposed to be triggered by Dolt release events? Will this be triggered by pull request comments?
I don't think this repository is the right spot for this script. This repository is just for sysbench lua scripts.
If this script is supposed to fit into our current performance-benchmarking pipeline, the best (easiest) way to do that is to enable our go benchmarking tool to run in a -variance mode that produces the same results as this script does. There will still be some infrastructure overhead with that approach too, though.
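For illustration only, a `-variance` flag on a Go CLI might look roughly like the sketch below; the flag name, `result` struct, and `runBenchmarks` stub are hypothetical stand-ins, not the benchmarking tool's actual API.

```go
package main

import (
	"flag"
	"fmt"
)

// result is a hypothetical per-test summary; field names are illustrative.
type result struct {
	Name       string
	MeanMs     float64
	VarianceMs float64
}

// runBenchmarks stands in for the tool's existing benchmark loop.
func runBenchmarks() []result { return nil }

func main() {
	variance := flag.Bool("variance", false, "report per-test latency variance alongside the mean")
	flag.Parse()

	for _, r := range runBenchmarks() {
		if *variance {
			fmt.Printf("%s,%.2f,%.2f\n", r.Name, r.MeanMs, r.VarianceMs)
		} else {
			fmt.Printf("%s,%.2f\n", r.Name, r.MeanMs)
		}
	}
}
```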
If the results produced by this script need to be sent via email or posted to a pull request, CSV is not the desired final format. The results should also be available in HTML and Markdown.
If this is just a dev tool you only need to run periodically, you can just check this script into the Dolt repo and run it when you need to on a dev desktop. It doesn't need to be integrated into the performance-benchmarking pipeline. And you can just email people who need the results after you get them.
Re variance questions: I think the question "are there other statistics that help us understand Dolt performance?" is useful in general, and only practical feedback will tell whether this, or quantiles, or confidence intervals, or anything else is actually useful. The variance provides two things: 1) a new piece of information, the event latency distribution, and 2) a way to short-circuit tests that run many events per second (i.e., they don't need to run for 2 minutes to give accurate means for many low-variance tests).

Re usage: I will start using this personally for developing perf changes, and only in that sandbox. The main motivation is that this will make me better at my job by removing the perf pipeline as a PR bottleneck. I'm not adding it to CI or releases/nightly. As long as only I am using this and I'm responsible for benchmarks, I'd like to keep the scripts, helpers, and docs next to each other.

A related hypothesis is that this script, run as a GitHub Actions runner, can do what the existing K8s sysbench runner does in the same amount of time as the sysbench utility tester job (~15 minutes). The two factors I'd judge that hypothesis by are: 1) Is this actually useful in helping me ship changes faster? 2) Do the results diverge from the results of the existing benchmark runner (in particular from hardware/network issues)? I can answer (1) by doing my regular work, which should answer (2) after a few releases. I'd consider the refactor a low-priority, opportunistic change that everyone who looks at perf benchmarks would want to weigh in on.

Re user-facing: we should only show additional stats to users if we think they are high-signal.
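As a concrete illustration of point 2 (not the code in this PR), a running mean and variance can be kept with Welford's algorithm and used to stop a test early once the mean has stabilized; the `tol` and `minEvents` parameters below are hypothetical knobs.

```go
package bench

import "math"

// welford accumulates a running mean and variance without storing samples.
type welford struct {
	n    int
	mean float64
	m2   float64 // sum of squared deviations from the running mean
}

// add folds one latency observation (e.g. event latency in ms) into the stats.
func (w *welford) add(x float64) {
	w.n++
	delta := x - w.mean
	w.mean += delta / float64(w.n)
	w.m2 += delta * (x - w.mean)
}

// variance returns the sample variance of everything added so far.
func (w *welford) variance() float64 {
	if w.n < 2 {
		return 0
	}
	return w.m2 / float64(w.n-1)
}

// stable reports whether the standard error of the mean has dropped below tol,
// i.e. further events are unlikely to move the reported mean much.
func (w *welford) stable(tol float64, minEvents int) bool {
	if w.n < minEvents {
		return false
	}
	return math.Sqrt(w.variance()/float64(w.n)) < tol
}
```

With a check like this, a low-variance test that runs thousands of events per second crosses the threshold in seconds, while a high-variance test keeps running until its time budget expires, which is the short-circuit behavior described above.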
Roger. Yea, just put this script in a different repo with your benchmarking tools, or check it into dolt.
This is a short benchmark runner that loops over 1) a list of Dolt versions (commitish) and 2) a list of scripts to run, and outputs per-test latency mean and variance. I think the variance metrics can be useful for giving context on which tests have variability and for giving faster PR feedback. I intentionally undershot the event count to expose which tests have variance; tests that quickly run enough events to get a realistic variance also terminate early. I also think a test runner not embedded in a specific version of Dolt is valuable. Next steps would be plugging this into a CI trigger, and maybe adding MySQL after.
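A rough sketch of the loop structure described above, assuming hypothetical `checkoutAndBuild` and `runSysbench` helpers; the script names and stubs are placeholders, not the script in this PR.

```go
package main

import "fmt"

// Stubs standing in for the real build and sysbench invocation steps.
func checkoutAndBuild(version string)                             {}
func runSysbench(version, script string) (mean, variance float64) { return 0, 0 }

func main() {
	versions := []string{"main", "0.40.21", "0.40.20"} // commitish list
	scripts := []string{"oltp_point_select", "oltp_read_only"}

	fmt.Println("version,script,mean_ms,variance_ms")
	for _, v := range versions {
		checkoutAndBuild(v) // build the dolt binary for this commitish
		for _, s := range scripts {
			mean, variance := runSysbench(v, s) // deliberately short event count
			fmt.Printf("%s,%s,%.2f,%.2f\n", v, s, mean, variance)
		}
	}
}
```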
Sample read scripts output for main, 0.40.21, and 0.40.20, each taking ~2 minutes to run: