
Tiny benchmarking harness #2699

Open · wants to merge 1 commit into main

Conversation

@martinothamar (Contributor) commented Nov 7, 2024

Description

A benchmarking harness using tinybench, as a suggestion for starting to track perf numbers. Benchmarking gives us a paper trail: when we discover slow code that affects real use cases, we can provide benchmarks to show improvements in isolation. On future changes we can verify changes in perf characteristics (for a variety of inputs), such as

  • Throughput
  • Latency
  • Std deviation (I'm not sure if it's possible to see the whole distribution, e.g. to rule out bimodal distributions)

Memory allocation tracking is missing; I couldn't find a way to correctly correlate heap changes with benchmark runs, as GCs might occur in the middle of a run.

The included benchmark is a microbenchmark comparing some implementations of splitDashedKey that popped up in profiling (this is no longer an issue, so it serves mostly as an example).
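
For illustration, a minimal sketch of what such a tinybench benchmark file might look like. The splitDashedKey variants below are hypothetical stand-ins, not the actual implementations from this PR:

import { Bench } from 'tinybench';

// Hypothetical stand-ins for the splitDashedKey variants compared in the real benchmark.
function splitDashedKeyRegex(key: string): { baseKey: string; index?: number } {
  const match = /^(.*)-(\d+)$/.exec(key);
  return match ? { baseKey: match[1], index: Number(match[2]) } : { baseKey: key };
}

function splitDashedKeyLastIndexOf(key: string): { baseKey: string; index?: number } {
  const i = key.lastIndexOf('-');
  if (i === -1) {
    return { baseKey: key };
  }
  const index = Number(key.slice(i + 1));
  return Number.isNaN(index) ? { baseKey: key } : { baseKey: key.slice(0, i), index };
}

const bench = new Bench({ time: 500 });
bench.add('splitDashedKey (regex)', () => splitDashedKeyRegex('myRepeatingGroup-2'));
bench.add('splitDashedKey (lastIndexOf)', () => splitDashedKeyLastIndexOf('myRepeatingGroup-2'));

await bench.run();
// Prints throughput (ops/sec), latency and error margin per task.
console.table(bench.table());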

[screenshot: benchmark results]

Related Issue(s)

  • N/A

Verification/QA

  • Manual functionality testing
    • I have tested these changes manually
    • Creator of the original issue (or service owner) has been contacted for manual testing (or will be contacted when released in alpha)
    • No testing done/necessary
  • Automated tests
    • Unit test(s) have been added/updated
    • Cypress E2E test(s) have been added/updated
    • No automatic tests are needed here (no functional changes/additions)
    • I want someone to help me make some tests
  • UU/WCAG (follow these guidelines until we have our own)
    • I have tested with a screen reader/keyboard navigation/automated wcag validator
    • No testing done/necessary (no DOM/visual changes)
    • I want someone to help me perform accessibility testing
  • User documentation @ altinn-studio-docs
    • Has been added/updated
    • No functionality has been changed/added, so no documentation is needed
    • I will do that later/have created an issue
  • Support in Altinn Studio
    • Issue(s) created for support in Studio
    • This change/feature does not require any changes to Altinn Studio
  • Sprint board
    • The original issue (or this PR itself) has been added to the Team Apps project and to the current sprint board
    • I don't have permissions to do that, please help me out
  • Labels
    • I have added a kind/* label to this PR for proper release notes grouping
    • I don't have permissions to add labels, please help me out

@martinothamar martinothamar self-assigned this Nov 7, 2024
@martinothamar martinothamar added kind/other Pull requests containing chores/repo structure/other changes ignore-for-release Pull requests to be ignored in release notes labels Nov 7, 2024

sonarcloud bot commented Nov 7, 2024

Quality Gate failed

Failed conditions
1 Security Hotspot
7.9% Coverage on New Code (required ≥ 45%)
0.0% Condition Coverage on New Code (required ≥ 45%)

See analysis details on SonarCloud

@olemartinorg (Contributor) left a comment

Looks good! 🙌 Have you perhaps found a way to run and present these benchmarks directly on GitHub? I'd love to have an automatic runner that tells us immediately if a PR is detrimental to performance (compared to a recent baseline from main).

"ts-jest": "29.2.5",
"ts-loader": "9.5.1",
"ts-node": "^10.9.1",
"tsconfig-paths": "^4.2.0",
"tsx": "^4.19.2",
Contributor

Nice! 🤩 We already have ts-node as a TypeScript runner, but I looked into it and tsx looks a lot better/more promising. I'll look into replacing our few usages of ts-node with tsx after this.

Contributor Author

Ooh, I didn't even notice; I used tsx because it was part of the tinybench example code.

@@ -112,10 +113,12 @@
"source-map-loader": "5.0.0",
"style-loader": "4.0.0",
"terser-webpack-plugin": "5.3.10",
"tinybench": "^3.0.3",
Contributor

I looked into it, and while tinybench seems to work well enough for the example use case here, there doesn't seem to be any straightforward way to benchmark React components (including re-rendering) with it. There are libraries for that, some of which seem to spin up headless Chrome to run their benchmarks in.

Contributor Author

In principle I can't think of any reason why a React test renderer wouldn't run under Node.js in this context, when it's no problem in a Jest test runner (see the sketch after the list below). If it's much harder in practice for some reason, I'm not opposed to other tools at all, and I think we can even have multiple tools for different use cases (though I have often just stuck to the same benchmarking harness for all levels of benchmarks). But I think we should start to define a process around this to figure out:

  • What should we require benchmarks for
  • How should we track the numbers over time
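
For illustration, a minimal sketch of how a React render could be benchmarked under Node.js with tinybench via react-dom/server. DummyList is a made-up component, not one from this repository:

import { Bench } from 'tinybench';
import React from 'react';
import { renderToString } from 'react-dom/server';

// Made-up component, standing in for a real (and more complex) app component.
function DummyList({ rows }: { rows: number }) {
  return React.createElement(
    'ul',
    null,
    Array.from({ length: rows }, (_, i) => React.createElement('li', { key: i }, `Row ${i}`)),
  );
}

const bench = new Bench({ time: 500 });
bench.add('renderToString DummyList (100 rows)', () => {
  renderToString(React.createElement(DummyList, { rows: 100 }));
});

await bench.run();
console.table(bench.table());

This only covers string rendering on the server side; measuring client-side re-renders would likely need jsdom or a headless browser, which is closer to the point above about other libraries.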

@olemartinorg mentioned this pull request Nov 7, 2024
@adamhaeger (Contributor) commented Nov 11, 2024

Hmm, I don't really see how tinybench is useful for real-world performance testing of a web app. It seems more focused on low-level testing of functions.

I'd say some of the metrics we should focus on are:

  1. Component Render Times
    • Initial Render Time
    • Re-render Times
    • Wasted Renders
  2. React Reconciliation (Diffing Algorithm) Efficiency
  3. State and Prop Update Times
  4. Time to First Paint (TTFP) and First Contentful Paint (FCP)
  5. Memory Usage
  6. Time to Interactive (TTI)
  7. Largest Contentful Paint (LCP)
  8. Layout Shift (Cumulative Layout Shift - CLS)
  9. Network Request Time for Data Fetching
  10. JavaScript Bundle Size and Code Splitting

Tools to Measure These Metrics

  • React DevTools Profiler
  • Chrome DevTools Performance Tab
  • Web Vitals
  • Lighthouse

We could, for example, set up Lighthouse CI, which provides many of these metrics: https://github.com/GoogleChrome/lighthouse-ci

It could fail the build if it's too slow.
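
As a complement, a minimal sketch of how some of these metrics (FCP, LCP, CLS, TTFB) could be collected at runtime with the web-vitals package mentioned above; the /api/metrics endpoint is made up:

import { onCLS, onFCP, onLCP, onTTFB } from 'web-vitals';

// Report each metric to a (made-up) collection endpoint as it becomes available.
function report(metric: { name: string; value: number }) {
  navigator.sendBeacon('/api/metrics', JSON.stringify({ name: metric.name, value: metric.value }));
}

onCLS(report);
onFCP(report);
onLCP(report);
onTTFB(report);

Lighthouse CI would then be the natural place to gate the build on thresholds for the same metrics.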

@martinothamar (Contributor Author)

"Seems more focused on low level testing of functions."

Agreed, and I think we need both. We need high-level testing that uses UX-focused metrics, but during the optimization process we also need benchmarking that verifies we are actually optimizing.

The tools you mention are good for the explorative/analysis phase, where you run one-offs and high-level before/after comparisons. Benchmarks are good for iterating on optimizations once you've chosen your place of optimization. Proper measurement of an optimization requires a lot of runs/executions so that the distribution of latency can be inspected. Higher-level tests that measure UX-quality metrics like the ones you mention can't be run a ton of times while also keeping a fast feedback loop.
