
Tiny benchmarking harness #2699

Open · wants to merge 1 commit into main

Conversation

@martinothamar (Contributor) commented Nov 7, 2024

Description

A benchmarking harness using tinybench, as a suggestion for starting to track perf numbers. Benchmarking gives us a paper trail: when we discover slow code that affects real use cases, we can provide benchmarks to show improvements in isolation. On future changes we can verify changes in perf characteristics (for a variety of inputs), such as

  • Throughput
  • Latency
  • Std deviation (I'm not sure if it's possible to see the whole distribution, e.g. to rule out bimodal distributions)

Memory allocation tracking is missing; I couldn't find a way to correctly correlate heap changes with benchmark runs, as GCs might occur in the middle of a run.

The included benchmark is a microbenchmark comparing some implementations of splitDashedKey that popped up in profiling (this is no longer an issue, so it serves mostly as an example).
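
For illustration, a minimal sketch of what such a tinybench benchmark file might look like. The splitDashedKey variants below are hypothetical stand-ins, not the actual implementations from this PR:

import { Bench } from 'tinybench';

// Hypothetical stand-ins for the splitDashedKey variants compared in the real benchmark.
function splitDashedKeyRegex(key: string): { baseKey: string; index?: number } {
  const match = /^(.*)-(\d+)$/.exec(key);
  return match ? { baseKey: match[1], index: Number(match[2]) } : { baseKey: key };
}

function splitDashedKeyLastIndexOf(key: string): { baseKey: string; index?: number } {
  const i = key.lastIndexOf('-');
  if (i === -1) {
    return { baseKey: key };
  }
  const index = Number(key.slice(i + 1));
  return Number.isNaN(index) ? { baseKey: key } : { baseKey: key.slice(0, i), index };
}

const bench = new Bench({ time: 500 });
bench.add('splitDashedKey (regex)', () => splitDashedKeyRegex('myRepeatingGroup-2'));
bench.add('splitDashedKey (lastIndexOf)', () => splitDashedKeyLastIndexOf('myRepeatingGroup-2'));

await bench.run();
// Prints throughput (ops/sec), latency and error margin per task.
console.table(bench.table());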

[screenshot: benchmark results]

Related Issue(s)

  • N/A

Verification/QA

  • Manual functionality testing
    • I have tested these changes manually
    • Creator of the original issue (or service owner) has been contacted for manual testing (or will be contacted when released in alpha)
    • No testing done/necessary
  • Automated tests
    • Unit test(s) have been added/updated
    • Cypress E2E test(s) have been added/updated
    • No automatic tests are needed here (no functional changes/additions)
    • I want someone to help me make some tests
  • UU/WCAG (follow these guidelines until we have our own)
    • I have tested with a screen reader/keyboard navigation/automated wcag validator
    • No testing done/necessary (no DOM/visual changes)
    • I want someone to help me perform accessibility testing
  • User documentation @ altinn-studio-docs
    • Has been added/updated
    • No functionality has been changed/added, so no documentation is needed
    • I will do that later/have created an issue
  • Support in Altinn Studio
    • Issue(s) created for support in Studio
    • This change/feature does not require any changes to Altinn Studio
  • Sprint board
    • The original issue (or this PR itself) has been added to the Team Apps project and to the current sprint board
    • I don't have permissions to do that, please help me out
  • Labels
    • I have added a kind/* label to this PR for proper release notes grouping
    • I don't have permissions to add labels, please help me out

@martinothamar martinothamar self-assigned this Nov 7, 2024
@martinothamar martinothamar added kind/other Pull requests containing chores/repo structure/other changes ignore-for-release Pull requests to be ignored in release notes labels Nov 7, 2024

sonarcloud bot commented Nov 7, 2024

Quality Gate failed

Failed conditions
1 Security Hotspot
7.9% Coverage on New Code (required ≥ 45%)
0.0% Condition Coverage on New Code (required ≥ 45%)

See analysis details on SonarCloud

@olemartinorg (Contributor) left a comment

Looks good! 🙌 Have you perhaps found a way to run and present these benchmarks directly on GitHub? I'd love to have an automatic runner that tells us immediately if a PR is detrimental to performance (compared to a recent baseline from main).

"ts-jest": "29.2.5",
"ts-loader": "9.5.1",
"ts-node": "^10.9.1",
"tsconfig-paths": "^4.2.0",
"tsx": "^4.19.2",
Contributor

Nice! 🤩 We already have ts-node as a TypeScript runner, but I looked into it and tsx looks a lot better/more promising. I'll look into replacing our few usages of ts-node with tsx after this.

Contributor Author

Ooh, I didn't even notice; I used tsx because it was part of the tinybench example code.

@@ -112,10 +113,12 @@
"source-map-loader": "5.0.0",
"style-loader": "4.0.0",
"terser-webpack-plugin": "5.3.10",
"tinybench": "^3.0.3",
Contributor

I looked into it, and while tinybench seems to work well enough for the example use case here, there doesn't seem to be any straightforward way to benchmark React components (including re-rendering) with it. There are libraries for that, some of which seem to spin up headless Chrome to run their benchmarks in.

Contributor Author

In principle I can't think of any reason why a React test renderer wouldn't run under Node.js in this context, when it's no problem in a Jest test runner (see the sketch after the list below). If it's much harder in practice for some reason, I'm not opposed to other tools at all, and I think we can even have multiple tools for different use cases (though I have often just stuck to the same benchmarking harness for all levels of benchmarks). But I think we should start to define a process around this to figure out:

  • What should we require benchmarks for
  • How should we track the numbers over time
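
For illustration, a minimal sketch of how a React render could be benchmarked under Node.js with tinybench via react-dom/server. DummyList is a made-up component, not one from this repository:

import { Bench } from 'tinybench';
import React from 'react';
import { renderToString } from 'react-dom/server';

// Made-up component, standing in for a real (and more complex) app component.
function DummyList({ rows }: { rows: number }) {
  return React.createElement(
    'ul',
    null,
    Array.from({ length: rows }, (_, i) => React.createElement('li', { key: i }, `Row ${i}`)),
  );
}

const bench = new Bench({ time: 500 });
bench.add('renderToString DummyList (100 rows)', () => {
  renderToString(React.createElement(DummyList, { rows: 100 }));
});

await bench.run();
console.table(bench.table());

This only covers string rendering on the server side; measuring client-side re-renders would likely need jsdom or a headless browser, which is closer to the point above about other libraries.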

@olemartinorg mentioned this pull request Nov 7, 2024
@adamhaeger (Contributor) commented Nov 11, 2024

Hmm, I don't really see how tinybench is useful for real-world performance testing of a web app. It seems more focused on low-level testing of functions.

I'd say some of the metrics we should focus on are:

  1. Component Render Times
    • Initial Render Time
    • Re-render Times
    • Wasted Renders
  2. React Reconciliation (Diffing Algorithm) Efficiency
  3. State and Prop Update Times
  4. Time to First Paint (TTFP) and First Contentful Paint (FCP)
  5. Memory Usage
  6. Time to Interactive (TTI)
  7. Largest Contentful Paint (LCP)
  8. Layout Shift (Cumulative Layout Shift - CLS)
  9. Network Request Time for Data Fetching
  10. JavaScript Bundle Size and Code Splitting

Tools to Measure These Metrics

  • React DevTools Profiler
  • Chrome DevTools Performance Tab
  • Web Vitals
  • Lighthouse

We could, for example, set up Lighthouse CI, which provides many of these metrics: https://github.com/GoogleChrome/lighthouse-ci

It could fail the build if it's too slow.
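
As a complement, a minimal sketch of how some of these metrics (FCP, LCP, CLS, TTFB) could be collected at runtime with the web-vitals package mentioned above; the /api/metrics endpoint is made up:

import { onCLS, onFCP, onLCP, onTTFB } from 'web-vitals';

// Report each metric to a (made-up) collection endpoint as it becomes available.
function report(metric: { name: string; value: number }) {
  navigator.sendBeacon('/api/metrics', JSON.stringify({ name: metric.name, value: metric.value }));
}

onCLS(report);
onFCP(report);
onLCP(report);
onTTFB(report);

Lighthouse CI would then be the natural place to gate the build on thresholds for the same metrics.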

@martinothamar (Contributor Author)

"Seems more focused on low level testing of functions."

Agreed, and I think we need both. We need high-level testing that uses UX-focused metrics, but during the optimization process we also need benchmarking that verifies we are actually optimizing.

The tools you mention are good for the explorative/analysis phase, where you run one-offs and high-level before/after comparisons. Benchmarks are good for iterating on optimizations once you've chosen your place of optimization. Proper measurement of an optimization requires a lot of runs/executions so that the distribution of latency can be inspected. Higher-level tests that measure UX-quality metrics like the ones you mention can't be run a ton of times while also keeping a fast feedback loop.
