Metrics: Add a new e2e/metrics folder to start writing some test to collect performance metrics #97780

youknowriad · 2024-12-25T14:46:31Z

The ultimate goal is to have a wp.com or calypso dashboard in https://www.codevitals.run
This needs to be done in three steps (like we did before for woo, Gutenberg and core):

1- Add a script to be able to run and connect performance metrics on a given calypso instance.
2- Add a CI job to be able to run these metrics tests (and compare two different commits on that same run)
3- Log the results of that job result in codevitals.run

This PR attempts to address the first step here. It is based on this similar PR from Woo. woocommerce/woocommerce#42005 but it also uses the exact same setup as the calypso e2e tests (using these as a basis)

The PR is not ready yet because I want to add the "performance reporter" to the playwright config of the "metrics" test runner. The performance reporter class can be copy/pasted from the Woo PR but I'm having trouble finding the right way to add it to the playwright config. Help from someone familiar with calypso-e2e is needed.

For now I added a test that loads a large post on the post editor in Calypso and collects some basic loading metrics (the same one we use in woo and Gutenberg editor tests)

Testing instructions

cd test/e2e && yarn build && yarn metrics

…ollect performance metrics

matticbot · 2024-12-25T14:48:33Z

Calypso Live (direct link)

	https://calypso.live?image=registry.a8c.com/calypso/app:build-128742

Jetpack Cloud live (direct link)

	https://calypso.live?image=registry.a8c.com/calypso/app:build-128742&env=jetpack

Automattic for Agencies live (direct link)

	https://calypso.live?image=registry.a8c.com/calypso/app:build-128742&env=a8c-for-agencies

matticbot · 2024-12-25T14:48:59Z

Here is how your PR affects size of JS and CSS bundles shipped to the user's browser:

App Entrypoints (~9 bytes added 📈 [gzipped])

name                   parsed_size           gzip_size
entry-subscriptions          -32 B  (-0.0%)       +9 B  (+0.0%)
entry-stepper                -32 B  (-0.0%)       +9 B  (+0.0%)
entry-main                   -32 B  (-0.0%)       +9 B  (+0.0%)
entry-login                  -32 B  (-0.0%)       +9 B  (+0.0%)
entry-domains-landing        -32 B  (-0.0%)       +9 B  (+0.0%)

Common code that is always downloaded and parsed every time the app is loaded, no matter which route is used.

Legend

What is parsed and gzip size?

Parsed Size: Uncompressed size of the JS and CSS files. This much code needs to be parsed and stored in memory.
Gzip Size: Compressed size of the JS and CSS files. This much data needs to be downloaded over network.

Generated by performance advisor bot at iscalypsofastyet.com.

tyxla

Nice! Excited to see us gathering WP.com perf data in codevitals 👏

There are some pretty good opportunities to reuse code from Gutenberg, is there a good reason to avoid that?

tyxla · 2024-12-30T09:48:52Z

test/e2e/metrics/fixtures/large-post.html

I wish we could reuse the Gutenberg one instead of declaring one here.

tyxla · 2024-12-30T09:53:31Z

test/e2e/package.json

@@ -20,6 +20,7 @@
 		"clean:output": "rm -rf ./allure-results && rm -rf ./results",
 		"build": "yarn workspace @automattic/jest-circus-allure-reporter build && yarn workspace @automattic/calypso-e2e build",
 		"test": "yarn jest",
+		"metrics": "yarn jest --config jest.metrics.config.js",


Does it make sense to expose this to a script at the root level of the project? yarn run performance-metrics or something

I don't know, you know better than me, Personally I just tried to follow how e2e tests already work on calypso.

I think we're fine, we can always add it later if necessary 👍

tyxla · 2024-12-30T09:54:58Z

test/e2e/metrics/editor.spec.ts

+declare const browser: Browser;
+const results: Record< string, number[] > = {};
+
+describe( DataHelper.createSuiteTitle( 'Metrics: Editor' ), function () {


We keep reference those as just "metrics" but I think it might make sense to change them to "performance metrics" everywhere, since metrics is too generic.

tyxla · 2024-12-30T09:57:31Z

packages/calypso-e2e/src/lib/utils/metrics.ts

+/**
+ * Metrics class to collect performance metrics.
+ */
+export class Metrics {


Is there a good reason why we're not using @wordpress/e2e-test-utils-playwright here?

I tried quickly, it was not straightforward because of import related issues. I'm not very familiar with the e2e test setup of calypso to be able to solve this properly.

Will take a stab at it

It was pretty straightforward, but we also needed to add a peer dependency and shuffle some things around, see 47f9e94

I think there's a small risk that the class Metrics doesn't work as expected because of the iframe in some situations though.

I see. If it happens, it might make sense to extend it rather than redefine it though?

Happy to start that way and iterate.

tyxla · 2024-12-30T09:58:50Z

packages/calypso-e2e/src/lib/pages/editor-page.ts

+	 * Loads HTML into the editor.
+	 * @param content Content to load.
+	 */
+	async loadHtmlContent( content: string ) {


I wish we could reuse loadBlocksFromHtml() from Gutenberg. Perhaps we can move to the utils package and reuse from there?

We probably can't reuse the one from Gutenberg, Calypso's editor is a bit more complex (iframed within iframe sometimes) so we can't reuse Gutenberg's one until the Calypso's iframe is completely removed.

Let's add some TODOs here then - ideally we should be able to remove the iframe soon, and then it would be nice to reuse what we can.

matticbot · 2024-12-30T14:44:24Z

This PR modifies the release build for the following Calypso Apps:

For info about this notification, see here: PCYsg-OT6-p2

blaze-dashboard
command-palette-wp-admin
happy-blocks
help-center
notifications
odyssey-stats
whats-new
wpcom-block-editor

To test WordPress.com changes, run install-plugin.sh $pluginSlug add/collect-calypso-metrics on your sandbox.

tyxla

Working as expected 👍

I've pushed a small change to reuse some utils from @wordpress/e2e-test-utils-playwright, let me know if you have any concerns.

A question: have you given any thought about what kind of performance metrics we are aiming to track?

tyxla · 2024-12-30T14:38:40Z

test/e2e/metrics/editor.spec.ts

+			}
+		}
+
+		console.log( results );


What are we planning to do with the results? Is that good enough for the first pass / first step of what you're planning? Let me know if you need any help with the CI.

I have a question in the PR description about this. I would love to use the "performance-reporter" from either Gutenberg or Woo (check the Woo PR) instead of console logging. It's just a way to dump the results for now.

I've spent some time trying to wire the performance reporter but failed, It needs to be added to the playwright config file but in calypso e2e tests, there's not playwright config file really, it's kind of buried in a series of scripts and configs.

Yep, the entire setup is a bit rudimentary, and while it was migrated to Playwright, it still bears the mark of old technology used before it 😅

This is the config file: https://github.com/Automattic/wp-calypso/blob/2b4ed1bcc9a77c61f4456a70b051200957a84fd7/packages/calypso-e2e/src/jest-playwright-config/index.js - as you can see it uses Jest Playwright as a test runner and was never migrated to @playwright/test.

Now, I haven't tried using a custom reporter there, but you should be able to hook it the same way. You might need to work with a separate config to be able to inherit the original config and specify the custom reporter. Try it out and let me know how it goes.

Alternatively, migrating to @playwright/test could make sense, although I'm not sure what challenges you might bump into and how long it would take. I bet it's outside of your initial scope 😉

tyxla · 2024-12-30T15:02:45Z

packages/calypso-e2e/src/lib/utils/metrics.ts

+/**
+ * Metrics class to collect performance metrics.
+ */
+export class Metrics {


It was pretty straightforward, but we also needed to add a peer dependency and shuffle some things around, see 47f9e94

tyxla · 2024-12-30T15:12:07Z

Note that we already have some metrics for WP.com, so we might want to ensure we're not building something that exists.

For example, the Calypso RUM data provides some good insights about code vitals and is highly customizable in logstash / kibana:

f71e7d936028e25498cdbcf8d5428ebf-logstash

There are also some other Grafana and Kibana dashboards that might be useful.

cc @sgomes @josephscott @mreishus for any useful links to perf dashboards for WP.com / Calypso they can provide

youknowriad · 2024-12-30T15:35:56Z

A question: have you given any thought about what kind of performance metrics we are aiming to track?

Not much yet but "loading the editor" is a good one for me at least. I have found that the editor is a bit slow to load in .com so at least we can keep track of that over time and see how to improve it.

But overall, I think it really comes down to areas we're working on and slow things we notice.

tyxla · 2024-12-30T15:43:30Z

In addition to the above dashboard I sent, I just recalled we had setup a RUM data dashboard specifically for the iframed Gutenberg instance back in the day. Here it is:

Atomic sites: 09c736036ff08469a6fc009e389d889f-logstash
Simple sites: ea50f7e49fcf8234a3dc8a89cf826b50-logstash

youknowriad · 2024-12-30T16:03:11Z

Thanks for the links @tyxla It is possible that these dashboard are serving other needs, personally I'm having a very hard time understanding them properly. I'm not opinionated in terms of tooling though and would love to use whatever is available. That said what's important to me is:

1- A documented way for developers to add metrics (like the time it takes for the UI to react when opening the inserter, or switching a tab or something...)
2- A way to track that metric over time per commit/deploy to be able to easily find problematic changes
3- A way to compare that metic results of a PR in CI: Example here
4- A simple dashboard showing the important metrics that all developers should pay attention to when merging their PRs/deploying their commits. Something like this (same dashboard for all .com devs)

tyxla · 2024-12-30T16:13:31Z

Thanks for the context @youknowriad, I agree that these serve different purposes - the ones I sent are generic, while you're interested in tracking the performance of specific actions. With that in mind, having WP.com as an instance under codevitals.run makes sense.

The interesting, and potentially tricky part here, is that WP.com is a convoluted piece that depends on multiple repositories, namely these are a few that are worth highlighting:

Gutenberg is shipped on it, as well as a few other plugins
Jetpack runs on it
WP.com, the backend, runs on it
Calypso is part of its UI

I can imagine changing any of those might influence the metrics we're interested in. So we'll need either multiple sub-dashboards, or one dashboard that somehow works with multiple repositories. Has this come up as you've been considering introducing WP.com codevitals.run perf metrics?

youknowriad · 2024-12-30T16:18:15Z

I can imagine changing any of those might influence the metrics we're interested in. So we'll need either multiple sub-dashboards, or one dashboard that somehow works with multiple repositories. Has this come up as you've been considering introducing WP.com codevitals.run perf metrics?

Good remark and I agree it makes things trickier :). I think to do that properly we need two things:

1- Being able to run the same metrics tests in multiple repositories.
2- Being able to say which repository triggers the "log" call in codevitals (should be a small addition).

That said, I think for 1 (and unless I'm misunderstanding how .com deployments work), we only need to run these in two repositories: wpcom and wpcalypso. If I'm not wrong, for the two other repositories, the changes need to first be commited to either calypso or .com which would trigger the metrics logging.

tyxla · 2024-12-30T16:26:22Z

I can imagine changing any of those might influence the metrics we're interested in. So we'll need either multiple sub-dashboards, or one dashboard that somehow works with multiple repositories. Has this come up as you've been considering introducing WP.com codevitals.run perf metrics?

Good remark and I agree it makes things trickier :). I think to do that properly we need two things:

1- Being able to run the same metrics tests in multiple repositories. 2- Being able to say which repository triggers the "log" call in codevitals (should be a small addition).

That said, I think for 1 (and unless I'm misunderstanding how .com deployments work), we only need to run these in two repositories: wpcom and wpcalypso. If I'm not wrong, for the two other repositories, the changes need to first be commited to either calypso or .com which would trigger the metrics logging.

That makes sense. Deploying Jetpack, Gutenberg, and any other plugins is done in distinct commits to wpcom, so that will be easy enough to spot. What makes it tricky for the developer is to pinpoint the exact Jetpack or Gutenberg commit that caused it, since the commit will contain all the changes from the last deploy (for Gutenberg, that's an entire GB release for example). So we might need something on top of that to help us pinpoint the differences.

youknowriad · 2024-12-30T16:36:29Z

That makes sense. Deploying Jetpack, Gutenberg, and any other plugins is done in distinct commits to wpcom, so that will be easy enough to spot. What makes it tricky for the developer is to pinpoint the exact Jetpack or Gutenberg commit that caused it, since the commit will contain all the changes from the last deploy (for Gutenberg, that's an entire GB release for example). So we might need something on top of that to help us pinpoint the differences.

Yes, The ideal scenario is that each plugin/repository has a dashboard and run its metrics tests in isolation. In a sense we already have that in Gutenberg. We might want to add a dashboard for Jetpack just like we have for Woo.

That said, we'll still have some things that will be obscured (jetpack mu plugin for instance or things that Gutenberg perf tests might not have catched) but at least we'd reduce that number a lot.

In fact, even for Calypso if we had a way to run some kind of .com container separate from production we could do that but I know it's not the case right now. So what I suggest is to try to start in Calypso and try to see how we can reuse the test runner on .com.

ellatrix · 2025-01-10T10:13:56Z

packages/calypso-e2e/src/lib/pages/editor-page.ts

+			blocks.forEach( ( block: any ) => {
+				if ( block.name === 'core/image' ) {
+					delete block.attributes.id;
+					delete block.attributes.url;


Do we do this in GB?

This was originally copied from Gutenberg, so probably?

yes https://github.com/WordPress/gutenberg/blob/79234bd5b182a8be79129cb26db6b76a63daac0d/test/performance/fixtures/perf-utils.ts#L147

youknowriad requested review from jsnajdr, WunderBart and tyxla December 25, 2024 14:46

youknowriad self-assigned this Dec 25, 2024

matticbot added [Status] Needs Review The PR is ready for review. This also triggers e2e canary tests and wp-desktop tests automatically. labels Dec 25, 2024

Metrics: Add a new e2e/metrics folder to start writing some test to c…

ec89f65

…ollect performance metrics

youknowriad force-pushed the add/collect-calypso-metrics branch from 7480fad to ec89f65 Compare December 25, 2024 14:47

tyxla reviewed Dec 30, 2024

View reviewed changes

Reuse utils from @wordpress/e2e-test-utils-playwright

47f9e94

tyxla force-pushed the add/collect-calypso-metrics branch from d374a11 to 47f9e94 Compare December 30, 2024 15:01

tyxla reviewed Dec 30, 2024

View reviewed changes

ellatrix reviewed Jan 10, 2025

View reviewed changes

Metrics: Add a new e2e/metrics folder to start writing some test to collect performance metrics #97780

Are you sure you want to change the base?

Metrics: Add a new e2e/metrics folder to start writing some test to collect performance metrics #97780

Uh oh!

Conversation

youknowriad commented Dec 25, 2024

Uh oh!

matticbot commented Dec 25, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

matticbot commented Dec 25, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tyxla left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

matticbot commented Dec 30, 2024

Uh oh!

tyxla left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

tyxla commented Dec 30, 2024

Uh oh!

youknowriad commented Dec 30, 2024

Uh oh!

tyxla commented Dec 30, 2024

Uh oh!

youknowriad commented Dec 30, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tyxla commented Dec 30, 2024

Uh oh!

youknowriad commented Dec 30, 2024

Uh oh!

tyxla commented Dec 30, 2024

Uh oh!

youknowriad commented Dec 30, 2024

Uh oh!

Choose a reason for hiding this comment

Uh oh!

matticbot commented Dec 25, 2024 •

edited

Loading

matticbot commented Dec 25, 2024 •

edited

Loading

youknowriad commented Dec 30, 2024 •

edited

Loading