Description
We have the regression tests, which are amazing for finding whether a change breaks SVGs. However, they're not good at measuring whether a change was actually meaningful.
For example:
- After a bug fix, do the metrics reflect what was expected?
- If we have a proposed or revised optimization, is it actually reducing bytes?
- How long did it take to run the regression test pipeline?
- Is a change doing anything else to the output of the SVGs? (Change in formatting, attribute order, etc.)
Before fixing more bugs/optimizations, let's finally address this so we have more insight.
Scope
Allow Test Failures
The regression tests should test all SVGs that they can, including ones that may fail. We should instead define three lists:
| List | Description |
|---|---|
| `expect-error` | SVGs that we know are broken by SVGO. The build will fail if, during the regression pipeline, we determine that one of them is actually not broken. This is for when we've intentionally or incidentally fixed an SVG. |
| `ignore` | Ignore the results of these SVGs, as they are finicky: they sometimes pass, sometimes fail, and we'll figure out why one day, perhaps. We'll report their status regardless, but the result has no effect on the pipeline status. |
| `skip` | SVGs that we shouldn't extract. We only have one SVG for this scenario right now, which takes too long to optimize to be practical for CI environments. |
Instead of defining these lists in code, let's make them separate config files that are read from the regression tests directory. They'll be standalone text files, one file name per line.
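A minimal sketch of reading these lists, assuming they live as `expect-error.txt`, `ignore.txt`, and `skip.txt` in the regression tests directory (the exact paths and names are open):

```js
// Sketch: read one of the list files, one SVG file name per line.
// Paths and the helper name are illustrative, not a decided layout.
import fs from 'node:fs/promises';
import path from 'node:path';

const readList = async (name) => {
  const content = await fs.readFile(
    path.join('test', 'regression', `${name}.txt`),
    'utf-8',
  );
  return new Set(
    content
      .split('\n')
      .map((line) => line.trim())
      .filter((line) => line !== ''),
  );
};

const expectError = await readList('expect-error');
const ignore = await readList('ignore');
const skip = await readList('skip');
```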
Metrics
The regression tests must output the following metrics:
- How many tests passed or failed, so we know the progress towards acing the regression tests.
- The total number of bytes saved, so we can confirm changes are actually optimizing SVGs and that fixes aren't undoing optimizations more than necessary.
- The hash of every individual SVG, so we can keep track of which files are affected by changes in SVGO.
- The maximum memory footprint the process had during the run.
The results will be written to STDOUT and a file.
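For the byte and memory metrics, one option is to accumulate savings per SVG and sample the resident set size during the run, since Node doesn't expose a peak-memory figure directly. A rough sketch (the interval and names are illustrative assumptions):

```js
// Sketch: accumulate bytes saved and approximate peak memory by sampling RSS.
// Node doesn't expose a high-water mark directly, so polling is one option;
// the 100 ms interval and variable names are illustrative assumptions.
let bytesSaved = 0;
let peakRss = 0;

const memorySampler = setInterval(() => {
  peakRss = Math.max(peakRss, process.memoryUsage().rss);
}, 100);

// Called once per optimized SVG.
const recordResult = (originalLength, optimizedLength) => {
  bytesSaved += originalLength - optimizedLength;
};

// …run the regression suite, calling recordResult() for each file…

clearInterval(memorySampler);
console.log(`Bytes Saved: ${(bytesSaved / 1024 / 1024).toFixed(1)} MiB`);
console.log(`Peak Memory Alloc: ${(peakRss / 1e9).toFixed(3)} GB`);
```

If optimization runs in worker threads or child processes, their memory usage would need to be aggregated separately.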
STDOUT
The resulting output should look something like this, but I'm happy to consider alternative presentations. The data contained in the example is fictional.
Normal Run:
SVGO Test Suite Version: 53759b668cb444f579422e71936dd1f5
Test Results
------------
Passed: 4,799 / 4,799
Expected Error: 159 / 159
Ignored: 1 / 1
Skipped: 1
Metrics
-------
Bytes Saved: 40.4 MiB
Time Taken: 23m56s
Peak Memory Alloc: 2.305 GB
Relative to svg/svgo.git#main
-----------------------------
Files Changed: 277 / 4,959
Bytes Saved Delta: +0.4 MiB
Time Taken Delta: +20s
Peak Memory Alloc Delta: +0.001 GB
If there is no previous report to compare results to:
…
Relative to svg/svgo.git#main
-----------------------------
Previous test report not available.
If the svgo-test-suite version is different from the version specified in the last test report:
…
Relative to svg/svgo.git#main
-----------------------------
Previous test report used a different version of svgo-test-suite.
Rerun regression tests on main to regenerate test report, then try again.
Implementation Details:
- The svgo-test-suite version will be included in the download from the svg/svgo-test-suite bundle. It will be the checksum after sorting all files in the dataset by name, then concatenating all file names and contents (see the sketch after this list).
- In `Relative to …`, any `… Delta` line must use the same units as the field it's relative to.
- Numbers must be localized according to the system locale (i.e., `1,000`, not `1000`, for en-GB).
- Files Changed can be determined by including a checksum of each optimized SVG in the test report, and comparing the checksums.
- It's expected to use a library to make text bold or colored, etc. The capabilities of the library may influence the design. For example, if we make headings bold, there's no need to underline them.
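A minimal sketch of the suite-version checksum described above, plus the per-file checksums used for Files Changed. MD5 is assumed only because the example output shows a 32-character hex digest; the algorithm, paths, and helper names are not decided:

```js
// Sketch: hash the sorted file names and contents of the test suite to get a
// suite version. Uses fs.readdir's recursive option (Node 18.17+ / 20.1+).
import { createHash } from 'node:crypto';
import fs from 'node:fs/promises';
import path from 'node:path';

const suiteVersion = async (dir) => {
  const entries = (await fs.readdir(dir, { recursive: true })).sort();
  const hash = createHash('md5');
  for (const name of entries) {
    const filePath = path.join(dir, name);
    if ((await fs.stat(filePath)).isFile()) {
      hash.update(name);
      hash.update(await fs.readFile(filePath));
    }
  }
  return hash.digest('hex');
};

// The same idea applies per file for Files Changed: store
// createHash('md5').update(optimizedSvg).digest('hex') in the report and
// diff the checksums against the previous main report.
```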
File
We will write a more verbose report to `.svgo/regression-results.json`, containing raw data for the run. The JSON file should not be concerned with presentation. This will include everything, including the checksum of every file that was optimized. This file will not be committed to the Git repository.
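A rough idea of what the raw JSON could contain (all field names and the sample entry below are illustrative assumptions, not a decided schema):

```js
// Sketch of the raw report written to .svgo/regression-results.json.
// Field names and the sample entry are illustrative, not a final schema.
import fs from 'node:fs/promises';

const report = {
  suiteVersion: '53759b668cb444f579422e71936dd1f5',
  durationMs: 1_436_000,
  peakRssBytes: 2_305_000_000,
  totals: { passed: 4799, expectedError: 159, ignored: 1, skipped: 1 },
  bytesSaved: 42_362_470,
  files: {
    'some-directory/example.svg': {
      status: 'passed',
      originalBytes: 18260,
      optimizedBytes: 14997,
      checksum: 'md5-of-optimized-output',
    },
    // …one entry per SVG in the suite…
  },
};

await fs.mkdir('.svgo', { recursive: true });
await fs.writeFile(
  '.svgo/regression-results.json',
  JSON.stringify(report, null, 2),
);
```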
The results will run on a merge to `main`, and an artifact will be uploaded to GitHub (or to cloud storage if GitHub isn't practical for whatever reason). Then, on CI and perhaps locally, we'll generate the same report. The regression tests can compare the results to the last report for `main`, and print the relative summary.
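A minimal sketch of that comparison step, assuming the previous `main` report has been downloaded next to the current one (the file name `.svgo/regression-results.main.json` is only a placeholder):

```js
// Sketch: compare the current report against the last report from main.
// File names and fields match the sample report above and are assumptions.
import fs from 'node:fs/promises';

const current = JSON.parse(
  await fs.readFile('.svgo/regression-results.json', 'utf-8'),
);
const previous = JSON.parse(
  await fs.readFile('.svgo/regression-results.main.json', 'utf-8'),
);

if (previous.suiteVersion !== current.suiteVersion) {
  console.log(
    'Previous test report used a different version of svgo-test-suite.',
  );
} else {
  const names = Object.keys(current.files);
  const changed = names.filter(
    (name) => previous.files[name]?.checksum !== current.files[name].checksum,
  );
  console.log(`Files Changed: ${changed.length} / ${names.length}`);
  console.log(`Bytes Saved Delta: ${current.bytesSaved - previous.bytesSaved} bytes`);
}
```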
Show Progress
While we're doing this revamp anyway, perhaps we should show a loading state of some kind until the results are ready. Better than looking frozen!
It should be a progress bar of some kind that shows how many of the total SVGs have been processed so far.
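A dependency-free sketch of what that could look like (a proper TTY/progress library may well be the better choice):

```js
// Minimal sketch of a progress line, updated in place as SVGs finish.
// Shown only to illustrate the intent; a library could replace this.
const renderProgress = (done, total) => {
  const width = 30;
  const filled = Math.round((done / total) * width);
  const bar = '█'.repeat(filled) + '░'.repeat(width - filled);
  process.stdout.write(`\r[${bar}] ${done}/${total} SVGs`);
  if (done === total) {
    process.stdout.write('\n');
  }
};

// Example: call after each SVG is processed.
renderProgress(1200, 4960);
```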
Related
- johnkenny54 did something like this before in Add statistics to regression.js #1932