CI(pytest): Upload test results to Codecov for test analytics #6126
Codecov announced this feature and has had it available for a while now, but they're pushing it a bit more lately. I think it can be a useful addition.
Basically, you upload a JUnit-style XML test report: an XML file with the test names, the results, and the time taken. Pytest supports producing this out of the box. The goal is to upload even on test failures, so that we can analyze which tests are flaky, how often they fail, etc.
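For reference, here is a minimal sketch of the kind of workflow steps this relies on (the step names, report path, and the `files` input name are illustrative assumptions, not taken from this PR):

```yaml
- name: Run pytest and write a JUnit-style XML report
  run: pytest --junitxml=test-results.xml

- name: Upload test results to Codecov
  # Run this step even if the tests failed, so flaky/failing tests are reported too
  if: ${{ !cancelled() }}
  uses: codecov/test-results-action@v1
  with:
    token: ${{ secrets.CODECOV_TOKEN }}
    files: test-results.xml  # assumed input name, mirroring codecov/codecov-action
```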
For unittest, there's a package we could use that replaces unittest's test runner class, but gunittest already replaces that same class, so this isn't readily available to implement yet (it would be more useful there, for example to get a value for the i.smap failure rate).
I was waiting to work on this and submit a PR because it wasn't clear whether the pull request comments would be posted on failures. There has been some feedback since, and I still haven't seen one, even after opening a PR with purposely failing tests (adding/removing "not" in asserts) after merging into my main branch.
The interface, for the main branch (of my fork, play with it at https://app.codecov.io/gh/echoix/grass/tests/main), looks like:
I configured flags, so we can filter by them:
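A flag is just an extra input on the upload step; a small sketch, with an illustrative flag name:

```yaml
- name: Upload test results to Codecov
  if: ${{ !cancelled() }}
  uses: codecov/test-results-action@v1
  with:
    token: ${{ secrets.CODECOV_TOKEN }}
    files: test-results.xml
    flags: pytest  # illustrative flag name; the Codecov UI can then filter results by flag
```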

It also identifies the slowest tests. For example, in our pytest tests, 28 tests take much more time than the others. (The times seem to be aggregated over the last 30 days, but cover less here since this has only been on my main branch for a couple of commits; at the first commit it was around 4-5 minutes for a subset of 23 tests.)
Here's an example for a branch, also showing where there's a flaky test (a new v.class test from #6071, which sometimes fails on Windows, even in the main repo):

I didn't try it on my second fork to see how it behaves without a Codecov token. I strongly suspect it works the same as the code coverage upload, which still worked yesterday. Technically, they shouldn't have needed to make a separate action, as it is pretty much the same as the code coverage upload action; it even uses the same CLI binary uploader.
There was a beta feature from October 2023, which still looks to be in beta, that used a similar mechanism: you uploaded a list of tests collected before actually running them, and from the code coverage, the labels, the flags, and the changes, it would select the tests that must be run (maybe a few more), and you would run only those. I don't feel it was appropriate or ready enough to rely on, but it was an interesting idea. Uploading only the test results, as done here, has no real impact if it stops working.
I also tried out using OpenID Connect (OIDC) with `use_oidc: true` plus the workflow permissions it needs, instead of having the CODECOV_TOKEN, for both the test result upload and the code coverage upload, and both worked fine multiple times. I simply kept it out of this PR, as it was not needed for things to work correctly.
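For reference, a minimal sketch of that OIDC variant (not included in this PR; the job name, report path, and `files` input name are illustrative assumptions):

```yaml
jobs:
  pytest:
    runs-on: ubuntu-latest
    permissions:
      id-token: write   # lets the job request an OIDC token from GitHub
      contents: read
    steps:
      - uses: actions/checkout@v4
      - name: Run pytest and write a JUnit-style XML report
        run: pytest --junitxml=test-results.xml
      - name: Upload test results to Codecov (OIDC, no CODECOV_TOKEN)
        if: ${{ !cancelled() }}
        uses: codecov/test-results-action@v1
        with:
          use_oidc: true
          files: test-results.xml  # assumed input name, mirroring codecov/codecov-action
```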