-
Notifications
You must be signed in to change notification settings - Fork 3.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Missing results from Safari on Azure Pipelines #16229
Comments
@foolip I don't know, but there are plenty of errors/warnings that we don't treat as reasons to cause a non-zero exit code. Hard to tell what's going on without more verbose logs, though. |
I am not sure whether this is the same issue but I also spotted the same issue with XHR tests. |
Can we get more verbose logs? Can we look for Safari and WebContent crash logs? |
@smfr the logs were made as verbose as currently possible in #17379. Should we use the
So it sounds like the logs won't show up in stdout/stderr, but we could upload these logs as extra artifacts in Azure Pipelines. |
@smfr I've sent https://github.com/web-platform-tests/wpt/pulls but am not sure if it's adding useful logging or not. |
Even without that extra logging, in https://dev.azure.com/web-platform-tests/wpt/_build/results?buildId=27211 which I triggered manually, there's a fair bit of information about the Safari Technology Preview problem in the logs. Logs for 1 and 3 both show "Max restarts exceeded" at the end, and preceding that is "SessionNotCreatedException: session not created (500): Could not create a session: The Safari instance is already paired with another WebDriver session." a number of times. @smfr is that enough information to debug this, or would #18444 help here? I'll start a full run of Safari Technology Preview from that branch just to see what happens. |
In #18444 those extra logs were collected. I see "Could not create a session: The Safari instance is already paired with another WebDriver session." in some of the logs, but unfortunately that was the error for attempting to create a new session, which promptly failed:
But... it looks like the logs of the safaridriver process right before that might have a clue. This is the end of it:
It looks like this was while running /navigation-timing/nav2_test_document_replaced.html. The failure can be seen on wpt.fyi for both Safari configurations: Running Apple folks, can you provide any detail on when we'd get "Could not create a session: The Safari instance is already paired with another WebDriver session."? Is it because of failing to end a session previously, too many concurrent sessions, or...? |
@burg, any insight? |
Some additional clues about this can be found by looking at the 8 runs from the most recent daily run: Looking at these individually (example) one sees something like "Showing 34232 tests (1644318 subtests)". Number of subtests will vary somewhat, but for all 6 runs of Chrome, Edge and Firefox the number of tests in 34232. For both Safari runs, the number of tests is 30244. It's not always exactly the same, but looking back at the aligned runs for the previous day, it's off by 5 at most. This suggests the cause is the same for both Safari configurations and that it's not totally flaky. One can easily tell from the Azure Pipelines logs how long each job got before it stopped running, and with some effort figure out which This issue is now assigned to @gsnedders, but all and any help from WebKit folks would be appreciated. |
Mac Safari can only pair to one safaridriver session at a time. The previous session is still paired, so a new one can't be established. If Delete Session was not called during teardown from the HTTP 500, then that is probably the cause. The HTTP 500 error looks like a bug in safaridriver, it should be clearing out pending execution callbacks before navigating away the main frame. |
Some more clues for debugging this via #18519 where I increased the number of jobs/chunks. After downloading the logs for the "Run tests" step of each job from Azure Pipelines, search for 'Max restarts exceeded' to find the ones that failed early. With both 4 and 8 jobs, there are two jobs that fail like this, and the immediately preceding test is the same for both 4 and 8 jobs:
Both are timeouts and after that we get "Could not create a session: The Safari instance is already paired with another WebDriver session." until "Max restarts exceeded". Those tests use |
https://dev.azure.com/web-platform-tests/wpt/_build/results?buildId=27786 with the two test deleted seems to have worked: https://wpt.fyi/results/?run_id=288930005 There are 34274 tests, exactly two fewer than Firefox for the base commit. That pretty much confirms the problem is with these two tests. @gsnedders is there a quick way to just skip running these tests to have full results while this is investigated? If the problem is with safaridriver, we might need to skip for a long time until the next Safari stable. Being able to recover from the situation would be better of course. |
Those tests were introduced in #17223, on June 27. The next run after that was missing results compared to those before: @lukebjerring note that this is going to mess with any stats for Safari results over time. When this issue has been fixed, can we label the runs as incomplete somehow? |
#18585 did the trick as a workaround. Aligned runs with full Safari results (except the excluded tests) since it was merged: |
OK, with the debug info added in #18642, it's clearer what's happening. (Full logs here.) This is running, with STP release 90:
Notable things in the log:
Note the "Stopping WebDriver" (i.e., terminating the
This is where things really go wrong: We timeout in the middle of the call to "perform actions" (or rather, Now, because the WebDriver client always uses a single HTTP connection, we can't then send a Delete Session command:
We then kill |
Hello again. For safaridriver, Delete Session and New Session are known to race, because Delete Session doesn't wait for Safari to tear down its session. This is usually not an issue. If there's a bug in Perform Actions that causes the command to not return, then the recovery suggestion would be to kill Safari and safaridriver and start anew. I just merged a fix to one such hang that was reported by @foolip , not sure if this is related. |
Does killing safaridriver also kill any Safari processes started by it? If not, how should one kill those without potentially killing other Safari instances that aren't under automation control? |
@foolip No. safaridriver does not have a kill handler to do that. Nor does it work for Delete Session (though I plan to fix that). In general, it is not supported to run >1 instance of /Applications/Safari.app at a time, so killing more than one by mistake is not something I have run into personally. An alternative would be to map an older and newer Safari by pid and kill the newer one spawned by safaridriver. But these are all workarounds for bugs; I want to make sure the actual bugs are being reported so I can fix them. |
@burg certainly I want to see the underlying bug fixed as well, but I think having the restart logic be as robust as possible helps anyway. In this issue, we had lots of test results missing for almost two months which I hope we'll be able to avoid. Recovering from problems will of course also mask them to a high degree, but they will be in the logs for anyone who goes looking. If the wpt.fyi results for Safari are useful for WebKit engineers, maybe some sort of monitoring could be set up to notice this? @lukebjerring @Hexcles do you think it's possible to create such monitoring that isn't entirely drowned out by expected changes in test results? @gsnedders are you turning this into bugs with repro steps for @burg? |
WebKit engineers are going to be making more use of wpt.fyi results from now on, so some monitoring would be really useful! |
…-exclude, a=testonly Automatic update from web-platform-tests Skip problematic tests in Safari using --exclude (#18585) Workaround for web-platform-tests/wpt#16229. -- wpt-commits: 2c25b504c582b31efdd33ae429de78efe9558331 wpt-pr: 18585
@smfr that's great to hear! https://github.com/web-platform-tests/wpt/projects/4 has an overview of the necessary work to get Safari results into better shape. @burg beyond that, do you think that triaging Safari-specific failures, including tests results missing only in Safari, would be enough to notice when there are safaridriver or Safari bugs hit while running the tests? My main concern would be that most differences in test results aren't because of such bugs, and so it wouldn't be worthwhile looking deeply at each problem, so that problem still go unnoticed. If we do #18444, is there something in the logs that would always indicate a problem that needs to be fixed? Other ideas? @lukebjerring is it already possible to look for tests that are only missing in one browser? (Some missing subtests are expected, but at the test level it should not happen.) |
…-exclude, a=testonly Automatic update from web-platform-tests Skip problematic tests in Safari using --exclude (#18585) Workaround for web-platform-tests/wpt#16229. -- wpt-commits: 2c25b504c582b31efdd33ae429de78efe9558331 wpt-pr: 18585
@burg Can you check and see if I don't think the delete/new session race should be too much of a problem (we might try restarting multiple times, but that's hardly the end of the world 🤷♀️). I guess what we want to do is conditionally kill Safari in, uh, some case. Is there any better way to see if a given instance of Safari is already under automation than just trying to start |
sorry for the slow reply. The following query will give you results that are only missing (unknown) in Safari.
|
Thanks @lukebjerring. A bunch of those are going to be because of missing SharedWorker support in Safari and probably not all that interesting. Then there are cases like https://wpt.fyi/results/webauthn/createcredential-badargs-attestation.https.html?run_id=291470002&run_id=309980003&run_id=309980002&run_id=305010005 which also are because of something implemented everywhere except Safari, but where the failure mode isn't great. There, the helpers involved aren't wrapping test code correctly, and seemingly accidentally creating a single-page test in Safari. @zcorpan FYI. Anyway, I think that as long as the number of tests (not subtests) is correct, then comparing the results will mostly reveal issues with the tests as opposed to the infrastructure, so this issue should focus on complete runs at the test level. |
Hi @gsnedders, for the record, I have the same issue in #18595. After a strenuous debugging session I found the reason #18595 (comment) why the test suite became randomly stuck is because sometimes a SIGTERM signal is not enough to kill the WebDriver. My workaround is psaavedra@0ba9340 to force a SIGKILL instead of the SIGTERM. Which works but it is a bit rude. Do you have any other better alternative for this? |
@gsnedders is it fair to call this resolved now, or are there still cases where incomplete results might be produced instead of failing the build? |
@foolip If you want to consider the failing builds a new issue, I guess we can resolve this? Lots of relevant discussion in here though. |
There are still missing results, but there are issued file for all of the things skipped by Before we close this, can you check for an aligned run that the total number of tests is precisely the same for each browser, apart from the excluded tests? |
…-exclude, a=testonly Automatic update from web-platform-tests Skip problematic tests in Safari using --exclude (#18585) Workaround for web-platform-tests/wpt#16229. -- wpt-commits: 2c25b504c582b31efdd33ae429de78efe9558331 wpt-pr: 18585 UltraBlame original commit: 73225a9e209478b617cfc3024ed0523425e3efce
…-exclude, a=testonly Automatic update from web-platform-tests Skip problematic tests in Safari using --exclude (#18585) Workaround for web-platform-tests/wpt#16229. -- wpt-commits: 2c25b504c582b31efdd33ae429de78efe9558331 wpt-pr: 18585 UltraBlame original commit: 73225a9e209478b617cfc3024ed0523425e3efce
…-exclude, a=testonly Automatic update from web-platform-tests Skip problematic tests in Safari using --exclude (#18585) Workaround for web-platform-tests/wpt#16229. -- wpt-commits: 2c25b504c582b31efdd33ae429de78efe9558331 wpt-pr: 18585 UltraBlame original commit: 73225a9e209478b617cfc3024ed0523425e3efce
I've confirmed that #18634 is all that remains. These runs all have results from 36411 tests: https://wpt.fyi/results/?run_id=362500002 https://wpt.fyi/results/?run_id=364420003 https://wpt.fyi/results/?run_id=362520001 https://wpt.fyi/results/?run_id=352740005 And only the inert tests differ between Safari preview and failure: https://wpt.fyi/results/?run_id=350750015 https://wpt.fyi/results/?run_id=348890004 |
Not all tests are being run in Safari or Safari Technology Preview on Azure Pipelines, as some of the jobs are ending without running all of the tests. This is causing ~12% of all test results to be missing, with more towards the end of the A-Z listing, as tests are run in alphabetical order.
Original report:
I noticed via Intent to Ship: AnimationEffect and KeyframeEffect (except composite/iterationComposite) that results for some tests were missing entirely. The most recent 10 runs show them missing from just one:
https://wpt.fyi/results/web-animations/interfaces/KeyframeEffect?run_id=5904569678692352&run_id=4795949129924608&run_id=5197532699295744&run_id=5471855615934464&run_id=6039909097799680&run_id=6317856329302016&run_id=5111343174647808&run_id=6301552633446400&run_id=5094247594196992&run_id=5736531297828864
Those results came from this build:
https://dev.azure.com/web-platform-tests/wpt/_build/results?buildId=11873
The string "setKeyframes.html" doesn't appear in any of the files wpt_report_1.json through wpt_report_4.json files, and it also doesn't appear in any of the logs.
The build for it was https://dev.azure.com/web-platform-tests/wpt/_build/results?buildId=11829 and there it appears in the logs of the first job:
The first job of the bad build did take the usual amount of time, but it seems like it was incomplete, ending like this:
@gsnedders @jgraham do you know why it might happen that a run is incomplete like this without a non-zero exit code? This should be impossible?
@Hexcles this is a real-world problem that we could catch by vetting that results are complete in the monitoring we've discussed.
The text was updated successfully, but these errors were encountered: