Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missing results from Safari on Azure Pipelines #16229

Closed
foolip opened this issue Apr 2, 2019 · 32 comments
Closed

Missing results from Safari on Azure Pipelines #16229

foolip opened this issue Apr 2, 2019 · 32 comments

Comments

@foolip
Copy link
Member

foolip commented Apr 2, 2019

Not all tests are being run in Safari or Safari Technology Preview on Azure Pipelines, as some of the jobs are ending without running all of the tests. This is causing ~12% of all test results to be missing, with more towards the end of the A-Z listing, as tests are run in alphabetical order.


Original report:

I noticed via Intent to Ship: AnimationEffect and KeyframeEffect (except composite/iterationComposite) that results for some tests were missing entirely. The most recent 10 runs show them missing from just one:
https://wpt.fyi/results/web-animations/interfaces/KeyframeEffect?run_id=5904569678692352&run_id=4795949129924608&run_id=5197532699295744&run_id=5471855615934464&run_id=6039909097799680&run_id=6317856329302016&run_id=5111343174647808&run_id=6301552633446400&run_id=5094247594196992&run_id=5736531297828864

Those results came from this build:
https://dev.azure.com/web-platform-tests/wpt/_build/results?buildId=11873

The string "setKeyframes.html" doesn't appear in any of the files wpt_report_1.json through wpt_report_4.json files, and it also doesn't appear in any of the logs.

The build for it was https://dev.azure.com/web-platform-tests/wpt/_build/results?buildId=11829 and there it appears in the logs of the first job:

2019-04-01T19:51:00.5468760Z   /web-animations/interfaces/KeyframeEffect/processing-a-keyframes-argument-002.html
2019-04-01T19:51:00.8011000Z   ▶ Unexpected subtest result in /web-animations/interfaces/KeyframeEffect/setKeyframes.html:
2019-04-01T19:51:00.8012280Z   │ FAIL [expected PASS] Changes made via setKeyframes should be immediately visible in style
2019-04-01T19:51:00.8013010Z   │   → assert_equals: expected "250px" but got "150px"
2019-04-01T19:51:00.8013730Z   │ 
2019-04-01T19:51:00.8014500Z   │ http://web-platform.test:8000/web-animations/interfaces/KeyframeEffect/setKeyframes.html:52:16
2019-04-01T19:51:00.8015390Z   │ step@http://web-platform.test:8000/resources/testharness.js:1587:30
2019-04-01T19:51:00.8016190Z   │ test@http://web-platform.test:8000/resources/testharness.js:544:34
2019-04-01T19:51:00.8017040Z   └ global code@http://web-platform.test:8000/web-animations/interfaces/KeyframeEffect/setKeyframes.html:43:5
2019-04-01T19:51:00.8017350Z 
2019-04-01T19:51:01.0567600Z   /web-animations/timing-model/animation-effects/phases-and-states.html

The first job of the bad build did take the usual amount of time, but it seems like it was incomplete, ending like this:

  ▶ TIMEOUT [expected OK] /payment-method-basic-card/steps_for_selecting_the_payment_handler.html

  ▶ CRASH [expected OK] /payment-request/payment-is-showing.https.html

  ▶ TIMEOUT [expected OK] /payment-request/payment-request-show-method.https.html

##[section]Finishing: Run tests

@gsnedders @jgraham do you know why it might happen that a run is incomplete like this without a non-zero exit code? This should be impossible?

@Hexcles this is a real-world problem that we could catch by vetting that results are complete in the monitoring we've discussed.

@gsnedders
Copy link
Member

@foolip I don't know, but there are plenty of errors/warnings that we don't treat as reasons to cause a non-zero exit code. Hard to tell what's going on without more verbose logs, though.

@youennf
Copy link
Contributor

youennf commented Jun 24, 2019

I am not sure whether this is the same issue but I also spotted the same issue with XHR tests.
For instance https://wpt.fyi/results/xhr?run_id=266850013&run_id=247030003&run_id=237070002&run_id=231300008
First two runs have something like 1200 tests, third run has 2000 runs.

@smfr
Copy link
Contributor

smfr commented Aug 14, 2019

Can we get more verbose logs? Can we look for Safari and WebContent crash logs?

@foolip
Copy link
Member Author

foolip commented Aug 15, 2019

@smfr the logs were made as verbose as currently possible in #17379. Should we use the --diagnose command line argument for safaridriver also? man safaridriver describes it as:

     When filing a bug report against safaridriver, it is highly recommended that you capture and
     include diagnostics generated by safaridriver. This can be accomplished in several ways:

     To diagnose a single session, pass the safari:diagnose capability when requesting a new session.

     To diagnose all sessions from one safaridriver instance, use the --diagnose command line option.

     To diagnose all sessions in all instances of safaridriver, set the DiagnosticsEnabled default in
     the com.apple.WebDriver domain to YES using defaults(1).

     Diagnostic files are saved to ~/Library/Logs/com.apple.WebDriver/ and are uniquely named using
     the pid of safaridriver and a timestamp.  When using the safari:diagnose capability to turn on
     diagnostics for a particular session, diagnostic files additionally include the session identi-
     fier in file names.

So it sounds like the logs won't show up in stdout/stderr, but we could upload these logs as extra artifacts in Azure Pipelines.

@foolip
Copy link
Member Author

foolip commented Aug 15, 2019

@smfr I've sent https://github.com/web-platform-tests/wpt/pulls but am not sure if it's adding useful logging or not.

@foolip
Copy link
Member Author

foolip commented Aug 15, 2019

Even without that extra logging, in https://dev.azure.com/web-platform-tests/wpt/_build/results?buildId=27211 which I triggered manually, there's a fair bit of information about the Safari Technology Preview problem in the logs. Logs for 1 and 3 both show "Max restarts exceeded" at the end, and preceding that is "SessionNotCreatedException: session not created (500): Could not create a session: The Safari instance is already paired with another WebDriver session." a number of times.

@smfr is that enough information to debug this, or would #18444 help here? I'll start a full run of Safari Technology Preview from that branch just to see what happens.

@foolip
Copy link
Member Author

foolip commented Aug 15, 2019

In #18444 those extra logs were collected. I see "Could not create a session: The Safari instance is already paired with another WebDriver session." in some of the logs, but unfortunately that was the error for attempting to create a new session, which promptly failed:

2019/08/15 11:48:36:801  HTTPServer: POST http://127.0.0.1:62635/session
2019/08/15 11:48:36:801  HTTPServer: {"capabilities": {"alwaysMatch": {"pageLoadStrategy": "eager"}}}
2019/08/15 11:48:36:806  HTTPServer: --> HTTP 500
2019/08/15 11:48:36:807  HTTPServer: --> {"value":{"error":"session not created","message":"Could not create a session: The Safari instance is already paired with another WebDriver session.","stacktrace":""}}

But... it looks like the logs of the safaridriver process right before that might have a clue. This is the end of it:

2019/08/15 11:59:15:893  HTTPServer: POST http://127.0.0.1:62472/session/C8DE6FE3-7867-4E5D-BDF4-005576DFC195/execute/async
2019/08/15 11:59:15:893  HTTPServer: {"args": [], "script": "// We have to set the url here to ensure we get the same escaping as in the harness\n// and also to handle the case where the test changes the fragment\nwindow.__wptrunner_url = \"/navigation-timing/nav2_test_document_replaced.html\";\nwindow.__wptrunner_testdriver_callback = arguments[arguments.length - 1];\nwindow.__wptrunner_process_next_event();\n"}
2019/08/15 11:59:15:893  AutomationProtocol: SEND {"method":"Automation.isShowingJavaScriptDialog","params":{"browsingContextHandle":"page-3B8E6E14-374F-42F4-A195-B06FE858EE86"},"id":3757}
2019/08/15 11:59:15:894  AutomationProtocol: RECV {"result":{"result":false},"id":3757}
2019/08/15 11:59:15:895  AutomationProtocol: SEND {"method":"Automation.evaluateJavaScriptFunction","params":{"arguments":[],"frameHandle":"","expectsImplicitCallbackArgument":true,"function":"function() {// We have to set the url here to ensure we get the same escaping as in the harness\n// and also to handle the case where the test changes the fragment\nwindow.__wptrunner_url = \"/navigation-timing/nav2_test_document_replaced.html\";\nwindow.__wptrunner_testdriver_callback = arguments[arguments.length - 1];\nwindow.__wptrunner_process_next_event();\n}","browsingContextHandle":"page-3B8E6E14-374F-42F4-A195-B06FE858EE86","callbackTimeout":15000},"id":3758}
2019/08/15 11:59:15:900  AutomationProtocol: RECV {"method":"Automation.browsingContextCleared","params":{"browsingContextHandle":"page-3B8E6E14-374F-42F4-A195-B06FE858EE86"}}
2019/08/15 11:59:15:964  AutomationProtocol: RECV {"error":{"code":-32000,"message":"JavaScriptError;Callback was not called before the unload event.","data":[{"code":-32000,"message":"JavaScriptError;Callback was not called before the unload event."}]},"id":3758}
2019/08/15 11:59:15:965  HTTPServer: --> HTTP 500
2019/08/15 11:59:15:965  HTTPServer: --> {"value":{"error":"javascript error","message":"A JavaScript exception occured: Callback was not called before the unload event.","stacktrace":""}}
2019/08/15 11:59:15:974  AutomationProtocol: RECV {"method":"Automation.browsingContextCleared","params":{"browsingContextHandle":"page-3B8E6E14-374F-42F4-A195-B06FE858EE86"}}

It looks like this was while running /navigation-timing/nav2_test_document_replaced.html. The failure can be seen on wpt.fyi for both Safari configurations:
https://wpt.fyi/results/navigation-timing/nav2_test_document_replaced.html?run_id=284610002&run_id=283080010

Running no_proxy='*' ./wpt run --yes --log-tbpl - --log-tbpl-level info --webdriver-arg=--diagnose --channel preview safari navigation-timing locally I see the same error but the tests finish executing. I'm running STP 89, however, so if the bug was with Safari it might have been fixed.

Apple folks, can you provide any detail on when we'd get "Could not create a session: The Safari instance is already paired with another WebDriver session."? Is it because of failing to end a session previously, too many concurrent sessions, or...?

@foolip foolip assigned gsnedders and unassigned foolip Aug 15, 2019
@foolip foolip changed the title Missing results from Safari Technology Preview run Missing results from Safari on Azure Pipelines Aug 16, 2019
@youennf
Copy link
Contributor

youennf commented Aug 16, 2019

@burg, any insight?

@foolip
Copy link
Member Author

foolip commented Aug 16, 2019

Some additional clues about this can be found by looking at the 8 runs from the most recent daily run:
https://wpt.fyi/results/?run_id=274980003&run_id=306180001&run_id=284750001&run_id=280910011&run_id=290810008&run_id=300370004&run_id=274960017&run_id=298270011

Looking at these individually (example) one sees something like "Showing 34232 tests (1644318 subtests)". Number of subtests will vary somewhat, but for all 6 runs of Chrome, Edge and Firefox the number of tests in 34232.

For both Safari runs, the number of tests is 30244. It's not always exactly the same, but looking back at the aligned runs for the previous day, it's off by 5 at most. This suggests the cause is the same for both Safari configurations and that it's not totally flaky.

One can easily tell from the Azure Pipelines logs how long each job got before it stopped running, and with some effort figure out which safaridriver --diagnose logs might be interesting to look at.

This issue is now assigned to @gsnedders, but all and any help from WebKit folks would be appreciated.

@burg
Copy link
Contributor

burg commented Aug 16, 2019

Apple folks, can you provide any detail on when we'd get "Could not create a session: The Safari instance is already paired with another WebDriver session."? Is it because of failing to end a session previously, too many concurrent sessions, or...?

Mac Safari can only pair to one safaridriver session at a time. The previous session is still paired, so a new one can't be established. If Delete Session was not called during teardown from the HTTP 500, then that is probably the cause.

The HTTP 500 error looks like a bug in safaridriver, it should be clearing out pending execution callbacks before navigating away the main frame.

@foolip
Copy link
Member Author

foolip commented Aug 19, 2019

Some more clues for debugging this via #18519 where I increased the number of jobs/chunks.

After downloading the logs for the "Run tests" step of each job from Azure Pipelines, search for 'Max restarts exceeded' to find the ones that failed early.

With both 4 and 8 jobs, there are two jobs that fail like this, and the immediately preceding test is the same for both 4 and 8 jobs:

  • /inert/inert-retargeting.tentative.html
  • /inert/inert-retargeting-iframe.tentative.html

Both are timeouts and after that we get "Could not create a session: The Safari instance is already paired with another WebDriver session." until "Max restarts exceeded".

Those tests use test_driver.Actions() and are probably causing some problem with safaridriver.

@foolip
Copy link
Member Author

foolip commented Aug 19, 2019

https://dev.azure.com/web-platform-tests/wpt/_build/results?buildId=27786 with the two test deleted seems to have worked: https://wpt.fyi/results/?run_id=288930005

There are 34274 tests, exactly two fewer than Firefox for the base commit.

That pretty much confirms the problem is with these two tests. @gsnedders is there a quick way to just skip running these tests to have full results while this is investigated? If the problem is with safaridriver, we might need to skip for a long time until the next Safari stable. Being able to recover from the situation would be better of course.

@foolip
Copy link
Member Author

foolip commented Aug 19, 2019

Those tests were introduced in #17223, on June 27. The next run after that was missing results compared to those before:
https://wpt.fyi/results/?run_id=247140011&run_id=237150017 (before)
https://wpt.fyi/results/?run_id=259230020&run_id=252940022 (after)

@lukebjerring note that this is going to mess with any stats for Safari results over time. When this issue has been fixed, can we label the runs as incomplete somehow?

@foolip
Copy link
Member Author

foolip commented Aug 23, 2019

#18585 did the trick as a workaround. Aligned runs with full Safari results (except the excluded tests) since it was merged:
https://wpt.fyi/results/?run_id=285080009&run_id=276830006&run_id=306320006&run_id=300580012&run_id=276850006&run_id=276860007

@foolip
Copy link
Member Author

foolip commented Aug 23, 2019

Filed #18634 and #18635 about the excluded tests.

@gsnedders
Copy link
Member

OK, with the debug info added in #18642, it's clearer what's happening. (Full logs here.)

This is running, with STP release 90:

./wpt run --log-mach - --log-mach-level debug --webdriver-binary /Applications/Safari\ Technology\ Preview.app/Contents/MacOS/safaridriver safari /inert/inert-in-shadow-dom.tentative.html /inert/inert-retargeting.tentative.html infrastructure/assumptions/html-elements.html 

Notable things in the log:

0:10.39 DEBUG new state: restarting
 0:10.39 DEBUG Dispatch restart_runner
 0:10.39 DEBUG Stopping WebDriver
 0:10.41 DEBUG ensure_runner_stopped
 0:10.41 DEBUG waiting for runner process to end
 0:10.41 DEBUG After join
 0:10.41 DEBUG Runner process exited with code 0
 0:10.41 DEBUG TestRunnerManager cleanup
 0:10.41 DEBUG Hanging up on WebDriver session
 0:10.41 DEBUG Traceback (most recent call last):
  File "/Users/gsnedders/Documents/other-projects/wpt/web-platform-tests/tools/wptrunner/wptrunner/executors/executorwebdriver.py", line 236, in teardown
    self.webdriver.end()
  File "/Users/gsnedders/Documents/other-projects/wpt/web-platform-tests/tools/webdriver/webdriver/client.py", line 444, in end
    self.send_command("DELETE", "session/%s" % self.session_id)
  File "/Users/gsnedders/Documents/other-projects/wpt/web-platform-tests/tools/webdriver/webdriver/client.py", line 471, in send_command
    session=self)
  File "/Users/gsnedders/Documents/other-projects/wpt/web-platform-tests/tools/webdriver/webdriver/transport.py", line 175, in send
    response = self._request(method, uri, payload, headers)
  File "/Users/gsnedders/Documents/other-projects/wpt/web-platform-tests/tools/webdriver/webdriver/transport.py", line 191, in _request
    return self.connection.getresponse()
  File "/usr/local/Cellar/python@2/2.7.16/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 1121, in getresponse
    response.begin()
  File "/usr/local/Cellar/python@2/2.7.16/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 438, in begin
    version, status, reason = self._read_status()
  File "/usr/local/Cellar/python@2/2.7.16/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 402, in _read_status
    raise BadStatusLine("No status line received - the server has closed the connection")
BadStatusLine: No status line received - the server has closed the connection

Note the "Stopping WebDriver" (i.e., terminating the safaridriver process) before we delete the session (actually in parallel), meaning we normally fail to delete the session cleanly. This is kinda ugly but doesn't cause any problems. (It's fundamentally the same race as we saw in #16118.)

0:31.59 TEST_END: TIMEOUT, expected OK - Waiting on browser:
  File "/usr/local/Cellar/python@2/2.7.16/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 774, in __bootstrap
    self.__bootstrap_inner()
  File "/usr/local/Cellar/python@2/2.7.16/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/local/Cellar/python@2/2.7.16/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/Users/gsnedders/Documents/other-projects/wpt/web-platform-tests/tools/wptrunner/wptrunner/executors/executorwebdriver.py", line 294, in _run
    self.result = True, self.func(self.protocol, self.url, self.timeout)
  File "/Users/gsnedders/Documents/other-projects/wpt/web-platform-tests/tools/wptrunner/wptrunner/executors/executorwebdriver.py", line 385, in do_testharness
    done, rv = handler(result)
  File "/Users/gsnedders/Documents/other-projects/wpt/web-platform-tests/tools/wptrunner/wptrunner/executors/base.py", line 605, in __call__
    return callback(url, payload)
  File "/Users/gsnedders/Documents/other-projects/wpt/web-platform-tests/tools/wptrunner/wptrunner/executors/base.py", line 619, in process_action
    result = action_handler(payload)
  File "/Users/gsnedders/Documents/other-projects/wpt/web-platform-tests/tools/wptrunner/wptrunner/executors/base.py", line 675, in __call__
    self.protocol.action_sequence.send_actions({"actions": actions})
  File "/Users/gsnedders/Documents/other-projects/wpt/web-platform-tests/tools/wptrunner/wptrunner/executors/executorwebdriver.py", line 180, in send_actions
    self.webdriver.actions.perform(actions['actions'])
  File "/Users/gsnedders/Documents/other-projects/wpt/web-platform-tests/tools/webdriver/webdriver/client.py", line 20, in inner
    return func(self, *args, **kwargs)
  File "/Users/gsnedders/Documents/other-projects/wpt/web-platform-tests/tools/webdriver/webdriver/client.py", line 225, in perform
    self.session.switch_frame(None)
  File "/Users/gsnedders/Documents/other-projects/wpt/web-platform-tests/tools/webdriver/webdriver/client.py", line 572, in switch_frame
    return self.send_session_command("POST", url, body)
  File "/Users/gsnedders/Documents/other-projects/wpt/web-platform-tests/tools/webdriver/webdriver/client.py", line 516, in send_session_command
    return self.send_command(method, url, body)
  File "/Users/gsnedders/Documents/other-projects/wpt/web-platform-tests/tools/webdriver/webdriver/client.py", line 471, in send_command
    session=self)
  File "/Users/gsnedders/Documents/other-projects/wpt/web-platform-tests/tools/webdriver/webdriver/transport.py", line 175, in send
    response = self._request(method, uri, payload, headers)
  File "/Users/gsnedders/Documents/other-projects/wpt/web-platform-tests/tools/webdriver/webdriver/transport.py", line 191, in _request
    return self.connection.getresponse()
  File "/usr/local/Cellar/python@2/2.7.16/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 1121, in getresponse
    response.begin()
  File "/usr/local/Cellar/python@2/2.7.16/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 438, in begin
    version, status, reason = self._read_status()
  File "/usr/local/Cellar/python@2/2.7.16/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 394, in _read_status
    line = self.fp.readline(_MAXLINE + 1)
  File "/usr/local/Cellar/python@2/2.7.16/Frameworks/Python.framework/Versions/2.7/lib/python2.7/socket.py", line 480, in readline
    data = self._sock.recv(self._rbufsize)

This is where things really go wrong:

We timeout in the middle of the call to "perform actions" (or rather, safaridriver never responds to the request). (Hi @burg!)

Now, because the WebDriver client always uses a single HTTP connection, we can't then send a Delete Session command:

 0:31.61 DEBUG Hanging up on WebDriver session
 0:31.61 DEBUG Traceback (most recent call last):
  File "/Users/gsnedders/Documents/other-projects/wpt/web-platform-tests/tools/wptrunner/wptrunner/executors/executorwebdriver.py", line 236, in teardown
    self.webdriver.end()
  File "/Users/gsnedders/Documents/other-projects/wpt/web-platform-tests/tools/webdriver/webdriver/client.py", line 444, in end
    self.send_command("DELETE", "session/%s" % self.session_id)
  File "/Users/gsnedders/Documents/other-projects/wpt/web-platform-tests/tools/webdriver/webdriver/client.py", line 471, in send_command
    session=self)
  File "/Users/gsnedders/Documents/other-projects/wpt/web-platform-tests/tools/webdriver/webdriver/transport.py", line 175, in send
    response = self._request(method, uri, payload, headers)
  File "/Users/gsnedders/Documents/other-projects/wpt/web-platform-tests/tools/webdriver/webdriver/transport.py", line 190, in _request
    self.connection.request(method, url, payload, headers)
  File "/usr/local/Cellar/python@2/2.7.16/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 1042, in request
    self._send_request(method, url, body, headers)
  File "/usr/local/Cellar/python@2/2.7.16/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 1076, in _send_request
    self.putrequest(method, url, **skips)
  File "/usr/local/Cellar/python@2/2.7.16/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 924, in putrequest
    raise CannotSendRequest()
CannotSendRequest

We then kill safaridriver, and then start a new safaridriver, and that's when we get the "SessionNotCreatedException: session not created (500): Could not create a session: The Safari instance is already paired with another WebDriver session."

@burg
Copy link
Contributor

burg commented Aug 23, 2019

Hello again. For safaridriver, Delete Session and New Session are known to race, because Delete Session doesn't wait for Safari to tear down its session. This is usually not an issue.

If there's a bug in Perform Actions that causes the command to not return, then the recovery suggestion would be to kill Safari and safaridriver and start anew. I just merged a fix to one such hang that was reported by @foolip , not sure if this is related.

@foolip
Copy link
Member Author

foolip commented Aug 23, 2019

Does killing safaridriver also kill any Safari processes started by it? If not, how should one kill those without potentially killing other Safari instances that aren't under automation control?

@burg
Copy link
Contributor

burg commented Aug 23, 2019

@foolip No. safaridriver does not have a kill handler to do that. Nor does it work for Delete Session (though I plan to fix that).

In general, it is not supported to run >1 instance of /Applications/Safari.app at a time, so killing more than one by mistake is not something I have run into personally. An alternative would be to map an older and newer Safari by pid and kill the newer one spawned by safaridriver. But these are all workarounds for bugs; I want to make sure the actual bugs are being reported so I can fix them.

@foolip
Copy link
Member Author

foolip commented Aug 26, 2019

@burg certainly I want to see the underlying bug fixed as well, but I think having the restart logic be as robust as possible helps anyway. In this issue, we had lots of test results missing for almost two months which I hope we'll be able to avoid. Recovering from problems will of course also mask them to a high degree, but they will be in the logs for anyone who goes looking. If the wpt.fyi results for Safari are useful for WebKit engineers, maybe some sort of monitoring could be set up to notice this? @lukebjerring @Hexcles do you think it's possible to create such monitoring that isn't entirely drowned out by expected changes in test results?

@gsnedders are you turning this into bugs with repro steps for @burg?

@smfr
Copy link
Contributor

smfr commented Aug 26, 2019

WebKit engineers are going to be making more use of wpt.fyi results from now on, so some monitoring would be really useful!

moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this issue Aug 27, 2019
…-exclude, a=testonly

Automatic update from web-platform-tests
Skip problematic tests in Safari using --exclude (#18585)

Workaround for web-platform-tests/wpt#16229.

--

wpt-commits: 2c25b504c582b31efdd33ae429de78efe9558331
wpt-pr: 18585
@foolip
Copy link
Member Author

foolip commented Aug 27, 2019

@smfr that's great to hear! https://github.com/web-platform-tests/wpt/projects/4 has an overview of the necessary work to get Safari results into better shape.

@burg beyond that, do you think that triaging Safari-specific failures, including tests results missing only in Safari, would be enough to notice when there are safaridriver or Safari bugs hit while running the tests? My main concern would be that most differences in test results aren't because of such bugs, and so it wouldn't be worthwhile looking deeply at each problem, so that problem still go unnoticed. If we do #18444, is there something in the logs that would always indicate a problem that needs to be fixed? Other ideas?

@lukebjerring is it already possible to look for tests that are only missing in one browser? (Some missing subtests are expected, but at the test level it should not happen.)

xeonchen pushed a commit to xeonchen/gecko that referenced this issue Aug 27, 2019
…-exclude, a=testonly

Automatic update from web-platform-tests
Skip problematic tests in Safari using --exclude (#18585)

Workaround for web-platform-tests/wpt#16229.

--

wpt-commits: 2c25b504c582b31efdd33ae429de78efe9558331
wpt-pr: 18585
@gsnedders
Copy link
Member

@burg Can you check and see if ./wpt run --log-mach - --log-mach-level debug --webdriver-binary /Applications/Safari\ Technology\ Preview.app/Contents/MacOS/safaridriver --no-pause safari /inert/inert-retargeting.tentative.html times out with the fix you merged?

I don't think the delete/new session race should be too much of a problem (we might try restarting multiple times, but that's hardly the end of the world 🤷‍♀️).

I guess what we want to do is conditionally kill Safari in, uh, some case. Is there any better way to see if a given instance of Safari is already under automation than just trying to start safaridriver and create a session? And then there's the question of how to find the PID of the relevant Safari process for that driver… is there anything better than calling ps? There doesn't seem to be any public API that I can see which gives the full path of the command being run.

@lukebjerring
Copy link
Contributor

lukebjerring commented Aug 30, 2019

sorry for the slow reply. The following query will give you results that are only missing (unknown) in Safari.

three(status:!missing) safari:missing

https://wpt.fyi/results/?label=master&label=experimental&aligned=&q=count%3A3%28status%3A%21unknown%29%20safari%3Aunknown

@foolip
Copy link
Member Author

foolip commented Sep 2, 2019

Thanks @lukebjerring.

A bunch of those are going to be because of missing SharedWorker support in Safari and probably not all that interesting. Then there are cases like https://wpt.fyi/results/webauthn/createcredential-badargs-attestation.https.html?run_id=291470002&run_id=309980003&run_id=309980002&run_id=305010005 which also are because of something implemented everywhere except Safari, but where the failure mode isn't great. There, the helpers involved aren't wrapping test code correctly, and seemingly accidentally creating a single-page test in Safari. @zcorpan FYI.

Anyway, I think that as long as the number of tests (not subtests) is correct, then comparing the results will mostly reveal issues with the tests as opposed to the infrastructure, so this issue should focus on complete runs at the test level.

@psaavedra
Copy link
Contributor

@burg Can you check and see if ./wpt run --log-mach - --log-mach-level debug --webdriver-binary /Applications/Safari\ Technology\ Preview.app/Contents/MacOS/safaridriver --no-pause safari /inert/inert-retargeting.tentative.html times out with the fix you merged?

I don't think the delete/new session race should be too much of a problem (we might try restarting multiple times, but that's hardly the end of the world ).

I guess what we want to do is conditionally kill Safari in, uh, some case. Is there any better way to see if a given instance of Safari is already under automation than just trying to start safaridriver and create a session? And then there's the question of how to find the PID of the relevant Safari process for that driver… is there anything better than calling ps? There doesn't seem to be any public API that I can see which gives the full path of the command being run.

Hi @gsnedders, for the record, I have the same issue in #18595. After a strenuous debugging session I found the reason #18595 (comment) why the test suite became randomly stuck is because sometimes a SIGTERM signal is not enough to kill the WebDriver.

My workaround is psaavedra@0ba9340 to force a SIGKILL instead of the SIGTERM. Which works but it is a bit rude. Do you have any other better alternative for this?

@foolip
Copy link
Member Author

foolip commented Sep 11, 2019

@gsnedders is it fair to call this resolved now, or are there still cases where incomplete results might be produced instead of failing the build?

@gsnedders
Copy link
Member

@foolip If you want to consider the failing builds a new issue, I guess we can resolve this? Lots of relevant discussion in here though.

@foolip
Copy link
Member Author

foolip commented Sep 12, 2019

There are still missing results, but there are issued file for all of the things skipped by --exclude.

Before we close this, can you check for an aligned run that the total number of tests is precisely the same for each browser, apart from the excluded tests?

gecko-dev-updater pushed a commit to marco-c/gecko-dev-comments-removed that referenced this issue Oct 4, 2019
…-exclude, a=testonly

Automatic update from web-platform-tests
Skip problematic tests in Safari using --exclude (#18585)

Workaround for web-platform-tests/wpt#16229.

--

wpt-commits: 2c25b504c582b31efdd33ae429de78efe9558331
wpt-pr: 18585

UltraBlame original commit: 73225a9e209478b617cfc3024ed0523425e3efce
gecko-dev-updater pushed a commit to marco-c/gecko-dev-wordified that referenced this issue Oct 4, 2019
…-exclude, a=testonly

Automatic update from web-platform-tests
Skip problematic tests in Safari using --exclude (#18585)

Workaround for web-platform-tests/wpt#16229.

--

wpt-commits: 2c25b504c582b31efdd33ae429de78efe9558331
wpt-pr: 18585

UltraBlame original commit: 73225a9e209478b617cfc3024ed0523425e3efce
gecko-dev-updater pushed a commit to marco-c/gecko-dev-wordified-and-comments-removed that referenced this issue Oct 4, 2019
…-exclude, a=testonly

Automatic update from web-platform-tests
Skip problematic tests in Safari using --exclude (#18585)

Workaround for web-platform-tests/wpt#16229.

--

wpt-commits: 2c25b504c582b31efdd33ae429de78efe9558331
wpt-pr: 18585

UltraBlame original commit: 73225a9e209478b617cfc3024ed0523425e3efce
@foolip
Copy link
Member Author

foolip commented Nov 19, 2019

I've confirmed that #18634 is all that remains.

These runs all have results from 36411 tests: https://wpt.fyi/results/?run_id=362500002 https://wpt.fyi/results/?run_id=364420003 https://wpt.fyi/results/?run_id=362520001 https://wpt.fyi/results/?run_id=352740005

And only the inert tests differ between Safari preview and failure: https://wpt.fyi/results/?run_id=350750015 https://wpt.fyi/results/?run_id=348890004

@foolip foolip closed this as completed Nov 19, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants