fix(workers): Don't panic when a worker's parent thread stops running #12156

andreubotella · 2021-09-20T18:18:11Z

This panic could happen in the following cases:

- A non-fatal error being thrown from a worker, that doesn't terminate the worker's execution, but propagates to the main thread without being handled, and makes the main thread terminate.
- A nested worker being alive while its parent worker gets terminated.
- A race condition if the main event loop terminates the worker as part of its last task, but the worker doesn't fully terminate before the main event loop stops running.

This panic happens because a worker's event loop should have pending ops as long as the worker isn't closed or terminated – but if an event loop finishes running while it has living workers, its associated `WorkerThread` structs will be dropped, closing the channels that keep those ops pending.
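To illustrate the mechanism, here is a minimal sketch of how dropping one end of a channel wakes up whatever was waiting on the other end. It uses `std::sync::mpsc` as a stand-in; the channel type Deno actually uses is not shown in this thread, and the function name is hypothetical.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical helper: a "worker" thread blocks on a channel whose sender is
/// owned by the "parent". Dropping the sender closes the channel, so the
/// blocked receive stops being pending and resolves with an error.
fn receiver_outcome_after_sender_drop() -> Result<u8, mpsc::RecvError> {
    let (tx, rx) = mpsc::channel::<u8>();

    // The worker side waits for a message that never arrives.
    let worker = thread::spawn(move || rx.recv());

    // The parent goes away without sending anything.
    drop(tx);

    // If the worker thread itself panicked, report it as a closed channel too.
    worker.join().unwrap_or(Err(mpsc::RecvError))
}
```

A worker event loop that assumes "I always have a pending op while I am alive" would see that assumption broken at exactly this point.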

This change adds a `Drop` implementation to `WorkerThread`, which terminates the worker without waiting for a response. This fixes the panic, and makes it so nested workers are automatically terminated once any of their ancestors is closed or terminated.

This change also refactors a worker's termination code into a `WorkerThread::terminate()` method.
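The shape of the change can be sketched roughly as follows. This is not the code from `runtime/ops/worker_host.rs`: the fields, the atomic flag, and the `spawn` and `terminated_flag` helpers are assumptions made so the example is self-contained. Only the split between an explicit `terminate()` that waits and a `Drop` that does not wait comes from the description above.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical stand-in for the `WorkerThread` struct named in the PR.
struct WorkerThread {
    terminated: Arc<AtomicBool>,
    join_handle: Option<JoinHandle<u32>>,
}

impl WorkerThread {
    /// Spawns a thread that loops until it is told to terminate.
    fn spawn() -> Self {
        let terminated = Arc::new(AtomicBool::new(false));
        let flag = Arc::clone(&terminated);
        let join_handle = thread::spawn(move || {
            let mut ticks = 0u32;
            while !flag.load(Ordering::SeqCst) {
                ticks += 1;
                thread::sleep(Duration::from_millis(1));
            }
            ticks
        });
        Self { terminated, join_handle: Some(join_handle) }
    }

    /// Lets a caller observe the termination flag after the struct is gone.
    fn terminated_flag(&self) -> Arc<AtomicBool> {
        Arc::clone(&self.terminated)
    }

    /// Explicit termination: signal the worker, then wait for it to finish.
    fn terminate(mut self) -> Option<thread::Result<u32>> {
        self.terminated.store(true, Ordering::SeqCst);
        self.join_handle.take().map(|handle| handle.join())
    }
}

impl Drop for WorkerThread {
    /// Implicit termination: signal the worker but do not wait for it.
    fn drop(&mut self) {
        self.terminated.store(true, Ordering::SeqCst);
    }
}
```

Dropping a `WorkerThread` in this sketch only sets the flag and detaches the thread, which mirrors "terminates the worker without waiting for a response".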

Closes #11342.

andreubotella · 2021-09-21T12:11:37Z

As it turns out, terminating the worker when `WebWorkerHandle` gets dropped doesn't work, because `WebWorkerHandle` can be cloned without creating a new worker (which is used by the `worker_host` ops in order to keep access to the handle across await points). But other than those ops, the only thing that uses a `WebWorkerHandle` is `WorkerThread`, which does seem to map one-to-one with a worker. Making that struct, rather than `WebWorkerHandle`, terminate the worker on drop seems like the right choice.
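A small sketch of why the drop logic belongs on the one-to-one owner rather than on the cloneable handle. The type names `ClonableHandle` and `OwningThread` are hypothetical stand-ins for `WebWorkerHandle` and `WorkerThread`, and the shared flag is an assumption made for illustration.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical cloneable handle: many copies may point at one worker.
#[derive(Clone)]
struct ClonableHandle {
    terminated: Arc<AtomicBool>,
}

impl ClonableHandle {
    fn terminate(&self) {
        self.terminated.store(true, Ordering::SeqCst);
    }

    fn is_terminated(&self) -> bool {
        self.terminated.load(Ordering::SeqCst)
    }
}

/// Hypothetical owner: exactly one per worker, so its drop is a safe
/// place to terminate. Had `Drop` been implemented on `ClonableHandle`
/// instead, dropping any temporary clone would terminate the worker.
struct OwningThread {
    handle: ClonableHandle,
}

impl Drop for OwningThread {
    fn drop(&mut self) {
        self.handle.terminate();
    }
}

/// Returns the termination state after a clone is dropped, and after the
/// owner is dropped.
fn clone_then_owner_drop() -> (bool, bool) {
    let owner = OwningThread {
        handle: ClonableHandle { terminated: Arc::new(AtomicBool::new(false)) },
    };
    let observer = owner.handle.clone();

    // An op clones the handle to use it across an await point, then finishes.
    let op_copy = owner.handle.clone();
    drop(op_copy);
    let after_clone_drop = observer.is_terminated();

    drop(owner);
    let after_owner_drop = observer.is_terminated();

    (after_clone_drop, after_owner_drop)
}
```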

~~(I'll be updating the title of this PR and the commit message once CI passes.)~~

bartlomieju

LGTM, let's try this

bartlomieju · 2021-09-22T15:22:42Z

runtime/ops/worker_host.rs

+      .expect("Worker thread panicked")
+      .expect("Panic in worker event loop");


I'm not sure if we should be `expect()`ing here. This would cause another panic if something goes wrong in the worker and might be prone to other race conditions.

Although that doesn't change current behavior so I guess it's fine for now
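For reference, one possible non-panicking alternative to the double `expect()` is to match on both layers of the join result. This is only a sketch: the inner `Result<(), String>` type and the `join_worker` name are assumptions, since the actual return type of the worker thread is not shown in this thread.

```rust
use std::thread;

/// Hypothetical helper: flattens "thread panicked" and "event loop returned
/// an error" into a single error value instead of panicking on either.
fn join_worker(handle: thread::JoinHandle<Result<(), String>>) -> Result<(), String> {
    match handle.join() {
        // Thread finished and the event loop completed normally.
        Ok(Ok(())) => Ok(()),
        // Thread finished, but the event loop reported an error.
        Ok(Err(err)) => Err(format!("worker event loop failed: {}", err)),
        // The thread itself panicked; the payload is discarded here.
        Err(_) => Err("worker thread panicked".to_string()),
    }
}
```

The caller then decides whether an error should be logged, propagated, or ignored, rather than having the parent thread panic as well.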

…12215) Before #12156, closing a worker which had children would cause a panic (#11342 (comment)). After that PR, closing a worker will also close any child workers.
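The cascading behavior can be pictured with a simplified model in which each worker owns its children, so dropping a parent runs the drop logic of every descendant. This is an assumption-laden sketch: the `Worker` struct, its fields, and the shared log are hypothetical, and the real runtime keeps child workers in per-worker state rather than in a plain `Vec`.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical worker that owns its children and records when it is dropped.
struct Worker {
    name: &'static str,
    log: Arc<Mutex<Vec<&'static str>>>,
    children: Vec<Worker>,
}

impl Drop for Worker {
    fn drop(&mut self) {
        // Runs before the fields (including `children`) are dropped.
        self.log.lock().unwrap().push(self.name);
    }
}

/// Builds parent -> child -> grandchild, drops the parent, and returns the
/// order in which the workers were terminated.
fn termination_order() -> Vec<&'static str> {
    let log = Arc::new(Mutex::new(Vec::new()));
    let grandchild = Worker { name: "grandchild", log: Arc::clone(&log), children: vec![] };
    let child = Worker { name: "child", log: Arc::clone(&log), children: vec![grandchild] };
    let parent = Worker { name: "parent", log: Arc::clone(&log), children: vec![child] };

    drop(parent);

    let order = log.lock().unwrap().clone();
    order
}
```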

…denoland#12156)

This panic could happen in the following cases:

- A non-fatal error being thrown from a worker, that doesn't terminate the worker's execution, but propagates to the main thread without being handled, and makes the main thread terminate.
- A nested worker being alive while its parent worker gets terminated.
- A race condition if the main event loop terminates the worker as part of its last task, but the worker doesn't fully terminate before the main event loop stops running.

This panic happens because a worker's event loop should have pending ops as long as the worker isn't closed or terminated – but if an event loop finishes running while it has living workers, its associated `WorkerThread` structs will be dropped, closing the channels that keep those ops pending.

This change adds a `Drop` implementation to `WorkerThread`, which terminates the worker without waiting for a response. This fixes the panic, and makes it so nested workers are automatically terminated once any of their ancestors is closed or terminated.

This change also refactors a worker's termination code into a `WorkerThread::terminate()` method.

Closes denoland#11342.

Co-authored-by: Bartek Iwańczuk <[email protected]>

andreubotella force-pushed the worker-drop-handle-race branch from 6cd79db to e20dd13 Compare September 21, 2021 13:46

andreubotella changed the title ~~fix(workers): Dropping a WebWorkerHandle should terminate the worker~~ fix(workers): Don't panic when a worker's parent thread stops running Sep 21, 2021

andreubotella mentioned this pull request Sep 22, 2021

integration::worker::workers is flaky #12075

Closed

Merge branch 'main' into worker-drop-handle-race

8891133

bartlomieju approved these changes Sep 22, 2021

bartlomieju mentioned this pull request Sep 22, 2021

Mark integration::worker::workers as flaky #12182

Closed

bartlomieju merged commit 5c5f4ea into denoland:main Sep 22, 2021

andreubotella deleted the worker-drop-handle-race branch September 22, 2021 16:02

andreubotella mentioned this pull request Sep 24, 2021

chore(workers): Test that closing a worker closes any child workers #12215

Merged
