Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exceptions thrown after a worker's initial execution panic #11342

Closed
andreubotella opened this issue Jul 9, 2021 · 5 comments · Fixed by #12156
Closed

Exceptions thrown after a worker's initial execution panic #11342

andreubotella opened this issue Jul 9, 2021 · 5 comments · Fixed by #12156
Labels
bug Something isn't working correctly runtime Relates to code in the runtime crate

Comments

@andreubotella
Copy link
Contributor

andreubotella commented Jul 9, 2021

error: Uncaught (in worker "") Error
    throw new Error();
          ^
    at file:///home/abotella/test/worker.js:3:11
    at fire (deno:extensions/timers/01_timers.js:477:7)
    at handleTimerMacrotask (deno:extensions/timers/01_timers.js:338:7)
thread 'worker-0' panicked at 'coding error: either js is polling or the worker is terminated', runtime/web_worker.rs:498:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
error: Uncaught (in promise) Error: Unhandled error event reached main worker.
    at Worker.#pollControl (deno:runtime/js/11_workers.js:247:23)

test.js

new Worker(
  new URL("./worker.js", import.meta.url).href,
  { type: "module" },
);

worker.js

setTimeout(() => {
  throw new Error();
}, 1000);

This seems to have been introduced in #11076, and it seems to be related to some of the WPT test failures in #11338.

@andreubotella
Copy link
Contributor Author

andreubotella commented Sep 4, 2021

Throwing that error on the worker is not a terminal error, and so it won't close or terminate the worker. However, when the error is propagated to the main thread, the main thread will terminate. This panic seems to be caused because, when MainWorker is dropped after an error in cli's run_command, the worker's WebWorkerHandle is also dropped. This closes the sender before the worker terminates, which is something the worker's event loop doesn't expect to happen. Whether the panic occurs or not is of course spurious, since the process is about to close anyway, but it seems like in most cases it does occur.

@andreubotella
Copy link
Contributor Author

andreubotella commented Sep 13, 2021

The same panic happens when closing a worker which has still-running child workers. The WebWorkerHandle corresponding to the child worker is dropped because the parent worker is terminating, while the child worker is still alive.

Both Chromium and Firefox (WebKit doesn't support nested workers) will terminate a nested worker when its parent worker is closed or terminated. It's hard to say whether the spec allows this, though – the fact that errors are propagated up the chain of workers, even if some of those are terminated seems to imply that a terminated worker might have living descendants.

@andreubotella
Copy link
Contributor Author

As reported by @kuuote in #12068, this panic also happens when throwing an error in the initial execution of a nested worker.

andreubotella pushed a commit to andreubotella/deno that referenced this issue Sep 20, 2021
This was causing a panic when the `WebWorkerHandle` was dropped before
the worker's event loop stopped running. This could happen due to a race
condition, due to a non-fatal error in the worker propagating to the
main thread and closing the process, or due to a worker being terminated
while it still had living worker children.

Closes denoland#11342.
andreubotella pushed a commit to andreubotella/deno that referenced this issue Sep 21, 2021
This panic could happen in the following cases:

- A non-fatal error being thrown from a worker, that doesn't terminate
  the worker's execution, but propagates to the main thread without
  being handled, and makes the main thread terminate.
- A nested worker being alive while its parent worker gets terminated.
- A race condition if the main event loop terminates the worker as part
  of its last task, but the worker doesn't fully terminate before the
  main event loop stops running.

This panic happens because a worker's event loop should have pending ops
as long as the worker isn't closed or terminated – but if an event loop
finishes running while it has living workers, its associated
`WorkerThread` structs will be dropped, closing the channels that keep
those ops pending.

This change adds a `Drop` implementation to `WorkerThread`, which
terminates the worker without waiting for a response. This fixes the
panic, and makes it so nested workers are automatically terminated once
any of their ancestors is closed or terminated.

This change also refactors a worker's termination code into a
`WorkerThread::terminate()` method.

Closes denoland#11342.
bartlomieju added a commit that referenced this issue Sep 22, 2021
…#12156)

This panic could happen in the following cases:

- A non-fatal error being thrown from a worker, that doesn't terminate
  the worker's execution, but propagates to the main thread without
  being handled, and makes the main thread terminate.
- A nested worker being alive while its parent worker gets terminated.
- A race condition if the main event loop terminates the worker as part
  of its last task, but the worker doesn't fully terminate before the
  main event loop stops running.

This panic happens because a worker's event loop should have pending ops
as long as the worker isn't closed or terminated – but if an event loop
finishes running while it has living workers, its associated
`WorkerThread` structs will be dropped, closing the channels that keep
those ops pending.

This change adds a `Drop` implementation to `WorkerThread`, which
terminates the worker without waiting for a response. This fixes the
panic, and makes it so nested workers are automatically terminated once
any of their ancestors is closed or terminated.

This change also refactors a worker's termination code into a
`WorkerThread::terminate()` method.

Closes #11342.

Co-authored-by: Bartek Iwańczuk <[email protected]>
andreubotella pushed a commit to andreubotella/deno that referenced this issue Sep 24, 2021
Before denoland#12156, closing a worker which had children would cause a panic
(denoland#11342 (comment)).
After that PR, closing a worker will also close any child workers.
piscisaureus pushed a commit that referenced this issue Sep 25, 2021
…12215)

Before #12156, closing a worker which had children would cause a panic
(#11342 (comment)).

After that PR, closing a worker will also close any child workers.
caspervonb pushed a commit to caspervonb/deno that referenced this issue Sep 26, 2021
…denoland#12156)

This panic could happen in the following cases:

- A non-fatal error being thrown from a worker, that doesn't terminate
  the worker's execution, but propagates to the main thread without
  being handled, and makes the main thread terminate.
- A nested worker being alive while its parent worker gets terminated.
- A race condition if the main event loop terminates the worker as part
  of its last task, but the worker doesn't fully terminate before the
  main event loop stops running.

This panic happens because a worker's event loop should have pending ops
as long as the worker isn't closed or terminated – but if an event loop
finishes running while it has living workers, its associated
`WorkerThread` structs will be dropped, closing the channels that keep
those ops pending.

This change adds a `Drop` implementation to `WorkerThread`, which
terminates the worker without waiting for a response. This fixes the
panic, and makes it so nested workers are automatically terminated once
any of their ancestors is closed or terminated.

This change also refactors a worker's termination code into a
`WorkerThread::terminate()` method.

Closes denoland#11342.

Co-authored-by: Bartek Iwańczuk <[email protected]>
@BartTactick
Copy link

BartTactick commented Feb 4, 2022

Hi there,

I have this error comming up after upgrading from 1.13 to 1.18.
I dont have a reproduction case but this is basicly what happends.

At my project we have functions (just a string of code) thats being passed down to one of the workers.
There are always 20 workers in total.
Then when a worker recieves the code, it executes it with eval (its auth protected, i know it can be unsafe).
After the worker has returned a function result the worker will be terminated.
Then a new worker is created to make sure the amound of workers is still 20.

It goes wrong when nothing happends for a hour upto 2 hours
If one of the workers has to do something after that, the following error is thrown: (does not always happen but like 80% of the time).
image
Also since the error is about multiple workers, i guess only this part and not any other part of the application.

@lucacasonato lucacasonato reopened this Feb 4, 2022
@mmastrac
Copy link
Contributor

Re-closing this issue as the event loop has been changed pretty heavily, and a long-standing bug with promise IDs rolling over has been fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working correctly runtime Relates to code in the runtime crate
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants