[browser] WASM deputy worker - multi-threading proposal #91696

Closed
231 changes: 231 additions & 0 deletions docs/design/mono/wasm-threads.md
@@ -0,0 +1,231 @@
# Multi-threading on browser

## Goals
- JS interop
- both sync and async calls and callbacks
- sync calls from JS to C# are part of the problem, see below
- from UI thread to "some C# main thread" and back
- from dedicated worker to its JavaScript state and back
- CPU intensive workloads on dotnet thread pool
- enable blocking .Wait APIs from C# user code
- Current public API throws PNSE for it
- allow HTTP and WS C# APIs to be used from any thread
- Underlying JS objects have thread affinity
- don't change/break single threaded build. †
- don't try to block on UI thread.

<sub><sup>† Note: all the text below discusses MT build only, unless explicit about ST build.</sup></sub>

## Problem
1. If you have multithreading, any thread might need to block while waiting for any other to release a lock.
- locks are in the user code, in nuget packages, in Mono VM itself
- there are managed and un-managed locks
- in single-threaded build of the runtime, all of this is NOOP. That's why it works on UI thread.
2. UI thread in the browser can't synchronously block
- you can spin-lock but it's a bad idea.
- It eats your battery
- Browser will kill your tab at random point (Aw, snap).
- It's not deterministic and you can't really test your app to prove it harmless.
- all the other threads/workers could synchronously block
3. JavaScript engine APIs and objects have thread affinity. The DOM and a few other browser APIs are only available on the main UI "thread"
- and so, you need to have C# interop with UI, but you can't block there.

## Design proposal TL;DR
4. execute C# code on "deputy worker", executing all user C# on behalf of the UI JavaScript
5. throw PNSE when UI JavaScript would call in any synchronous JSExport or callback to C#. ††

<sub><sup>†† This will prevent user C# code from trying to randomly synchronously block the caller on UI thread.</sup></sub>
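
The two points of the TL;DR can be sketched in plain C#. This is only an illustration on a desktop runtime, not the proposed runtime implementation: `DeputySketch`, `InvokeAsync` and `InvokeSync` are hypothetical names, and a regular `Thread` stands in for the deputy worker. Asynchronous calls are queued to the deputy and return a `Task` immediately, while a synchronous call is rejected with PNSE instead of blocking the caller.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the deputy worker: one dedicated thread draining a work queue.
public sealed class DeputySketch : IDisposable
{
    private readonly BlockingCollection<Action> _queue = new();
    private readonly Thread _thread;

    public DeputySketch()
    {
        _thread = new Thread(() =>
        {
            // Runs queued user code until CompleteAdding() is called.
            foreach (Action work in _queue.GetConsumingEnumerable()) work();
        })
        { IsBackground = true, Name = "deputy" };
        _thread.Start();
    }

    public int ThreadId => _thread.ManagedThreadId;

    // Async path: the caller gets a Task immediately and never blocks.
    public Task<T> InvokeAsync<T>(Func<T> userCode)
    {
        // Continuations must not run inline on the deputy thread.
        var tcs = new TaskCompletionSource<T>(TaskCreationOptions.RunContinuationsAsynchronously);
        _queue.Add(() =>
        {
            try { tcs.SetResult(userCode()); }
            catch (Exception ex) { tcs.SetException(ex); }
        });
        return tcs.Task;
    }

    // Sync path: rejected up front rather than blocking the (UI) caller.
    public T InvokeSync<T>(Func<T> userCode) =>
        throw new PlatformNotSupportedException("Synchronous calls from the UI thread are rejected.");

    public void Dispose()
    {
        _queue.CompleteAdding();
        _thread.Join();
        _queue.Dispose();
    }
}

public static class DeputyDemo
{
    public static async Task Main()
    {
        using var deputy = new DeputySketch();
        int ranOn = await deputy.InvokeAsync(() => Environment.CurrentManagedThreadId);
        Console.WriteLine($"ran on deputy thread: {ranOn == deputy.ThreadId}");
        try { deputy.InvokeSync(() => 1); }
        catch (PlatformNotSupportedException e) { Console.WriteLine($"sync call rejected: {e.Message}"); }
    }
}
```

Because user code only ever runs on the deputy thread in this sketch, it is free to block on locks there, which is the property the proposal is after.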

## Alternatives
10. create emscripten engine on worker
- this is similar to 4. but not feasible
- it would break a lot of existing JavaScript APIs
- it would make startup callbacks on wrong thread (blazor JS integration)
- we would have to re-write Blazor's `renderBatch` to bytes streaming.
11. throw PNSE any time C# code needs to block
- It's not deterministic and you can't really test your app to prove it harmless.
12. throw PNSE any time C# code or VM code needs to block
- Mono VM needs to hold lock while allocating memory, even on the UI thread.
13. modify `ConfigureAwait()`, work queue etc, to never dispatch to another thread.
- this would probably break user code expectations about dynamic behavior of tasks and how they run in parallel with each other.

# Design proposal details

## UI thread
- this is the main browser "thread", the one with DOM on it
- will start emscripten as usual
- this includes C# which runs during mono startup
- will create Deputy worker as C# thread
- dispatch execution of C# `Task Main()` to the deputy worker.
- dispatch all async JSImport calls to the deputy worker.
- dispatch all async callbacks to the deputy worker.
- throw PNSE on any JSImport call from JS
- it will be valid C# thread, but not used directly by user code.
- we will try to prevent user code from running on it and from needing to do so
- we will spin lock only for Mono VM code
- we assume that Mono VM will block only shortly for operations like:
- alloc/free memory
- transform IL -> IR and update Mono VM shared state
- we will spin lock before Blazor `renderBatch`
- to wait for "pause GC"
Member

what happens if we can't pause the GC because the deputy worker is stuck? I guess the browser thread hangs, right?

- we will spin lock during GC, if we hit barrier
- TODO: is that short enough ?
Member

GCs can take hundreds of ms in the real world (I'm not sure why, but our GC under WASM seems to be much slower than it would be on native) so we should assume that users on lower spec devices will see hangs if we have to do this. It might still be the right choice

Member Author

The other alternative is to rewrite renderBatch to streaming bytes.

- we should never block for file operations or for network operations
- TODO: how to prove it ?
- Could we unregister the thread from Mono VM ? No, because we need the C# to dispatch calls in both directions in
- it will actually execute small chunks of C# code
- the pieces which are in the generated code of the JSExport
- containing dispatch to deputy worker's synchronization context
Comment on lines +79 to +81
Member

Why does this dispatch need to be done in C#? why can't it be done in JS?

More generally I don't understand why running emscripten off the main thread is not a workable solution. It has a much more easily understandable execution model, in my opinion - it's just like talking to a server.

Member Author

JS interop is creating JS objects and that needs to happen on the right JS worker, because of affinity.
The JS side of the marshaller is calling static C# methods, to create C# instances of Task.

To the second question, that's alternative 10. The main reason is that JS interop and also the JS embedding APIs actually need to make those short-lived synchronous calls to Mono/C#. Otherwise even JS code of marshaling individual arguments would have to be async, because it would send messages to a "server".

The Blazor renderBatch is also touching memory and converting MonoString* etc. That could not become async, perf would just die.

It could be done, but it would not be less trouble than having C# on the UI thread internally.

Member Author

There is a more developed version of this "why not alternative 10." lower in the conversation. I will create a separate PR/proposal for that.

Member Author

See #91731

Member Author

> More generally I don't understand why running emscripten off the main thread is not a workable solution.

@lambdageek here is a more detailed answer #91731 (comment)

Member Author

@pavelsavara, Sep 27, 2023

Short answer: because marshaling JSImport/JSExport parameters requires a lot of Mono infrastructure, like allocating MonoString, TaskCompletionSource and many others. Those are mostly synchronous calls and need to be fast. Trying to dispatch those calls from UI to sidecar thread would lead to terrible latency, per parameter. Alternatively we could have double proxy for some of those data types, which is not great either, mostly for GC.

- could sync void C# methods be dispatched as fire and forget ?
- no, because that would break the contract that they are blocking until finished.
- also the errors would not propagate
- TODO: is there anything special about error propagation over the interop/thread boundary ?

## Deputy worker
Member

Is this just a well-known "JSWebWorker with JSInterop" thread that the ui-thread JS interop knows about? If not, are the different capabilities of this special thread and other JSWebWorker with JS Interop threads required?

Wait, I read more and I think I can answer myself. It's just like a JSWebWorker with JSInterop thread except instead of owning "real" JS Interop it owns proxies that represent JS Interop objects of the UI thread instead. Is that right?

So we basically have special "JS event loop" threads and they come in two flavors: 1. they interact with their own js objects, or 2. they just pass messages to some other delegated js object realm?

Member Author

@pavelsavara, Sep 7, 2023

Yes, deputy is similar to JSWebWorker except it talks to UI thread JS space, not its own (on logical level).
The implementation of how it works is not JS event loop, but rather C# SynchronizationContext.

Passing those messages could be also done via emscripten's emscripten_dispatch_to_thread_async or by JS postMessage.

Doing it in C# allows us to interleave it with the code generated by Roslyn codegen.
Let's do more thinking about this, how to actually do it.

One of the open questions for me is selection of target thread based on affinity of the JSObject proxy passed as argument. Note we are talking about static methods which could have more or also none such object.
This question is relevant for "call HTTP from any C# thread" scenario.

- this is new concept introduced here. I needed some name for it 🤷‍♂️
- executing all **user C# code** on behalf of the UI JavaScript "thread"
- that is also C# entry point `Task Main()` or `void Main()`
- `void Main()` would return promise that never resolves to UI JavaScript
- this thread **could block** on synchronization primitives just fine!
- doesn't expose JavaScript state to user code.
- because JSHandle has thread affinity and it's unique per JS thread.
- as optimization we could consider running HTTP and WS client here, instead of UI thread. But JSHandle problem.
- has SynchronizationContext installed on it
- So that C# calls could be dispatched to it by runtime
- throw PNSE on attempt to marshal sync C# delegate to UI JavaScript
- can run C# finalizers
Member

finalizers already run on a separate thread, not the main thread, AFAIK. the finalizer thread is owned by the GC, I think?

Member Author

@pavelsavara, Sep 7, 2023

I think we don't run finalizers in single-threaded WASM right now.

Member

in single-threaded wasm finalizers run in mono_runtime_do_background_work which is scheduled as a background job when mono_wasm_gc_finalize_notify is called. In other words after a GC is finished we queue up an idle task to run finalizers.

- will run GC
Member

Can we somehow guarantee that GC will never happen on the browser thread? The browser thread will potentially have to stall and wait for GC at the very least, if we ever allow C# to run on it.

Member

There's GC.TryStartNoGCRegion() (no-op on Mono, I think). It's a weird global mode (so if you have background threads that are quickly allocating, your critical region can still run out of memory), but it might be good enough for some scenarios.

- this cross-threading dispatch will have performance impact for the JS interop.
- TODO: measure how much
- this should not impact Blazor `renderBatch` perf.

## JSWebWorker with JS interop
- is C# thread created and disposed by new API for it
- could block on synchronization primitives
- there is JSSynchronizationContext installed on it
Member

FYI since you mentioned performance, be aware that (at present) having any kind of custom synchronization context installed causes some of the Task/Await machinery to go through a slow-path. see #69409 (comment) (though I hope this won't actually matter)

- so that user code could dispatch back to it, in case that it needs to call JSObject proxy (with thread affinity)

## C# Thread
- could block on synchronization primitives
- without JS interop. calling JSImport will PNSE.

## C# Threadpool Thread
- could block on synchronization primitives
- without JS interop. calling JSImport will PNSE.

## JSImport and marshaled JS functions
- both sync and async could be called on all threads
- sync: when called from C# it will use `SynchronizationContext.Send` and block caller.
- async: when called from C# it will use `SynchronizationContext.Post` and marshal promise immediately.
- when this is worker -> worker, `SynchronizationContext` should invoke it inline
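
The difference between the two dispatch flavors, and the inline shortcut when the caller is already on the target thread, can be illustrated with a small custom context. This is a sketch under assumptions, not the runtime's `JSSynchronizationContext`: `AffinityContext` is a hypothetical class, a regular `Thread` stands in for the worker that owns the JS state, and error propagation back to the caller is left out.

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical context bound to one owner thread.
public sealed class AffinityContext : SynchronizationContext, IDisposable
{
    private readonly BlockingCollection<(SendOrPostCallback Callback, object? State, ManualResetEventSlim? Done)> _queue = new();
    private readonly Thread _owner;

    public AffinityContext()
    {
        _owner = new Thread(Pump) { IsBackground = true };
        _owner.Start();
    }

    public int OwnerThreadId => _owner.ManagedThreadId;

    private void Pump()
    {
        SetSynchronizationContext(this);
        foreach (var (callback, state, done) in _queue.GetConsumingEnumerable())
        {
            callback(state);
            done?.Set(); // wake up a blocked Send caller, if any
        }
    }

    // Async flavor: enqueue and return immediately.
    public override void Post(SendOrPostCallback d, object? state) => _queue.Add((d, state, null));

    // Sync flavor: block the caller until the owner thread has run the callback.
    public override void Send(SendOrPostCallback d, object? state)
    {
        if (Thread.CurrentThread == _owner)
        {
            d(state); // already on the target thread: invoke inline, do not queue
            return;
        }
        var done = new ManualResetEventSlim(); // left undisposed to avoid racing with Set()
        _queue.Add((d, state, done));
        done.Wait();
    }

    public void Dispose()
    {
        _queue.CompleteAdding();
        _owner.Join();
        _queue.Dispose();
    }
}

public static class SendPostDemo
{
    public static void Main()
    {
        using var ctx = new AffinityContext();
        int sendRanOn = 0;
        ctx.Send(_ => sendRanOn = Environment.CurrentManagedThreadId, null); // blocks until done
        Console.WriteLine($"Send ran on owner: {sendRanOn == ctx.OwnerThreadId}");

        ctx.Post(_ => Console.WriteLine("posted work ran"), null); // returns immediately
        // A nested Send issued from the owner thread runs inline instead of deadlocking.
        ctx.Send(outer => ctx.Send(inner => Console.WriteLine("inline"), null), null);
    }
}
```

Without the inline branch, the nested `Send` in the demo would wait on a queue that only the blocked owner thread can drain.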

## JSExport & C# delegates
- sync: will throw PNSE if called from UI JavaScript
- sync: will just work when called from JSWebWorker JavaScript
- async JSExport: will work on all threads. Will marshal promise and return immediately.
- async Delegate: are any async callbacks possible yet ? The code gen doesn't support them yet in Net8.
- `getAssemblyExports` needs to bind JS on UI thread, but register on deputy thread
- hide `SynchronizationContext.Send` and `SynchronizationContext.Post` inside of the generated code.
- fast on worker -> worker
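
The sync/async rules for JSExport listed above amount to a small decision table. The sketch below only encodes those rules; `JsCaller` and `JSExportRulesSketch` are hypothetical names, and the real check would presumably live in generated interop code rather than in a helper like this.

```csharp
using System;

// Hypothetical names: where a call into a [JSExport] method originates.
public enum JsCaller { UiThread, JSWebWorker }

public static class JSExportRulesSketch
{
    // Throws where the proposal says the call must fail; returns normally otherwise.
    public static void EnsureAllowed(JsCaller caller, bool isAsyncExport)
    {
        if (!isAsyncExport && caller == JsCaller.UiThread)
            throw new PlatformNotSupportedException(
                "Synchronous JSExport called from UI JavaScript.");
    }
}

public static class JSExportRulesDemo
{
    public static void Main()
    {
        // sync export called from a JSWebWorker: allowed
        JSExportRulesSketch.EnsureAllowed(JsCaller.JSWebWorker, isAsyncExport: false);
        // async export called from UI JavaScript: allowed, returns a promise immediately
        JSExportRulesSketch.EnsureAllowed(JsCaller.UiThread, isAsyncExport: true);
        try
        {
            // sync export called from UI JavaScript: rejected
            JSExportRulesSketch.EnsureAllowed(JsCaller.UiThread, isAsyncExport: false);
        }
        catch (PlatformNotSupportedException e)
        {
            Console.WriteLine($"PNSE: {e.Message}");
        }
    }
}
```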

## Promise
- passing promise should work everywhere.
- from UI JavaScript it would be passed as Task to deputy worker
- open question: passing JS promise to deputy should be fine. But does the `resolve()` need to block UI thread ?

## Task, Task<T>
- passing Task should work everywhere.
- when marshaled to JS they bind to specific Promise and have affinity
- the `SetResult` needs to be marshaled on thread of the Promise.
- The proxy of the Promise knows which `SynchronizationContext` to dispatch to.
- on UI thread it's the UI thread's SynchronizationContext, not deputy's
- TODO: could same task be marshaled to multiple JS workers ?
Member

we should assume it can and will happen
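
The affinity rule for marshaled tasks (completion has to be delivered on the thread that owns the Promise) might look roughly like this. Everything here is illustrative: `EventLoopContext` imitates a JS event loop drained by its owning thread, and `TaskToPromiseSketch.Bind` with its `resolve`/`reject` callbacks stands in for the Promise proxy; neither is an existing runtime API.

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical event-loop-like context: any thread may queue work, only the owner runs it.
public sealed class EventLoopContext : SynchronizationContext
{
    private readonly BlockingCollection<(SendOrPostCallback Callback, object? State)> _queue = new();

    public override void Post(SendOrPostCallback d, object? state) => _queue.Add((d, state));

    // Called by the owning thread: waits for one queued item and runs it.
    public void RunOne()
    {
        var (callback, state) = _queue.Take();
        callback(state);
    }
}

public static class TaskToPromiseSketch
{
    // Binds a Task to a "promise" owned by `owner`: resolve/reject always run via owner.Post,
    // regardless of which thread completed the task.
    public static void Bind<T>(Task<T> task, SynchronizationContext owner,
                               Action<T> resolve, Action<Exception> reject)
    {
        task.ContinueWith(t =>
        {
            if (t.IsCompletedSuccessfully)
                owner.Post(_ => resolve(t.Result), null);
            else
                owner.Post(_ => reject(t.Exception?.GetBaseException() ?? new TaskCanceledException(t)), null);
        }, CancellationToken.None, TaskContinuationOptions.None, TaskScheduler.Default);
    }
}

public static class PromiseDemo
{
    public static void Main()
    {
        var owner = new EventLoopContext();
        int ownerThread = Environment.CurrentManagedThreadId;

        Task<int> task = Task.Run(() => 42); // completes on a thread pool thread
        TaskToPromiseSketch.Bind(task, owner,
            value => Console.WriteLine(
                $"resolved {value} on owner thread: {Environment.CurrentManagedThreadId == ownerThread}"),
            error => Console.WriteLine($"rejected: {error.Message}"));

        owner.RunOne(); // the owner "event loop" picks up the completion
    }
}
```

In this shape nothing prevents calling `Bind` for the same task with a second owner context, which is one way to think about the "marshaled to multiple JS workers" case raised above.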


## JSObject proxy
- has thread affinity, marked by private ThreadId.
- in deputy worker, it will be always UI thread Id
- the JSHandle always belongs to UI thread
- `Dispose` needs to be called on the right thread.
- how to do that during GC/finalizer ?
- should we run finalizer per worker ?
- is it ok for `SynchronizationContext` to be public API
- because it could contain UI thread SynchronizationContext, which user code should not be disposed on.

## JSHost.GlobalThis, JSHost.DotnetInstance, JSHost.ImportAsync
- calls will be dispatched from deputy thread to UI JavaScript
- on JSWebWorker call will stay on the same thread.

## SynchronizationContext
- we will need public C# API for it, `JSHost.xxxSynchronizationContext`
- we could avoid it by generating late bound ICall. Very ugly.
- hide `SynchronizationContext.Send` and `SynchronizationContext.Post` inside of the generated code.
- needs to be also inside generated nested marshalers
- is solution for deputy's SynchronizationContext same as for JSWebWorker's SynchronizationContext, from the code-gen perspective ?
Member

I want the answer to be "yes"

Member Author

I hope so too. Overnight I realized that the shape of this API could be 2 methods on the JSHost rather than exposing the SynchronizationContext, because there could be more complex logic on which of them should be used.

- how could "HTTP from any C# thread" redirect this to the thread of fetch JS object affinity ?
- should generated code or the SynchronizationContext detect it from passed arguments ?
- TODO: explain why not make user responsible for doing it, instead of changing generator
- TODO: figure out backward compatibility of already generated code. Must work on single threaded
Member

This seems hard

- on a JSWebWorker
- to dispatch any calls of JSObject proxy members
- to dispatch `Dispose()` of JSObject proxy
- to dispatch `TaskCompletionSource.SetResult` etc
- on the UI thread
- same as above
- as alternative we could only have there emscripten C dispatcher
- it will need some public API any way, to be called from generated code.
- on the deputy thread
- to dispatch async calls from UI thread to it

## Blazor - what breaks when MT build
- as compared to single threaded runtime, the major difference would be no synchronous callbacks.
- for example from DOM `onClick`. This is one of the reasons people prefer ST WASM over Blazor Server.
- but there is really [no way around it](#problem), because you can't have both MT and sync calls from UI.
Member

We could allow it but tell users not to use it. This is similar to guidance in other UI frameworks not to block UI threads. They don't prohibit you from running code but then you're on the hook for your UI freezing up (or in this case draining the battery and getting killed by the mobile browser)

Member Author

We could have MSBuild property for it, which would be opt-in. Right now, I think if we are doing all of this complex stuff to prevent it, we should gain the benefits of it. Otherwise we will have to support crazy things.

I thought that we could get away with spin block. But MT unit tests on CI are hanging in too many cases, and probably for this reason. We just need to get over it, me included.

- implement Blazor's `WebAssemblyDispatcher` to dispatch [`Component.InvokeAsync`](https://learn.microsoft.com/en-us/dotnet/api/microsoft.aspnetcore.components.componentbase.invokeasync) to deputy thread.
- process feedback from https://github.com/dotnet/aspnetcore/pull/48991 and make more async
- Blazor renderBatch will continue working even with legacy interop in place.
- Because it only reads memory and it doesn't call back to Mono VM.
- throw PNSE from Blazor's [`IJSInProcessRuntime.Invoke`](https://learn.microsoft.com/en-us/dotnet/api/microsoft.jsinterop.ijsinprocessruntime.invoke)
- throw PNSE from any call to Blazor's [`IJSUnmarshalledRuntime`](https://learn.microsoft.com/en-us/dotnet/api/microsoft.jsinterop.ijsunmarshalledruntime)

# Current state 2023 Sep
- we already ship MT version of the runtime in the wasm-tools workload.
- It's enabled by `<WasmEnableThreads>true</WasmEnableThreads>` and it requires COOP HTTP headers.
- It will serve extra file `dotnet.native.worker.js`.
- This will also start in Blazor project, but UI rendering would not work.
- we have pre-allocated pool of browser Web Workers which are mapped to pthread dynamically.
- we can configure pthread to keep running after synchronous thread_main finished. That's necessary to run any async tasks involving JavaScript interop.
- GC is running on UI thread/worker.
- legacy interop has problems with GC boundaries.
- JSImport & JSExport work
- There is private JSSynchronizationContext implementation which is too synchronous
- There is draft of public C# API for creating JSWebWorker with JS interop. It must be dedicated un-managed resource, because we could not cleanup JS state created by user code.
- There is MT version of HTTP & WS clients, which could be called from any thread but it's also too synchronous implementation.
- Many unit tests fail on MT https://github.com/dotnet/runtime/pull/91536
- there are MT C# ref assemblies, which don't throw PNSE for MT build of the runtime for blocking APIs.
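
For reference, the opt-in mentioned in this list might look like this in a project file. Only the `WasmEnableThreads` property comes from this document; the enclosing `PropertyGroup` is ordinary MSBuild. As for the headers, browsers only expose `SharedArrayBuffer` to cross-origin isolated pages, which is commonly achieved by serving `Cross-Origin-Opener-Policy: same-origin` together with `Cross-Origin-Embedder-Policy: require-corp`. Those exact header values are an assumption here, since the text above only says "COOP HTTP headers".

```xml
<!-- Fragment of a .csproj; requires the wasm-tools workload to be installed. -->
<PropertyGroup>
  <WasmEnableThreads>true</WasmEnableThreads>
</PropertyGroup>
```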

## Task breakdown
- [ ] rename `WebWorker` API to `JSWebWorker` ?
- [ ] design details of JSImport binding, allocation, asynchrony
- [ ] design details of JSExport binding, allocation, asynchrony
- [ ] `ToManaged(out Task)` to be called before the actual JS method
- [ ] public API for `JSHost.<Target>SynchronizationContext` which could be used by code generator.
- [ ] change the code gen for JSImport
- [ ] change the code gen for JSExport
- [ ] reimplement `JSSynchronizationContext` to be more async
- [ ] implement Blazor's `WebAssemblyDispatcher`
- [ ] reimplement HTTP and WS with the new code gen without direct SynchronizationContext use
- [ ] there is synchronous callback from JS event to C# in HTTP code.
- [ ] make C# finalizers work
- [ ] throw PNSE - fail fast, so that users discover limits in the dev loop
- [ ] on any MT use of `mono_bind_static_method` from legacy interop.
- Because it's synchronous. It throws on JSWebWorker already.
- [ ] on UI synchronous JSImport
- [ ] on UI synchronous C# delegate callback
- [ ] throw fatal if somehow C# code was blocking on UI thread.
- [ ] optional: make underlying emscripten WebWorker pool allocation dynamic
- [ ] optional: implement async function/delegate marshaling in JSImport/JSExport parameters.
- [ ] optional: enable blocking HTTP/WS APIs
- [ ] optional: enable lazy DLL download by blocking the caller
- [ ] measure perf impact

Related Net8 tracking https://github.com/dotnet/runtime/issues/85592