
[browser] WASM deputy worker - multi-threading proposal #91696

Closed

Conversation

pavelsavara
Member

@pavelsavara pavelsavara commented Sep 6, 2023

This is a proposal for how to enable threads on WASM, while allowing for blocking .Wait, lock() { ... } and similar constructs in user C# code.

Feedback and questions are welcome.

Contributes to #85592

There is an alternative proposal: #91731

@pavelsavara pavelsavara added arch-wasm WebAssembly architecture area-VM-threading-mono os-browser Browser variant of arch-wasm labels Sep 6, 2023
@pavelsavara pavelsavara added this to the 9.0.0 milestone Sep 6, 2023
@pavelsavara pavelsavara self-assigned this Sep 6, 2023
@ghost

ghost commented Sep 6, 2023

Tagging subscribers to 'arch-wasm': @lewing
See info in area-owners.md if you want to be subscribed.


Member

@lambdageek lambdageek left a comment


nice writeup. I'm a little bit worried the model with the deputy thread will be difficult to explain.

Comment on lines +76 to +78
- it will actually execute small chunks of C# code
- the pieces which are in the generated code of the JSExport
- containing dispatch to deputy worker's synchronization context
Member

Why does this dispatch need to be done in C#? why can't it be done in JS?

More generally I don't understand why running emscripten off the main thread is not a workable solution. It has a much more easily understandable execution model, in my opinion - it's just like talking to a server.

Member Author

JS interop creates JS objects, and that needs to happen on the right JS worker because of affinity.
The JS side of the marshaller calls static C# methods, for example to create C# instances of Task.

To the second question, that's alternative 10. The main reason is that JS interop and also the JS embedding APIs actually need to make those short-lived synchronous calls to Mono/C#. Otherwise even the JS code marshaling individual arguments would have to be async, because it would send messages to a "server".

The Blazor renderBatch also touches memory and converts MonoString* etc. That could not become async; perf would just die.

It could be done, but it would not be less trouble than having C# on the UI thread internally.
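
To make the intended flow concrete, here is a minimal sketch of that dispatch, assuming a hypothetical generated JSExport wrapper and a deputy SynchronizationContext (the names and wrapper shape are illustrative, not the actual generated code):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class InteropSketch
{
    // What a generated JSExport wrapper could conceptually do after synchronously
    // marshaling arguments on the caller's thread: hand the real work to the deputy.
    internal static Task<int> DispatchToDeputy(SynchronizationContext deputyContext, int marshaledArg)
    {
        var tcs = new TaskCompletionSource<int>();
        deputyContext.Post(_ =>
        {
            try
            {
                tcs.SetResult(UserExportedMethod(marshaledArg)); // user C# runs on the deputy
            }
            catch (Exception ex)
            {
                tcs.SetException(ex); // errors flow back through the Task
            }
        }, null);
        return tcs.Task;
    }

    private static int UserExportedMethod(int value) => value * 2; // placeholder for user code
}
```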

Member Author

There is a more developed version of this "why not alternative 10" answer lower in the conversation. I will create a separate PR/proposal for that.

Member Author

See #91731

Member Author

More generally I don't understand why running emscripten off the main thread is not a workable solution.

@lambdageek here is a more detailed answer: #91731 (comment)

Member Author

@pavelsavara pavelsavara Sep 27, 2023

Short answer: because marshaling JSImport/JSExport parameters requires a lot of Mono infrastructure, like allocating MonoString, TaskCompletionSource and many others. Those are mostly synchronous calls and need to be fast. Trying to dispatch those calls from the UI to the sidecar thread would add terrible latency, per parameter. Alternatively we could have double proxies for some of those data types, which is not great either, mostly because of GC.
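
As a rough illustration of why these calls need to stay synchronous, consider a hypothetical marshaller doing several small allocations per call; turning each step into an async hop to another thread would multiply the round-trip latency per parameter (the helper below is illustrative, not the real generated marshaller):

```csharp
using System;
using System.Threading.Tasks;

internal static class MarshalerSketch
{
    // Each step is a cheap synchronous call today (string allocation,
    // TaskCompletionSource creation, handle registration, ...).
    internal static (string Text, TaskCompletionSource<object> Completion) MarshalArguments(ReadOnlySpan<char> utf16)
    {
        var managedString = new string(utf16);               // stands in for MonoString allocation
        var completion = new TaskCompletionSource<object>(); // lets the JS side resolve a promise later
        return (managedString, completion);
    }
}
```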

docs/design/mono/wasm-threads.md
- also the errors would not propagate
- TODO: is there anything special about error propagation over the interop/thread boundary ?

## Deputy worker
Member

Is this just a well-known "JSWebWorker with JSInterop" thread that the ui-thread JS interop knows about? If not, are the different capabilities of this special thread and other JSWebWorker with JS Interop threads required?

Wait, I read more and I think I can answer myself. It's just like a JSWebWorker with JSInterop thread, except instead of owning "real" JS Interop it owns proxies that represent the JS Interop objects of the UI thread. Is that right?

So we basically have special "JS event loop" threads and they come in two flavors: 1. they interact with their own js objects, or 2. they just pass messages to some other delegated js object realm?

Member Author

@pavelsavara pavelsavara Sep 7, 2023

Yes, the deputy is similar to JSWebWorker except that it talks to the UI thread's JS space, not its own (on a logical level).
The implementation is not a JS event loop, but rather a C# SynchronizationContext.

Passing those messages could also be done via emscripten's emscripten_dispatch_to_thread_async or by JS postMessage.

Doing it in C# allows us to interleave it with the code generated by Roslyn codegen.
Let's think more about how to actually do it.

One of the open questions for me is the selection of the target thread based on the affinity of the JSObject proxy passed as an argument. Note we are talking about static methods, which could have several such objects or none at all.
This question is relevant for "call HTTP from any C# thread" scenario.
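
A minimal sketch of what such a C#-side dispatcher could look like - a plain queue-backed SynchronizationContext drained by the deputy thread; the real deputy/JSSynchronizationContext would additionally handle affinity checks, shutdown and emscripten integration:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

public sealed class QueueSynchronizationContext : SynchronizationContext
{
    private readonly BlockingCollection<(SendOrPostCallback Callback, object? State)> _queue = new();

    public override void Post(SendOrPostCallback d, object? state) => _queue.Add((d, state));

    public override void Send(SendOrPostCallback d, object? state)
    {
        using var done = new ManualResetEventSlim();
        Exception? error = null;
        _queue.Add((_ =>
        {
            try { d(state); } catch (Exception ex) { error = ex; }
            done.Set();
        }, null));
        done.Wait(); // blocking Send - only safe off the UI thread
        if (error != null) throw error;
    }

    // Run by the owning (deputy) thread to drain dispatched work.
    public void RunLoop(CancellationToken token)
    {
        foreach (var (callback, state) in _queue.GetConsumingEnumerable(token))
            callback(state);
    }
}
```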

docs/design/mono/wasm-threads.md
- we could avoid it by generating late bound ICall. Very ugly.
- hide `SynchronizationContext.Send` and `SynchronizationContext.Post` inside of the generated code.
- needs to be also inside generated nested marshalers
- is solution for deputy's SynchronizationContext same as for JSWebWorker's SynchronizationContext, from the code-gen perspective ?
Member

I want the answer to be "yes"

Member Author

I hope so too. Overnight I realized that the shape of this API could be 2 methods on JSHost rather than exposing the SynchronizationContext, because there could be more complex logic about which of them should be used.
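
For illustration only, such a shape could look roughly like the sketch below; the member names and the backing context are hypothetical, not an actual JSHost API:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical shape only - these members do not exist on the real JSHost.
public static class JSHostSketch
{
    // In the real runtime the deputy's context would be installed internally.
    public static SynchronizationContext DeputyContext { get; set; } = new SynchronizationContext();

    // Fire-and-forget dispatch to the deputy thread.
    public static void Post(Action work) => DeputyContext.Post(_ => work(), null);

    // Dispatch and observe completion/exception through a Task.
    public static Task SendAsync(Action work)
    {
        var tcs = new TaskCompletionSource();
        DeputyContext.Post(_ =>
        {
            try { work(); tcs.SetResult(); }
            catch (Exception ex) { tcs.SetException(ex); }
        }, null);
        return tcs.Task;
    }
}
```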

- how could "HTTP from any C# thread" redirect this to the thread of fetch JS object affinity ?
- should generated code or the SynchronizationContext detect it from passed arguments ?
- TODO: explain why not make user responsible for doing it, instead of changing generator
- TODO: figure out backward compatibility of already generated code. Must work on single threaded
Member

This seems hard

## Blazor - what breaks when MT build
- as compared to single threaded runtime, the major difference would be no synchronous callbacks.
- for example from DOM `onClick`. This is one of the reasons people prefer ST WASM over Blazor Server.
- but there is really [no way around it](#problem), because you can't have both MT and sync calls from UI.
Member

We could allow it but tell users not to use it. This is similar to guidance in other UI frameworks not to block UI threads. They don't prohibit you from running code, but then you're on the hook for your UI freezing up (or in this case draining the battery and getting killed by the mobile browser).

Member Author

We could have an MSBuild property for it, which would be opt-in. Right now I think that if we are doing all of this complex stuff to prevent it, we should gain the benefits of it. Otherwise we will have to support crazy things.

I thought that we could get away with a spin block. But MT unit tests on CI are hanging in too many cases, probably for this reason. We just need to get over it, me included.

docs/design/mono/wasm-threads.md
@pavelsavara
Member Author

I'm a little bit worried the model with the deputy thread will be difficult to explain.

Yes, but hopefully users don't need to know that the C# is running elsewhere; the only visible differences would be sync calls from JS and a bit of interop perf. This "deputy thread" concept is mostly for this small audience.

@kg
Member

kg commented Sep 7, 2023

I think you've done a fantastic job coming up with a solution for this problem, but I think it might be best to not solve this problem. While reading over the design I kept thinking "this has a lot in common with blazor server, maybe we can reuse all its remoting and etc for this scenario".

The way I see it this proposes a sort of hybrid model between existing blazor WASM and existing blazor server:

  1. Everything runs on the client
  2. (new) The client is running a client thread and a server thread (the latter is the well-named deputy worker)
  3. There is a client thread on the client and a server thread on the server

I think there's a lot of value in having this hybrid option - it reduces latency vs blazor server, and reduces the resources spent on the server and reduces bandwidth spent communicating between client and server. That's great.

But do we actually want to have three models? Will users comprehend all the nuances of the 3 models and know which one to pick for their scenario? Can we communicate that clearly to them? Will users feel betrayed if they commit to model 2 and then discover they actually needed model 1, or vice versa, and it's hard to swap after the fact? Adding a third distinct model with its own approach to interop feels like it adds a lot of education and maintenance burden since we can't say 'everything that applies to Blazor Server applies to this' or 'everything that applies to Blazor WASM applies to this'. Users may have selected Blazor WASM specifically for the ability to seamlessly communicate between C# and JS, which goes away in this new model.

I think at the end of the day a lot of code makes assumptions that don't hold for the browser, and the existence of the deputy worker will protect that code from some of its assumptions but the code is still fundamentally broken for this environment. It assumes things that aren't true. It would be ideal if we could make all existing C# concurrency constructs and code work in the browser as-is but I'm not sure if the costs we pay for doing so are justified - the existence of the deputy worker and all the inter-thread coordination will have a measurable cost (hopefully small), startup and shutdown get more complex, debugging gets WAY more complex, etc. This could be especially bad on mobile, where increased resource usage = increased odds of a "tab crash"

For the specific scenario of "some code needs to block while waiting for a lock to be released":

  • We successfully move the stall off of the UI thread onto the deputy thread. That is good, it means the browser thread isn't stalled. But the app isn't actually responsive, you will click buttons or interact with widgets and nothing will happen because the deputy thread is hung waiting for a lock. Is that improvement tangible enough to justify the undertaking of moving to the hybrid model? I think this new failure mode is potentially more confusing for the user. Imagine you press a button, the deputy worker starts crunching data, then you type some text into a textarea and the deputy worker wakes up and immediately obliterates the textarea. A blocked UI thread prevents this, which is good!
  • In general if an app is waiting on locks long enough to perceptibly hang the browser thread, the app's locking model is wrong. The deputy worker becomes a heroic effort to compensate for this root problem that will not be able to fully alleviate it.
  • I'm especially concerned about the debugging scenario. If I hit the pause button in the browser debugger right now for a hung blazor app, nothing happens because browser debuggers are bad... okay, but if I open the debugger before launching the app and then hit pause while it's hung, I have a call stack that shows what's going on, and I can at least tell I have a locking problem. If my app is AOTd the stack may even suggest what code is waiting on a lock, which is fantastic. In the deputy worker model nothing will happen when I hit pause because the unresponsiveness is now due to another thread being stuck. I now need to figure out which thread is actually preventing the browser thread from doing its job. The debugging story for this sort of exists on native but on the web it doesn't. Debugging concurrency issues in the browser is already awful and we would be exposing more people to that. (Maybe we just have to build a proper concurrency debugger for the web.)
  • Ultimately any user-exposed code should not be able to block for a long period of time. If it's waiting it should be waiting with a short timeout, if it even waits at all. It should probably be using queues instead, and the UI should be designed to avoid situations where the user performs a bunch of inputs that are all blocked on some invisible processing in the background (i.e. while the app is Busy the UI is locked out to communicate that clicks won't do anything). We can make this bad code work better via the heroic introduction of the deputy worker but it will still be broken and end users will probably still demand better than the experience it's possible to offer them.
  • This code might be in the BCL. Purging the BCL of locks seems like a tall order (maybe I'm overestimating this) and since the main thread will still be running C#, it's still possible it will take locks, it just won't be directly the user's fault anymore. The deputy worker or any other thread will be able to take a lock and then cause BCL code to block the main thread when it tries to acquire the lock, right? So we've improved the situation but not completely prevented the "UI thread is waiting on a lock" scenario, I think.

You've clearly already thought through the problem scenarios here and all the nuances we have to tackle, like remoting errors across thread boundaries, the thread affinity of JS objects, etc. I'm confident that if we go forward with this design we'll eventually end up with something really good. My concern is ultimately that I don't know how many end users would actually be happy after adopting it, because the code they're trying to run is ill-suited for the web environment. I think our time and resources might be better spent on continued efforts to try and move users away from synchronous blocking models for their web apps, and perhaps improved tooling to ease that transition.

On the other hand, maybe the deputy worker model gets users 90% of the way there, and they can fix the other 10% of cases and ship their app and not have to think about it ever again, and their users will be satisfied even if they're unhappy with how the app misbehaves while the deputy worker is hung. That's a real possibility and maybe it's good enough.

At the end of the day though I'm not sure I can convince myself that Monitor.Enter or Task.Wait should work in the browser unless a very low timeout is specified. We want to surface:

  • it's not guaranteed you will acquire your lock / your work will complete
  • so since it's not guaranteed, you need to handle the case where what you expected to happen hasn't happened
  • this means you need to think about how your application should behave in both scenarios (a sketch of that pattern follows below)
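
A minimal sketch of that pattern, using the standard Monitor.TryEnter overload with a timeout; the type and method names are just illustrative:

```csharp
using System;
using System.Threading;

public sealed class DocumentStore
{
    private readonly object _gate = new();

    public bool TrySave(string content, TimeSpan timeout)
    {
        if (!Monitor.TryEnter(_gate, timeout))
            return false; // caller decides what the UI should show when the lock was not acquired

        try
        {
            Console.WriteLine($"Saving {content.Length} chars");
            return true;
        }
        finally
        {
            Monitor.Exit(_gate);
        }
    }
}
```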

I'm also mildly concerned that any illusion we manage to maintain in this regard will eventually break. Browser vendors already made the arbitrary decision to ban timers, what's stopping them from deciding to terminate any web worker that runs for too long without yielding to the event loop? or to flip it around: browser vendors got away with this, so why can't we tell people infinite-timeout locks are banned? 😊

In conclusion (this comment has gotten away from me) I think maybe the right decision is instead to gently force developers to take their medicine, and make the medicine taste as good as possible with a set of really good debugging and development tools that build on what we've already got.

docs/design/mono/wasm-threads.md
- alloc/free memory
- transform IL -> IR and update Mono VM shared state
- we will spin lock before Blazor `renderBatch`
- to wait for "pause GC"
Member

what happens if we can't pause the GC because the deputy worker is stuck? I guess the browser thread hangs, right?

- we will spin lock before Blazor `renderBatch`
- to wait for "pause GC"
- we will spin lock during GC, if we hit barrier
- TODO: is that short enough ?
Member

GCs can take hundreds of ms in the real world (I'm not sure why, but our GC under WASM seems to be much slower than it would be on native), so we should assume that users on lower-spec devices will see hangs if we have to do this. It might still be the right choice.

Member Author

The other alternative is to rewrite renderBatch to stream bytes.

- has SynchronizationContext installed on it
- So that C# calls could be dispatched to it by runtime
- throw PNSE on attempt to marshal sync C# delegate to UI JavaScript
- can run C# finalizers
Member

finalizers already run on a separate thread, not the main thread, AFAIK. the finalizer thread is owned by the GC, I think?

Member Author

@pavelsavara pavelsavara Sep 7, 2023

I think we don't run finalizers in single-threaded WASM right now.

Member

in single-threaded wasm finalizers run in mono_runtime_do_background_work which is scheduled as a background job when mono_wasm_gc_finalize_notify is called. In other words after a GC is finished we queue up an idle task to run finalizers.

- So that C# calls could be dispatched to it by runtime
- throw PNSE on attempt to marshal sync C# delegate to UI JavaScript
- can run C# finalizers
- will run GC
Member

Can we somehow guarantee that GC will never happen on the browser thread? The browser thread will potentially have to stall and wait for GC at the very least, if we ever allow C# to run on it.

Member

There's GC.TryStartNoGCRegion() (no-op on Mono, I think). It's a weird global mode (so if you have background threads that are quickly allocating, your critical region can still run out of memory), but it might be good enough for some scenarios.
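
For reference, the standard BCL shape of that API looks like the sketch below; per the note above it may be a no-op on Mono, so this only illustrates the intended guarantee:

```csharp
using System;
using System.Runtime;

public static class RenderCriticalSection
{
    public static void RenderWithoutGC(Action renderBatch)
    {
        // Ask the GC to pre-allocate ~16 MB so no collection happens inside the region.
        bool started = false;
        try
        {
            started = GC.TryStartNoGCRegion(16 * 1024 * 1024);
            renderBatch();
        }
        finally
        {
            // Only end the region if it is still active (a GC may have been forced anyway).
            if (started && GCSettings.LatencyMode == GCLatencyMode.NoGCRegion)
                GC.EndNoGCRegion();
        }
    }
}
```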

## JSWebWorker with JS interop
- is C# thread created and disposed by new API for it
- could block on synchronization primitives
- there is JSSynchronizationContext installed on it
Member

FYI since you mentioned performance, be aware that (at present) having any kind of custom synchronization context installed causes some of the Task/await machinery to go through a slow path. See #69409 (comment) (though I hope this won't actually matter).
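
For context, this is the general SynchronizationContext effect being referred to: with a context installed, each await posts its continuation back to that context unless ConfigureAwait(false) is used. A small illustration, not specific to JSSynchronizationContext:

```csharp
using System.Net.Http;
using System.Threading.Tasks;

internal static class AwaitPaths
{
    private static readonly HttpClient Http = new();

    internal static async Task<string> ViaInstalledContextAsync(string url)
        => await Http.GetStringAsync(url); // continuation dispatched through the current context

    internal static async Task<string> BypassingContextAsync(string url)
        => await Http.GetStringAsync(url).ConfigureAwait(false); // continuation may run on any thread-pool thread
}
```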

- the `SetResult` need to be marshaled on thread of the Promise.
- The proxy of the Promise knows which `SynchronizationContext` to dispatch to.
- on UI thread it's the UI thread's SynchronizationContext, not deputy's
- TODO: could same task be marshaled to multiple JS workers ?
Member

we should assume it can and will happen
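
A sketch of the affinity rule quoted above - the proxy remembers the SynchronizationContext of the worker that owns the Promise and always completes the managed side by posting there (names are illustrative):

```csharp
using System.Threading;
using System.Threading.Tasks;

public sealed class PromiseProxySketch
{
    private readonly SynchronizationContext _ownerContext; // context of the JS worker owning the Promise
    private readonly TaskCompletionSource<string> _completion = new();

    public PromiseProxySketch(SynchronizationContext ownerContext) => _ownerContext = ownerContext;

    public Task<string> Task => _completion.Task;

    // Called from whatever thread observed the JS resolution; hops to the owner first.
    public void ResolveFromAnyThread(string value)
        => _ownerContext.Post(_ => _completion.SetResult(value), null);
}
```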

@pavelsavara
Member Author

pavelsavara commented Sep 7, 2023

I think there's a lot of value in having this hybrid option - it reduces latency vs blazor server, and reduces the resources spent on the server and reduces bandwidth spent communicating between client and server. That's great.

This is interesting angle.

Could we do "blazor server" in the MT (and emscripten started on the worker as in alternative 10), while the UI would just run the same light weight "blazor server"" signalR like protocol via JS only? It would make the user mental model easier. There would be no non-blazor JS interop and no IJSInProcessRuntime.Invoke and IJSUnmarshalledRuntime. User JS code only independent of the WASM/C#.

This will behave similar to "blazor server" but it would work offline and low latency.

That could solve blazor, but not non-blazor MT is that acceptable outcome ?

I will do separate proposal for that, so the we don't confuse discussion here.
See #91731

@pavelsavara
Member Author

pavelsavara commented Sep 7, 2023

debugging gets WAY more complex

Yes, the Chrome dev tools experience may be different, because it will be async with respect to the C# part.
The VS debugger would connect to Mono as usual.

For the specific scenario of "some code needs to block while waiting for a lock to be released":
* Ultimately any user-exposed code should not be able to block for a long period of time.
* This code might be in the BCL. Purging the BCL of locks seems like a tall order
At the end of the day though I'm not sure I can convince myself that Monitor.Enter or Task.Wait should work in the browser unless a very low timeout is specified.

I disagree; I think the C# desktop ecosystem is full of locks and MT.
And one of the reasons people want MT is that this blocking code would work.
If we say "don't block", they could run the code on the single-threaded runtime already.
But that kills 90% of the MT value proposition.

@SteveSandersonMS
Member

SteveSandersonMS commented Sep 7, 2023

Imagine you press a button, the deputy worker starts crunching data, then you type some text into a textarea and the deputy worker wakes up and immediately obliterates the textarea. A blocked UI thread prevents this, which is good!

You're right that this is a difficult problem, but it's one we've already solved in an even more challenging environment (Blazor Server doing these calls over the internet). We already track old, incoming, and new values separately so we can pick the right final state based on what actually changed, and we have an event dispatch system that takes account of the fact that the UI shape can change in between event handlers being exposed and event calls arriving at the server.

There are some edge cases where developers run into issues with overlapping state changes, but it's rare in practice because we've learned how common UX patterns interact with network latency and what developers usually want to happen.

Latency across threads in the same browser will be less prone to these difficult cases than arbitrary latency across the internet.

@lambdageek
Member

Could we do "blazor server" in the MT (and emscripten started on the worker as in alternative 10), while the UI would just run the same light weight "blazor server"" signalR like protocol via JS only? It would make the user mental model easier.

FWIW this is basically the design that VS Code wasm-wasi settled on, too, for different reasons (plugin isolation).

@elringus

elringus commented Sep 20, 2023

Excuse me if I'm off the point here, but I just wanted to share my experience in a similar (as I see it) matter.

I'm building a web (PWA) app with a C# "backend" compiled to WASM and a React frontend communicating in a sync/blocking fashion. I started with .NET 6, where I had to modify the .NET runtime a bit to make it work with the bundlers. I've also added some quality-of-life stuff on top of the interop, like auto-generating bindings based on C# interfaces, TS type generation, events, etc.

It all worked great in the end; the DX was basically as if the C# part were another JS library consumed on the frontend. Testing and debugging were straightforward.

At some point I had the idea of moving the C# "backend" to a web worker. I thought this would ultimately dedicate the browser's UI thread to UI-only stuff, like rendering and animation, and improve responsiveness (even though there were no real issues with that at the moment).

To minimize breaking changes in the APIs (I already had a chunky codebase at that point), I used comlink (which uses JS proxies under the hood) to re-route the interop calls to the web worker. Basically, all the existing APIs were intact, except that they all became async, and blocking calls from JS to C# were no longer possible.

In the end, however, I rolled all that back to blocking. The main reason was the debug experience. It's not about connecting a debugger or anything, but just understanding the control flow. Complex UI frameworks (like React) have a lot going on under the hood, and when coupled with async behaviour on the other "side" it just becomes a real pain to work with. The sweet DX of consuming the C# API as a JS library goes away and you have to think about it as a remote server.

Another point is performance/responsiveness. While the UI was not blocking anymore (though that was a hardly noticeable improvement in itself), the total time spent on interop became noticeably larger, especially in tight back-and-forth loops.

Imho, running the entire C# module on a web worker is not worth it, at least in the case of apps with common "point-and-click" interfaces (it may be different for something realtime, like games, though in that case input latency will suffer).

However, in the case of computation-heavy tasks, it sure makes sense to offload them to worker threads. I hoped that maybe it would be possible to add a C# API to spawn worker threads just for that: no interop or any sort of communication with the JS side, just pure C# code running on the web worker. Is this at all possible w/o moving the whole runtime to a worker as well? One use case for something like this I had in mind is running a language server; i.e., our main C# "backend" is a visual editor, which has an optional text editing mode. The text editor communicates with a language server written in C#. Instead of building a standalone .NET WASM module just for the server, it would be nice to spawn it from the main C# "backend" onto a worker thread.

@pavelsavara
Member Author

Excuse me if I'm off the point here, but I just wanted to share my experience in a similar (as I see it) matter.

Thanks for chiming in, much appreciated!

I moved the response to the side-car design, because your feedback is very relevant there.

@pavelsavara pavelsavara changed the title [browser] threads design [browser] WASM deputy worker - multi-threading proposal Sep 22, 2023
@pavelsavara
Member Author

pavelsavara commented Sep 26, 2023

There could be a variation on this design, where:

  • emscripten still starts on the UI thread
  • dotnet starts on deputy-thread
  • JSImport/JSExport still talks to UI/DOM heap, but via JS proxies, not C# dispatch

Benefits

  • the UI thread could call simple synchronous C functions without dispatch
    • like memory access and allocations
    • probably not any Mono functions, as that would require a Mono-registered thread and a GC barrier
  • the UI thread would be mostly doing nothing, so it could serve others (managing the pthread pool) without being blocked
  • the deputy thread could make blocking calls to the UI without blocking the pthread pool and other emscripten_check_mailbox jobs

Especially compared to sidecar option D)

Downsides

  • more JS code, and more complex code (as compared to C# dispatch)

@marek-safar
Contributor

marek-safar commented Sep 27, 2023

Should this rather go to dotnet/designs repo?

@pavelsavara
Member Author

Closing in favor of dotnet/designs#301

@ghost ghost locked as resolved and limited conversation to collaborators Oct 28, 2023
@pavelsavara pavelsavara deleted the browser_threads_design branch September 2, 2024 15:35