
Introduce a BatchingMacrotaskExecutor #3225


Merged

Conversation

armanbilge
Member

This one will definitely require some kind of benchmarking :)

This is inspired by the timer+I/O-integrated runtime work, where we only check for outstanding timers and I/O every 64 iterations of the runloop. Meanwhile the MacrotaskExecutor is based on setImmediate, which gives priority to I/O events.

Schedules the "immediate" execution of the callback after I/O events' callbacks.

https://nodejs.org/api/timers.html#setimmediatecallback-args

Cats Effect and downstream libraries are very fiber-happy, so it seems a bit outrageous that every started fiber must wait for an entire event-loop cycle before it runs. This is compounded by the fact that in browsers we generally can't even submit directly to the macrotask queue and must rely on hacks such as postMessage.
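For readers unfamiliar with the hack being referenced, here is a rough sketch of the classic postMessage trick (names hypothetical; the real implementation lives in the scala-js-macrotask-executor library and is considerably more careful). Browsers expose no setImmediate, so callbacks are smuggled onto the macrotask queue by posting a message to the page itself:

    import scala.collection.mutable
    import scala.scalajs.js
    import scala.scalajs.js.Dynamic.{global => g}

    // illustration only; hypothetical names throughout
    private object PostMessageMacrotasks {
      private val tag = "sketch-macrotask" // marker so unrelated messages are ignored
      private val pending = mutable.Queue.empty[Runnable]

      // each received marker message is a macrotask; run one queued callback per message
      private val onMessage: js.Function1[js.Dynamic, Unit] = { e =>
        if ((e.data: Any) == tag && pending.nonEmpty)
          pending.dequeue().run()
      }
      g.window.addEventListener("message", onMessage)

      // enqueue the callback and post to ourselves, scheduling a macrotask that
      // runs on the next event-loop turn, after outstanding I/O callbacks
      def submit(r: Runnable): Unit = {
        pending.enqueue(r)
        g.window.postMessage(tag, "*")
      }
    }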

Indeed, this is the usual performance vs fairness tradeoff. But I am still unconvinced about the importance of fairness in JS.

  1. In a JS lambda, you are limited to processing one request at a time. So it's not really obvious to me what there is to be unfair to, and how that might be observed.

  2. In the browser, lack of fairness is most easily observed as delayed rendering. But if you are doing so much work in response to some event (e.g. user input or websocket message) that it affects rendering you are probably in trouble anyway. Increasing fairness will at best enable progressive but glitchy rendering and at worst make no difference.

Ironically the MacrotaskExecutor is about as fair as it gets 😅


    private[this] var needsReschedule: Boolean = true

    private[this] val executeQueue: ArrayDeque[Runnable] = new ArrayDeque
Member Author


Performance will be sub-optimal until we get a Scala.js release with this.

    import scala.concurrent.ExecutionContext
    import scala.scalajs.LinkingInfo

    private[unsafe] abstract class IORuntimeCompanionPlatform { this: IORuntime.type =>

    -   def defaultComputeExecutionContext: ExecutionContext =
    +   def defaultComputeExecutionContext: ExecutionContext = {
    +     val ec = new BatchingMacrotaskExecutor(64)
Member Author


We should make this configurable.

@djspiewak
Member

I'm skeptical! I'll take a closer look soon, but in general, I'd rather tune the fairness/throughput coefficient within IOFiber itself, rather than at the executor level.

@armanbilge
Member Author

I'm skeptical! I'll take a closer look soon, but in general, I'd rather tune the fairness/throughput coefficient within IOFiber itself, rather than at the executor level.

I don't see how that's possible here.

@djspiewak
Member

Okay I have an idea: what if we simply let IOFiber handle this? We're basically saying that there's a certain class of operations (e.g. start) which we would like to execute with microtask semantics, up to some bound, and we want to do this just on JS platforms. In particular, we can either make scheduleFiber platform-specific (which could allow us to remove WSTP from the JS binary), or we could make a special variant of it which is itself platform-specific. JVM and Native behavior should be unaffected.

@armanbilge
Member Author

We're basically saying that there's a certain class of operations (e.g. start) which we would like to execute with microtask semantics, up to some bound

Yes, exactly. But how do we implement this bound? It seems like it requires some "global" state (at least from the perspective of an individual fiber). Which brings us to the ExecutionContext.

@djspiewak
Member

Oh yes, we do implement a wrapping EC like this one, but we don't batch all executes. Instead, we have an additional method which adds to the batch and is called by the Start interpreter.

@armanbilge
Member Author

Ahhh, I gotcha now. Makes sense, thanks!
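To make the shape concrete, here is a minimal sketch of such a wrapping EC, with assumed names and simplified details (the class this PR actually introduces is BatchingMacrotaskExecutor, and it ultimately uses a custom queue rather than j.u.ArrayDeque, as discussed below):

    import java.util.ArrayDeque
    import scala.concurrent.ExecutionContext
    import scala.scalajs.concurrent.QueueExecutionContext
    import org.scalajs.macrotaskexecutor.MacrotaskExecutor

    private final class BatchingSketch(batchSize: Int) extends ExecutionContext {
      private[this] val fibers = new ArrayDeque[Runnable]
      private[this] var needsReschedule = true
      private[this] val microtasks = QueueExecutionContext.promises()

      // plain submissions keep macrotask semantics, so they still yield to
      // outstanding I/O callbacks and rendering
      def execute(runnable: Runnable): Unit =
        MacrotaskExecutor.execute(runnable)

      // the additional method, called when starting fibers: batch them and
      // drain from the microtask queue, so a new fiber does not have to wait
      // for a full event-loop turn
      def schedule(fiber: Runnable): Unit = {
        fibers.addLast(fiber)
        if (needsReschedule) {
          needsReschedule = false
          microtasks.execute(() => drainBatch())
        }
      }

      private[this] def drainBatch(): Unit = {
        var i = 0
        while (i < batchSize && !fibers.isEmpty()) {
          fibers.poll().run()
          i += 1
        }
        if (!fibers.isEmpty())
          // more work than one batch allows: defer the rest to a macrotask so
          // the event loop gets a chance to run
          MacrotaskExecutor.execute(() => drainBatch())
        else
          needsReschedule = true
      }

      def reportFailure(cause: Throwable): Unit = cause.printStackTrace()
    }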

Comment on lines 1274 to 1283
    private[this] def scheduleFiber(ec: ExecutionContext, fiber: IOFiber[_]): Unit = {
    -   if (ec.isInstanceOf[WorkStealingThreadPool]) {
    +   if (Platform.isJvm && ec.isInstanceOf[WorkStealingThreadPool]) {
          val wstp = ec.asInstanceOf[WorkStealingThreadPool]
          wstp.execute(fiber)
    +   } else if (Platform.isJs && ec.isInstanceOf[BatchingMacrotaskExecutor]) {
    +     val bmte = ec.asInstanceOf[BatchingMacrotaskExecutor]
    +     bmte.schedule(fiber)
        } else {
          scheduleOnForeignEC(ec, fiber)
        }
Member Author


So I know this is not really what you suggested, but on the other hand this lets us avoid unintentionally de-optimizing a private[this] method. Here's what this decompiles to on the JVM:

    private void scheduleFiber(ExecutionContext ec, IOFiber<?> fiber) {
        if (true && ec instanceof WorkStealingThreadPool) {
            WorkStealingThreadPool wstp = (WorkStealingThreadPool)ec;
            wstp.execute(fiber);
        } else if (false && ec instanceof BatchingMacrotaskExecutor) {
            BatchingMacrotaskExecutor bmte = (BatchingMacrotaskExecutor)ec;
            bmte.schedule(fiber);
        } else {
            this.scheduleOnForeignEC(ec, fiber);
        }
    }

The downside is that BatchingMacrotaskExecutor now lives in the JVM/Native binaries ...
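For reference, a sketch of how the JVM-side Platform flags might be defined (hypothetical file layout; the JS and Native source directories would mirror it with the flags flipped). A final val with a literal body is a compile-time constant, which is why the decompiled code above shows if (true && ...) and else if (false && ...); actually eliminating the dead branch is then left to the JIT, as discussed in the reply below.

    // e.g. core/jvm/src/main/scala/cats/effect/Platform.scala (hypothetical path)
    private[effect] object Platform {
      final val isJvm = true
      final val isJs = false
      final val isNative = false
    }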

Member


This is a pretty big deoptimization since it makes the branching much more complex. It's possible that the constants together with JIT inlining cause the ultimately-emitted assembly to elide the impossible branch, but I would want to check that before we really trust it. We can avoid this by relying on IOFiberPlatform instead.

Member Author


Ok, I rearranged the platform checks and now the emitted bytecode is completely identical to what it was before. I understand objections based on style / increased binary surface, but I do not see how this is not the maximally optimized implementation.

    private void scheduleFiber(ExecutionContext ec, IOFiber<?> fiber) {
        if (ec instanceof WorkStealingThreadPool) {
            WorkStealingThreadPool wstp = (WorkStealingThreadPool)ec;
            wstp.execute(fiber);
        } else {
            this.scheduleOnForeignEC(ec, fiber);
        }
    }

@armanbilge
Member Author

So I'm giving this a try. Still need to take another pass adding some internal scaladocs and maybe bikeshedding the names.

I think every operation that scheduleFiber() is used for makes sense for batching via the microtask (promises) executor, namely: start, racePair, and resuming an async. Meanwhile, (auto-)ceding is implemented with rescheduleFiber() and this should definitely go through the macrotask executor, so that it actually cedes to the event loop :)

So I think this should work well for the typical scenario where an incoming UI/IO event is picked up by either an async or Dispatcher, and starts/resumes a handful of short-running fibers to process the event.
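A small usage sketch of that scenario (hypothetical handler; any event-driven entry point such as a Dispatcher callback or an async resumption would look similar). Each .start below goes through scheduleFiber and is batched onto the microtask queue, while the trailing IO.cede goes through rescheduleFiber and therefore yields to the event loop:

    import cats.effect.{IO, IOApp}
    import cats.syntax.all._

    object HandlerSketch extends IOApp.Simple {

      // react to a single incoming event by starting a handful of short-lived fibers
      def handleEvent(payload: String): IO[Unit] =
        List
          .range(0, 4)
          .traverse(i => IO.println(s"processing $payload in fiber $i").start)
          .flatMap(_.traverse_(_.join)) *> IO.cede

      def run: IO[Unit] = handleEvent("incoming message")
    }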


    import scala.concurrent.ExecutionContext

    private[effect] sealed abstract class BatchingMacrotaskExecutor private ()
Member


Another disadvantage of the Platform encoding used above is that this type now leaks across to JVM and Native.


@armanbilge armanbilge changed the base branch from series/3.4.x to series/3.x December 4, 2022 07:09
@armanbilge armanbilge marked this pull request as ready for review December 4, 2022 08:15
@armanbilge armanbilge added this to the v3.5.0 milestone Dec 4, 2022
@armanbilge
Member Author

armanbilge commented Dec 4, 2022

Ok, I think this is ready for another look, thanks for all the pointers. No rush, since it has to wait until 3.5.x because it needs a Scala.js upgrade. (Unless there's something wrong with my approach, and we can do this without the ArrayDeque. Update: I ended up removing the ArrayDeque because it has issues ...)

The only outstanding issue is whether to do the platforming with compile-time conditionals or platform traits.

@armanbilge
Member Author

I benchmarked a warmed-up Ember.js "hello world" server with these changes.

Under concurrent load, performance is roughly similar.

When considering a single connection, the improvement is prominent: roughly 55% higher RPS (about 500 vs 323 requests/sec).

This makes sense: in the single connection case, there are never any other I/O events besides the one being currently handled. So ceding to the event loop during processing of the I/O event is pointless.

Meanwhile, in the concurrent connection case, nothing suggests that fairness has been compromised. This is consistent with the pattern where reacting to an I/O event starts a small number of short-lived fibers.

50 concurrent connections

3.4.2

Summary:
  Total:	30.0618 secs
  Slowest:	8.6726 secs
  Fastest:	0.0125 secs
  Average:	0.0966 secs
  Requests/sec:	517.2019

3.5-88faeed

Summary:
  Total:	30.0469 secs
  Slowest:	6.6952 secs
  Fastest:	0.0022 secs
  Average:	0.0884 secs
  Requests/sec:	564.7503

1 connection

3.4.2

Summary:
  Total:	30.0012 secs
  Slowest:	0.0561 secs
  Fastest:	0.0022 secs
  Average:	0.0031 secs
  Requests/sec:	322.8537

3.5-88faeed

Summary:
  Total:	30.0012 secs
  Slowest:	0.0244 secs
  Fastest:	0.0013 secs
  Average:	0.0020 secs
  Requests/sec:	499.9141

Comment on lines +21 to +26
    /**
     * A JS-Array backed circular buffer FIFO queue. It is careful to grow the buffer only using
     * `push` to avoid creating "holes" on V8 (this is a known shortcoming of the Scala.js
     * `j.u.ArrayDeque` implementation).
     */
    private final class JSArrayQueue[A] {
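A rough sketch of the idea behind such a queue (not the PR's exact implementation; names and growth policy are illustrative). The key point is that the backing js.Array is only ever enlarged with push, never by assigning past its end, so V8 keeps it in its packed, hole-free representation:

    import scala.scalajs.js

    private final class JSArrayQueueSketch[A] {
      private[this] val buffer = js.Array[A](null.asInstanceOf[A]) // capacity 1 to start
      private[this] var head = 0 // index of the first element
      private[this] var tail = 0 // index one past the last element (may wrap around)
      private[this] var size = 0

      def isEmpty: Boolean = size == 0

      def put(a: A): Unit = {
        if (size == buffer.length) grow()
        buffer(tail) = a
        tail += 1
        if (tail == buffer.length) tail = 0
        size += 1
      }

      def take(): A = {
        val a = buffer(head)
        buffer(head) = null.asInstanceOf[A] // drop the reference for the GC
        head += 1
        if (head == buffer.length) head = 0
        size -= 1
        a
      }

      private[this] def grow(): Unit = {
        val oldLen = buffer.length
        // double the capacity using push only, so the array stays packed
        var i = 0
        while (i < oldLen) { buffer.push(null.asInstanceOf[A]); i += 1 }
        // relocate the wrapped-around prefix [0, tail) into the newly pushed slots
        i = 0
        while (i < tail) {
          buffer(oldLen + i) = buffer(i)
          buffer(i) = null.asInstanceOf[A]
          i += 1
        }
        tail = head + size
        if (tail >= buffer.length) tail -= buffer.length
      }
    }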

@armanbilge
Member Author

Besides the happy-path performance improvements, it's worth pointing out that this new EC helps mitigate other performance issues:

  • We no longer have to wrap every fiber to support fiber dumps, since the EC natively supports this.
  • If you somehow are in an environment where macrotasks are subject to the 4ms+ clamping, then there is no longer a clamping overhead for each fiber. Indeed, if each UI or I/O event spawns a small batch of short-lived fibers, clamping would not even be triggered.


@djspiewak djspiewak left a comment


Still very uncomfortable with the Platform trait but I'll try to hit that in a follow-up.

@djspiewak djspiewak merged commit e4f2b71 into typelevel:series/3.x Jan 28, 2023