Introduce a BatchingMacrotaskExecutor
#3225
Conversation
```scala
private[this] var needsReschedule: Boolean = true

private[this] val executeQueue: ArrayDeque[Runnable] = new ArrayDeque
```
Performance will be sub-optimal until we get a Scala.js release with this.
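For readers skimming the diff, here is a minimal sketch of how these two fields plausibly cooperate; the real implementation differs, and the `batchSize` parameter plus the use of `JSExecutionContext.queue` as the parent executor are stand-ins for illustration:

```scala
import java.util.ArrayDeque
import scala.concurrent.ExecutionContext
import scala.scalajs.concurrent.JSExecutionContext

// Sketch: run up to `batchSize` submitted runnables per event-loop tick,
// then hand control back by rescheduling the drain loop on the parent EC.
private final class BatchingExecutorSketch(batchSize: Int) extends ExecutionContext {

  private[this] var needsReschedule: Boolean = true
  private[this] val executeQueue: ArrayDeque[Runnable] = new ArrayDeque

  // stand-in parent; the PR uses the setImmediate-based MacrotaskExecutor here
  private[this] val parent: ExecutionContext = JSExecutionContext.queue

  private[this] val drain: Runnable = () => {
    var i = 0
    while (i < batchSize && !executeQueue.isEmpty()) {
      executeQueue.poll().run()
      i += 1
    }
    if (!executeQueue.isEmpty())
      parent.execute(drain) // leftover work: cede to the event loop first
    else
      needsReschedule = true // queue drained; the next execute() must reschedule
  }

  def execute(runnable: Runnable): Unit = {
    executeQueue.addLast(runnable)
    if (needsReschedule) {
      needsReschedule = false
      parent.execute(drain)
    }
  }

  def reportFailure(t: Throwable): Unit = t.printStackTrace()
}
```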
```diff
 import scala.concurrent.ExecutionContext
 import scala.scalajs.LinkingInfo

 private[unsafe] abstract class IORuntimeCompanionPlatform { this: IORuntime.type =>

-  def defaultComputeExecutionContext: ExecutionContext =
+  def defaultComputeExecutionContext: ExecutionContext = {
+    val ec = new BatchingMacrotaskExecutor(64)
```
We should make this configurable.
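For illustration, wiring up configurability might look like this; the property name is hypothetical, not an actual cats-effect setting:

```scala
import scala.util.Try

// Hypothetical knob: read the batch size from a system property, defaulting to 64.
val batchSize: Int =
  Try(System.getProperty("cats.effect.js.batchSize", "64").toInt).getOrElse(64)

val ec = new BatchingMacrotaskExecutor(batchSize)
```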
I'm skeptical! I'll take a closer look soon, but in general, I'd rather tune the fairness/throughput coefficient within …

I don't see how that's possible here.

Okay, I have an idea: what if we simply let …

Yes, exactly. But how do we implement this bound? It seems like it requires some "global" state (at least from the perspective of an individual fiber). Which brings us to the …

Oh yes, we do implement a wrapping EC like this one, but we don't batch all …

Ahhh, I gotcha now. Makes sense, thanks!
```diff
 private[this] def scheduleFiber(ec: ExecutionContext, fiber: IOFiber[_]): Unit = {
-  if (ec.isInstanceOf[WorkStealingThreadPool]) {
+  if (Platform.isJvm && ec.isInstanceOf[WorkStealingThreadPool]) {
     val wstp = ec.asInstanceOf[WorkStealingThreadPool]
     wstp.execute(fiber)
+  } else if (Platform.isJs && ec.isInstanceOf[BatchingMacrotaskExecutor]) {
+    val bmte = ec.asInstanceOf[BatchingMacrotaskExecutor]
+    bmte.schedule(fiber)
   } else {
     scheduleOnForeignEC(ec, fiber)
   }
```
So I know this is not really what you suggested, but on the other hand this lets us avoid unintentionally de-optimizing a `private[this]` method. Here's what this decompiles to on the JVM:
```java
private void scheduleFiber(ExecutionContext ec, IOFiber<?> fiber) {
    if (true && ec instanceof WorkStealingThreadPool) {
        WorkStealingThreadPool wstp = (WorkStealingThreadPool)ec;
        wstp.execute(fiber);
    } else if (false && ec instanceof BatchingMacrotaskExecutor) {
        BatchingMacrotaskExecutor bmte = (BatchingMacrotaskExecutor)ec;
        bmte.schedule(fiber);
    } else {
        this.scheduleOnForeignEC(ec, fiber);
    }
}
```
The downside is that `BatchingMacrotaskExecutor` is now living in JVM/Native binaries ...
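For context, the `true`/`false` literals in the decompiled output suggest an encoding along these lines: per-platform `Platform` objects whose `final val`s scalac inlines as constants. The file layout and exact shape are assumptions:

```scala
// jvm source tree (illustrative location): constants inline as `true`/`false`
private[effect] object Platform {
  final val isJvm = true
  final val isJs = false
}

// js source tree (illustrative location): the mirror image
private[effect] object Platform {
  final val isJvm = false
  final val isJs = true
}
```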
This is a pretty big deoptimization since it makes the branching much more complex. It's possible that the constants together with JIT inlining cause the ultimately-emitted assembly to elide the impossible branch, but I would want to check that before we really trust it. We can avoid this by relying on `IOFiberPlatform` instead.
Ok, I rearranged the platform checks and now the emitted bytecode is completely identical to what it was before. I understand objections based on style / increased binary surface, but I do not see how this is not the maximally optimized implementation.
```java
private void scheduleFiber(ExecutionContext ec, IOFiber<?> fiber) {
    if (ec instanceof WorkStealingThreadPool) {
        WorkStealingThreadPool wstp = (WorkStealingThreadPool)ec;
        wstp.execute(fiber);
    } else {
        this.scheduleOnForeignEC(ec, fiber);
    }
}
```
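One shape that would produce exactly that bytecode is nesting the `instanceof` checks under a constant-foldable branch; a sketch, not necessarily the committed code:

```scala
private[this] def scheduleFiber(ec: ExecutionContext, fiber: IOFiber[_]): Unit =
  if (Platform.isJs) {
    // on the JVM, Platform.isJs is the literal `false`, so scalac
    // eliminates this entire branch during constant folding
    if (ec.isInstanceOf[BatchingMacrotaskExecutor])
      ec.asInstanceOf[BatchingMacrotaskExecutor].schedule(fiber)
    else
      scheduleOnForeignEC(ec, fiber)
  } else {
    if (ec.isInstanceOf[WorkStealingThreadPool])
      ec.asInstanceOf[WorkStealingThreadPool].execute(fiber)
    else
      scheduleOnForeignEC(ec, fiber)
  }
```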
So I'm giving this a try. Still need to take another pass adding some internal scaladocs and maybe bikeshedding the names. I think every operation that …

So I think this should work well for the typical scenario where an incoming UI/IO event is picked up by either an …
```scala
import scala.concurrent.ExecutionContext

private[effect] sealed abstract class BatchingMacrotaskExecutor private ()
```
Another disadvantage of the `Platform` encoding used above is that this type now leaks across to JVM and Native.
Ok, I think this is ready for another look, thanks for all the pointers. No rush, since it has to wait until 3.5.x because it needs a Scala.js upgrade. (Unless there's something wrong with my approach, and we can do this without the …)

The only outstanding issue is whether to do the platforming with compile-time conditionals or platform traits.
I benchmarked a warmed-up Ember.js "hello world" server with these changes. Under concurrent load, performance is roughly similar. For a single connection, the improvement is prominent: 60% higher RPS. This makes sense: in the single-connection case there are never any other I/O events besides the one currently being handled, so ceding to the event loop while processing an I/O event is pointless. Meanwhile, in the concurrent case, nothing suggests that fairness has been compromised. This is consistent with the pattern where reacting to an I/O event starts a small number of short-lived fibers.

Benchmark runs: 50 concurrent connections (3.4.2 vs. 3.5-88faeed) and 1 connection (3.4.2 vs. 3.5-88faeed).
```scala
/**
 * A JS-Array backed circular buffer FIFO queue. It is careful to grow the buffer only using
 * `push` to avoid creating "holes" on V8 (this is a known shortcoming of the Scala.js
 * `j.u.ArrayDeque` implementation).
 */
private final class JSArrayQueue[A] {
```
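To make the scaladoc concrete, here is a simplified sketch of the push-only growth discipline it describes; the actual `JSArrayQueue` is more refined than this:

```scala
import scala.scalajs.js

// Sketch: a FIFO over a js.Array that never writes past the end of the
// buffer except via `push`. Assigning beyond the last index would create
// a "hole", causing V8 to fall back to a slower array representation
// (the j.u.ArrayDeque shortcoming mentioned above).
private final class JSArrayQueueSketch[A] {
  private[this] var buffer = js.Array[A]()
  private[this] var head = 0 // index of the first element when size > 0
  private[this] var size = 0

  def isEmpty: Boolean = size == 0

  def offer(a: A): Unit = {
    if (size < buffer.length) {
      // spare capacity: overwrite an existing slot, which is never a hole
      buffer((head + size) % buffer.length) = a
    } else if (head == 0) {
      buffer.push(a) // full but not wrapped: grow in place via push
    } else {
      // full and wrapped: rebuild in FIFO order, again using only push
      val grown = js.Array[A]()
      var i = 0
      while (i < size) {
        grown.push(buffer((head + i) % buffer.length))
        i += 1
      }
      grown.push(a)
      buffer = grown
      head = 0
    }
    size += 1
  }

  // callers must check isEmpty first; a production version would also
  // null out the vacated slot to avoid retaining the reference
  def poll(): A = {
    val a = buffer(head)
    head = (head + 1) % buffer.length
    size -= 1
    a
  }
}
```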
Besides the happy-path performance improvements, it's worth pointing out that this new EC helps mitigate other performance issues: …
Still very uncomfortable with the `Platform` trait, but I'll try to hit that in a follow-up.
This one will definitely require some kind of benchmarking :)
This is inspired by the timer+I/O-integrated runtime work, where we only check for outstanding timers and I/O every 64 iterations of the runloop. Meanwhile, the `MacrotaskExecutor` is based on `setImmediate`, which gives priority to I/O events: https://nodejs.org/api/timers.html#setimmediatecallback-args
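The ordering guarantee documented at that link is easy to observe; a small Node.js-only sketch (the `fs` facade is illustrative and assumes CommonJS/ES module output):

```scala
import scala.scalajs.js
import scala.scalajs.js.Dynamic.global
import scala.scalajs.js.annotation.JSImport

// Minimal facade for Node's "fs" module.
@js.native
@JSImport("fs", JSImport.Namespace)
object Fs extends js.Object {
  def readFile(path: String, cb: js.Function2[js.Any, js.Any, Unit]): Unit = js.native
}

object SetImmediateOrdering {
  def main(args: Array[String]): Unit =
    // Within an I/O callback, Node runs setImmediate callbacks before any
    // timers, so "immediate" always prints first here.
    Fs.readFile("/dev/null", { (_: js.Any, _: js.Any) =>
      global.setTimeout(() => println("timeout"), 0)
      global.setImmediate(() => println("immediate"))
    })
}
```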
Cats Effect and downstream libraries are very fiber-happy, so it seems a bit outrageous that every started fiber must wait for an entire event-loop cycle before it runs. This is compounded by the fact that in browsers we generally can't even submit directly to the macrotask queue and must rely on hacks such as `postMessage` (see the sketch at the end of this thread).

Indeed, this is the usual performance vs. fairness tradeoff. But I am still unconvinced about the importance of fairness in JS.
In a JS lambda, you are limited to processing one request at a time, so it's not really obvious to me what there is to be unfair to, or how that unfairness might be observed.

In the browser, lack of fairness is most easily observed as delayed rendering. But if you are doing so much work in response to some event (e.g. user input or a websocket message) that it affects rendering, you are probably in trouble anyway. Increasing fairness will at best enable progressive but glitchy rendering and at worst make no difference.
Ironically, the `MacrotaskExecutor` is about as fair as it gets 😅
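As an aside on the browser hack mentioned above, a bare-bones `postMessage` trampoline looks roughly like this; the real macrotask-executor library is considerably more careful (origin checks, unique message tags, fallbacks):

```scala
import scala.collection.mutable
import scala.scalajs.js
import scala.scalajs.js.Dynamic.global

// Browsers lack setImmediate, so we schedule a macrotask by posting a
// message to our own window and running the callback from the handler.
object PostMessageImmediate {
  private val tag = "sketch-immediate" // hypothetical marker value
  private val queue = mutable.Queue.empty[Runnable]

  global.window.addEventListener(
    "message",
    { (e: js.Dynamic) =>
      if (e.data.asInstanceOf[Any] == tag) {
        e.stopPropagation()
        if (queue.nonEmpty) queue.dequeue().run()
      }
    }: js.Function1[js.Dynamic, Unit],
    true
  )

  def setImmediate(task: Runnable): Unit = {
    queue.enqueue(task)
    global.window.postMessage(tag, "*")
  }
}
```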