🚧 PoC 🚧
PoC pipeline implementation + detailed documentation: concurrency-safe, idempotent, and tolerant of information arriving out of order
Benefits are explained here in the code's own documentation.
This is work evolving from #7523
Comments for reviewers
I have introduced Pipeline2 as a thought experiment. At the moment, it is a separate struct that lives in parallel to Pipeline. Pipeline2 does not yet implement the Pipeline interface.
I have used a more expressive state space for the pipeline (I think a bit closer to Peter's original design ... narrowing it down 😅). In a nutshell, I have created State2 (roughly following the existing State), a uint32 "Go enum" that enumerates the states of the state machine. State2 also specifies the state machine's transition function, initial state, and terminal state via exported methods.
An instance of the state machine would be the State2Tracker struct. Under the hood, State2Tracker works with an atomic State2 (uint32). State2Tracker provides a slightly expanded set of state machine methods.
Note that State2Tracker implements the entire state machine without any locks. This is easily possible if the state tracker does not need to deal with the state machine's input.
We use the atomicity of the State2Tracker as a gateway: only one of several concurrent threads can successfully perform a given state transition. Conceptually, we achieve collaborative multithreading in the rare cases where routines concurrently deliver partially redundant or outdated information.
I haven't touched pipeline.Core, though my gut feeling is that moderate revisions to Core would help it better fit the updated Pipeline2. I intend to create Core2 as a copy of Core and evolve it from there.
Resource balancing
There are three worker pools injected into each Pipeline2 instance during instantiation. The intention is that all pipeline instances managed by the ResultsForest share the same worker pools. That way we could say "there can be at most 20 downloads going on in parallel" but "max 6 indexing operations in parallel", because each task "category" (downloading, indexing, persisting) is processed by its own worker pool.
In general, it is recommended to use queues for work in pipelined systems; the nice thing about worker pools is that they already have the queues inside. My gut feeling is that this design will make the overall component a lot more resilient under unfavourable operational conditions, including high-load catch-up scenarios. In queuing systems, including pipelines, "bang-bang" behaviours are usually undesired and inefficient. Unfortunately, they surprisingly often emerge as self-sustaining Nash equilibria in highly concurrent environments with workloads requiring strongly different resource profiles (networking, CPU & memory, database).
What I haven't found a satisfactory solution for are irrecoverable errors. These could happen during task execution inside a worker. I am speculating that we can maybe use a SignalerContext. This would also nicely provide us with a very standardized way of cancelling the context of a task that is already waiting for or being processed by a worker. ... just thoughts.