
@AlexHentschel (Member)

🚧 PoC 🚧

PoC pipeline implementation + detailed documentation: concurrency-safe, idempotent, and tolerant of information arriving out of order.

The benefits are explained here, in the code's own documentation.

This is work evolving from #7523

Comments for reviewers

  • I have introduced Pipeline2 as a thought experiment. At the moment, it is a separate struct that lives alongside Pipeline; Pipeline2 does not yet implement the Pipeline interface.

  • I have used a more expressive state space for the pipeline (I think a bit closer to Peter's original design ... narrowing it down 😅). In a nutshell, I have created State2 (roughly following the existing State).

    • Formally, the pipeline is described via a state machine.
    • I use a uint32 "Go enum" to enumerate the states of the state machine.
    • The State2 struct also specifies the state machine's transition function, initial state, and terminal state via exported methods.
  • An instance of the state machine would be the State2Tracker struct.

  • Under the hood, State2Tracker works with an atomic State2 (uint32). State2Tracker provides a slightly expanded set of state machine methods:

    • multi-step evolution inside a single atomic update
    • atomic compare-and-swap operation with a more expressive return signature. This helps the caller with the optimistic pattern "try to apply, analyze the failure if one occurred, and either error or no-op" (see the sketch after this list). Under normal operation on the happy path, these algorithms tend to be very fast and efficient.

    Note that State2Tracker implements the entire state machine without any locks. This is easily possible if the state tracker does not need to deal with the state machine's input.

    We use the atomicity of the State2Tracker as a gateway. Only one concurrent thread can successfully perform the state transition. Conceptually, we achieve collaborative multithreading in the rare cases where routines concurrently deliver partially redundant or outdated information.

  • I haven't touched pipeline.Core. However, my gut feeling is that moderate revisions to Core would be useful to better fit it to the updated Pipeline2. I intend to create Core2 as a copy of Core and evolve it from there.
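
To make the discussion above more concrete, here is a minimal sketch of the enum-plus-tracker pattern. All names and state values (StatePending, CanTransitionTo, the linear transition rule, etc.) are illustrative assumptions, not the actual PoC code:

```go
package pipeline

import (
	"fmt"
	"sync/atomic"
)

// State2 enumerates the states of the pipeline's state machine as a uint32
// "Go enum", so the current state fits into a single atomic word.
type State2 uint32

const (
	StatePending State2 = iota // hypothetical initial state
	StateDownloading
	StateIndexing
	StatePersisting
	StateComplete // hypothetical terminal state
)

// CanTransitionTo sketches the state machine's transition function: it
// returns true iff the machine may move from s directly to next.
func (s State2) CanTransitionTo(next State2) bool {
	return next == s+1 // placeholder rule: a strictly linear pipeline
}

// State2Tracker holds the current state in a single atomic uint32, so the
// entire state machine runs without any locks.
type State2Tracker struct {
	state atomic.Uint32
}

// Transition attempts an atomic compare-and-swap from the caller's expected
// state to the target state. The expressive return signature supports the
// optimistic "try to apply, analyze the failure if one occurred, and either
// error or no-op" pattern.
func (t *State2Tracker) Transition(from, to State2) (current State2, err error) {
	if !from.CanTransitionTo(to) {
		return from, fmt.Errorf("illegal transition %d -> %d", from, to)
	}
	if t.state.CompareAndSwap(uint32(from), uint32(to)) {
		return to, nil // this routine won the transition
	}
	// CAS failed: another routine already advanced the state machine.
	// Return the observed state so the caller can decide whether its
	// information was merely redundant/outdated (no-op) or an error.
	return State2(t.state.Load()), fmt.Errorf("state changed concurrently")
}
```

The atomicity of the CAS is what makes it a gateway: exactly one of several concurrently delivering routines succeeds, and the losers learn the observed state and can back off gracefully.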

Resource balancing

There are three worker pools injected into a Pipeline2 instance during instantiation. The intention is that all pipeline instances managed by the ResultsForest would share the same worker pools. Thereby we could say "there can be at most 20 downloads going on in parallel" but "max 6 indexing operations in parallel", because each task category (downloading, indexing, persisting) is processed by its own worker pool.
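A rough sketch of that sharing scheme, assuming a simple channel-backed pool (the WorkerPool type, its constructor, and the concrete limits below are all hypothetical):

```go
package pipeline

// WorkerPool executes submitted tasks with a fixed number of workers; the
// buffered channel doubles as the queue sitting in front of the workers.
type WorkerPool struct {
	tasks chan func()
}

func NewWorkerPool(workers, queueSize int) *WorkerPool {
	p := &WorkerPool{tasks: make(chan func(), queueSize)}
	for i := 0; i < workers; i++ {
		go func() {
			for task := range p.tasks {
				task()
			}
		}()
	}
	return p
}

// Submit enqueues a task; it blocks once the queue is full, providing
// back-pressure to the submitting pipeline.
func (p *WorkerPool) Submit(task func()) { p.tasks <- task }

// All Pipeline2 instances managed by the ResultsForest would share the same
// three pools, capping concurrency per task category:
var (
	downloadPool = NewWorkerPool(20, 1024) // "at most 20 downloads in parallel"
	indexPool    = NewWorkerPool(6, 1024)  // "max 6 indexing operations in parallel"
	persistPool  = NewWorkerPool(4, 1024)  // illustrative limit
)
```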

In general, it is recommended to utilize queues for work in pipelined systems; the nice thing about worker pools is that they already have the queues built in. My gut feeling is that this design will make the overall component a lot more resilient under unfavourable operational conditions, including high-load catchup scenarios. In queuing systems, including pipelines, "bang-bang" behaviours are usually undesired and inefficient; unfortunately, they surprisingly often emerge as self-sustaining Nash equilibria in highly concurrent environments with workloads requiring strongly different resource profiles (networking, CPU & memory, database).

What I haven't found a satisfactory solution for yet are irrecoverable errors. These could happen during task execution inside a worker. I am speculating that we could maybe use a SignalerContext. This would also give us a very standardized way of cancelling the context of a task that is already queued or being processed by a worker. ... just thoughts.
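For illustration, here is a minimal hand-rolled sketch of the SignalerContext idea (the real implementation would come from the existing irrecoverable-error tooling, not this code): the worker reports an irrecoverable error through the context, which both signals the supervisor and cancels everything derived from that context, including tasks still waiting in a worker pool's queue.

```go
package pipeline

import (
	"context"
	"sync"
)

type signalerCtx struct {
	context.Context
	cancel context.CancelFunc
	once   sync.Once
	errs   chan error
}

// WithSignaler wraps a parent context. Throw delivers at most one
// irrecoverable error to the returned channel and cancels the context, so
// queued and in-flight tasks get cancelled through the standard context
// mechanism.
func WithSignaler(parent context.Context) (*signalerCtx, <-chan error) {
	ctx, cancel := context.WithCancel(parent)
	s := &signalerCtx{Context: ctx, cancel: cancel, errs: make(chan error, 1)}
	return s, s.errs
}

// Throw reports an irrecoverable error from inside a worker. Only the first
// call has any effect; later calls are no-ops.
func (s *signalerCtx) Throw(err error) {
	s.once.Do(func() {
		s.errs <- err
		s.cancel()
	})
}
```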
