You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
* Run only one sidekiq thread for orchestrator
The orchestrator jobs only take messages from redis and put them into the
orchestrator's mailbox. By reducing the number of threads to 1, we rule out
possibility of shuffling order of incoming messages.
* Make orchestrator discard items submitted by another orchestrator
When the orchestrator boots up, it puts a DrainMarker job onto the default
queue, puts itself into recovery mode and starts processing messages.
When in recovery mode, it discards any incoming WorkerDone jobs. This means
anything that was being done while there was no orchestrator gets only processed
by the workers. When in recovery, the orchestrator should still be able to
acquire completely new jobs and being in recovery should only impact already
running jobs.
When a worker receives the DrainMarker job, it replies to orchestrator with a
StartupComplete job. Upon receiving the StartupComplete job, the orchestrator
performs world invalidation and tries to resume all execution plans which were
being managed by the previous orchestrator. When the world invalidation is done,
the orchestrator goes out of recovery.
This "round trip" is done to ensure we get into as consistent state as possible.
After the DrainMarker-StartupComplete exchange, there will be at most n
inconsistent execution plans, where n is the number of worker threads. This is
just a heuristic and not a complete solution which starts to break when more
queues are added.
Upon receiving a WorkerDone from a worker with world_id different from the
current orchestrator's one, the orchestrator tries to resume the execution plan,
unless it already holds its execution lock.
0 commit comments