
Event Subsystem Architecture Review


This review of Events / notifications and the respective delayed jobs was conducted in July 2017 by @hennevogel, @mdeniz and @evanrolfe.

General Notes

  • The relationship between events and subscriptions is handled by a complex service class, and the logic only works one way: you can find the subscriptions for an event, but not the events for a subscription.
  • An event's data is duplicated as the payload of Notification and ProjectLogEntry instances

JOBS

General Notes

  • None of these jobs can track failures.
  • It's assumed that every job will succeed.
  • ActiveJob and DelayedJob use different default queues
  • Jobs shouldn't expose methods besides perform (see the sketch after this list)
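
A minimal sketch of the job shape argued for above: one public perform, everything else private, and an explicit queue. The class name, queue name and helper are illustrative, not taken from the OBS code base.

```ruby
class ExampleEventJob < ApplicationJob
  queue_as :events

  def perform(event_id)
    event = Event::Base.find(event_id)
    process(event)
  end

  private

  # Helpers stay private so callers can only reach the job through #perform.
  def process(event)
    Rails.logger.info("processing event #{event.id}")
  end
end
```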

Requirements for the new delayed job system:

  • Jobs of the same type do not run concurrently
  • Failed procedures notify errbit and can be retried (see the sketch after this list)
  • Jobs distinguish between procedures that have not yet started, have completed, or have failed
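
A sketch of how the retry/failure requirements could map onto ActiveJob, assuming Rails 5.1 (which adds retry_on) and the airbrake gem as the errbit client; the queue name, attempt count and wait time are placeholders.

```ruby
class TrackedJob < ApplicationJob
  # One dedicated queue per job class keeps jobs of the same type from
  # running concurrently (given a single worker per queue).
  queue_as :tracked

  # Retry a failed run a few times; once the attempts are exhausted,
  # hand the exception to errbit via the airbrake gem.
  retry_on StandardError, wait: 5.minutes, attempts: 3 do |_job, exception|
    Airbrake.notify(exception)
  end

  def perform(*args)
    # the actual unit of work goes here
  end
end
```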

Options for going forward:

Possible dimensions for options:

  • # of units of work (1 or many)
  • # of purposes per job (1 or many)
  • Table to store data in between (events or delayed_jobs)

Option #1: without events

Remove the Events table and queue jobs for things that need to be processed. So let's say a package build fails: instead of

  • Creating an Event::BuildFail + an Event::NotifyBackends job + a ProjectLogRotate job (+ a copy of the event) + a SendEventMail job (+ a copy of the event), we would either:
      • a) create a BuildFailJob that notifies the backend, creates a ProjectLogEntry and sends an email (sketched below), or
      • b) create a BuildFailBackend job + a BuildFailLogEntryJob + a BuildFailEmail job
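
A rough sketch of variant a): a single BuildFailJob replaces the Event row and its follow-up jobs. The helper methods are placeholders for the work the existing jobs do today, not existing OBS code.

```ruby
class BuildFailJob < ApplicationJob
  queue_as :build_fail

  def perform(payload)
    notify_backend(payload)            # what Event::NotifyBackends does today
    create_project_log_entry(payload)  # what ProjectLogRotate does today
    send_event_emails(payload)         # what SendEventEmails does today
  end

  private

  def notify_backend(payload)
    # POST the payload to the backend notification plugins (hermes, rabbitmq)
  end

  def create_project_log_entry(payload)
    # ProjectLogEntry for the "last commits" RSS feed
  end

  def send_event_emails(payload)
    # mails + RSS notifications for the subscribers
  end
end
```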

Option #2: events are processed in individual jobs

At Event creation we create individual jobs to send the event email, create the RSS entry, create the project log entry, etc.

  • a) Data is stored in delayed job payload
  • b) Data is stored in the events table and we keep track of jobs that are not finished yet (through associations? check if this is possible); a sketch of both variants follows this list
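
A sketch of what option #2 could look like, assuming the fan-out happens in a callback on Event::Base; the consumer job class names are made up for illustration.

```ruby
class Event::Base < ApplicationRecord
  after_create_commit :enqueue_consumer_jobs

  private

  def enqueue_consumer_jobs
    # a) the data travels inside the delayed job payload
    SendEventEmailJob.perform_later(payload)
    CreateRssEntryJob.perform_later(payload)
    CreateProjectLogEntryJob.perform_later(payload)

    # b) would pass `id` instead, keeping the data in the events table and
    # tracking unfinished jobs through an association on the event.
  end
end
```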

Option #3: batched events are processed in multipurpose jobs

Everything stays as it is and we make sure each class of job runs in its own queue and never concurrently, roughly as sketched below.
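
Option #3 in code terms, roughly: every job class declares its own queue, and the deployment runs exactly one worker per queue, so two jobs of the same class never overlap. Class and queue names are examples only.

```ruby
class SendEventEmailsJob < ApplicationJob
  queue_as :send_event_emails   # drained by a single dedicated worker

  def perform
    # process all events with mails_sent == false
  end
end

class ProjectLogRotateJob < ApplicationJob
  queue_as :project_log_rotate  # likewise a single worker, so never concurrent

  def perform
    # process all events with project_logged == false
  end
end
```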

Jobs

Event::NotifyBackends

Requirements:

  • Needs to be processing events continuously

Target:

  • Posts the event payload to the backend, for events that define the raw\_type attribute.
  • Only needed for the hermes and rabbitmq backend notification plugins.

Job Creation:

  • Clock.rb creates and queues a delayed job every 30 seconds.
  • [PROBLEM] This is using DelayedJob directly, not ActiveJob (see the sketch after this list).
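
Roughly what the scheduling looks like, assuming clock.rb is driven by the clockwork gem and runs inside the Rails environment; the payload and job class names are made up. The [PROBLEM] above is the first line inside the block: the work is handed straight to Delayed::Job instead of going through ActiveJob.

```ruby
require 'clockwork'

module Clockwork
  every(30.seconds, 'notify backends') do
    # current style: a plain Delayed::Job payload object, bypassing ActiveJob
    Delayed::Job.enqueue NotifyBackendsPayload.new

    # ActiveJob style the review argues for:
    # NotifyBackendsJob.perform_later
  end
end
```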

Processing control:

  • Uses boolean attribute events.queued to keep track of whether or not this has been processed.
  • [PROBLEM] queued is set to true before the payload is posted (see the sketch after this list)
  • [PROBLEM] Does not handle failures.
  • [PROBLEM] The notify\_backend method is only defined on the Event::Base class.
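
The ordering problem in a small sketch (the posting helper is hypothetical): flipping queued only after a successful POST would leave failed events eligible for the next run, instead of the current behaviour where the flag is set up front and a failed POST is silently lost.

```ruby
Event::Base.where(queued: false).find_each do |event|
  next unless event.raw_type      # only events that define raw_type are posted

  if post_to_backend(event)       # hypothetical helper returning true on success
    event.update(queued: true)    # mark as processed only once the POST went through
  end
  # on failure the flag stays false and the event is picked up again later
end
```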

Concurrency control:

  • There is nothing to prevent this job from running simultaneously with another instance of itself, which is a problem because events can then be processed more than once and sent to the backend multiple times.

ProjectLogRotate

Target:

  • It saves ProjectLogEntry entries to the database to create the RSS feed for the last commits in projects/packages
  • These entries should be created ASAP
  • It only needs project log entries to exist in the database for 10 days.

Job Creation:

  • Clock.rb creates and enqueues a delayed job every 10 minutes

Processing control:

  • Uses the project_logged column.
  • [PROBLEM] Continuously retries events which raise an error when creating the ProjectLogEntry, or if anything else goes wrong (e.g. the project was already deleted).
  • [PROBLEM] If we reach 10,000 unprocessable events, that prevents the valid events from being processed for 10 days.
  • [PROBLEM] Events which don't descend from Event::Project or Event::Package hang around for 10 days before they get marked as logged, even though they are never used by ProjectLogRotate.

Concurrency control:

  • Cannot run simultaneously with another instance of itself.
  • We prevent this by running all instances of this job in a single queue with a single worker.

CreateJob

Target:

  • CreateJob is the base class; the subclasses called are:
      • UpdateBackendInfos - updates frontend data based on what comes from the backend
      • UpdateReleasedBinaries - updates BinaryRelease data in the frontend based on what comes from the backend

Job Creation:

  • DelayedJobs are queued inside the perform\_create\_jobs callback in the Event::Base model
  • Each job queued increments the undone_jobs counter
  • [PROBLEM] This is using DelayedJob directly, not ActiveJob.

Processing control:

  • Uses the undone_jobs (integer) column to keep track of how many delayed jobs still need to be completed
  • undone_jobs == 0 means that either there were no jobs to be processed, or they have already been processed
  • When a job completes it decrements the undone_jobs counter by 1
  • [PROBLEM] Neither job handles exceptions or failures

Concurrency control:

  • CreateJob locks the event while updating undone_jobs after the job is completed (see the sketch after this list)
  • UpdateReleasedBinaries runs in the 'releasetracking' queue, so it is not concurrent
  • UpdateBackendInfos runs in the 'quick' queue, so it is concurrent
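
The bookkeeping described in the two lists above, reduced to a sketch: the subclass does its work, then decrements undone_jobs under a row lock, so undone_jobs == 0 again means "nothing left to do" for that event. The worker body is a placeholder, and the class is written as an ActiveJob for brevity even though the real jobs are queued through DelayedJob directly.

```ruby
class UpdateBackendInfos < ApplicationJob
  queue_as :quick

  def perform(event_id)
    event = Event::Base.find(event_id)
    update_frontend_data(event)   # the actual unit of work (placeholder)

    # completed: decrement the counter while holding a lock on the event row
    event.with_lock { event.decrement!(:undone_jobs) }
  end

  private

  def update_frontend_data(event)
    # refresh frontend data from the backend for this event
  end
end
```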

SendEventEmails

Target:

  • Send emails ASAP for events to subscribers
  • Create RSS notifications ASAP for events

Job Creation:

  • Clock.rb creates and enqueues a delayed job every 30 seconds.

Processing control:

  • Uses boolean attribute events.mails\_sent to keep track of whether or not this has been processed.
  • [PROBLEM] create\_rss\_notifications fails silently.
  • [PROBLEM] It cannot distinguish between individual failures in email sending and RSS notification creation
  • If either email sending or RSS creation fails (see the sketch after this list):
      • Errbit is notified
      • [PROBLEM] mails_sent is set to true so that event is not re-processed
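
Roughly the control flow described above, to make the coupling visible (the helper names are illustrative): email sending and RSS creation share one rescue, and mails_sent is flipped either way, so a failed event is never retried.

```ruby
Event::Base.where(mails_sent: false).find_each do |event|
  begin
    send_emails_for(event)               # mail the subscribers
    create_rss_notifications_for(event)  # fails silently today
  rescue StandardError => e
    Airbrake.notify(e)                   # errbit is told about the failure...
  ensure
    event.update(mails_sent: true)       # ...but the event is marked handled anyway
  end
end
```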

Concurrency control:

  • Cannot run simultaneously with another instance of itself.
  • We prevent this by running all instances of this job in a single queue with a single worker.

UpdateNotificationEvents

Target:

  • It reads /lastnotifications from the backend and creates events ASAP based on that response.

Job Creation:

  • Clock.rb runs this every 17 seconds inside a thread (because it needs to run asynchronously).
  • [PROBLEM] The use of threads complicates the processing; a Mutex is used to avoid running multiple threads at the same time (see the sketch after this list)
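
Approximately the construction the [PROBLEM] refers to, in clockwork terms (job and constant names are illustrative): the scheduler spawns a thread every 17 seconds, and a Mutex is the only thing keeping two runs from overlapping.

```ruby
require 'clockwork'

UPDATE_MUTEX = Mutex.new

module Clockwork
  every(17.seconds, 'update notification events') do
    Thread.new do
      # skip this tick entirely if the previous run still holds the lock
      next unless UPDATE_MUTEX.try_lock
      begin
        UpdateNotificationEvents.new.perform
      ensure
        UPDATE_MUTEX.unlock
      end
    end
  end
end
```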

Processing control:

  • Every run of this job stores the last notification id it looked at into the database (BackendInfo.lastnotification_nr)
  • Every run of this job fetches the notifications from BackendInfo.lastnotification_nr onwards
  • Every run of this job blocks on the backend call??? (Clarify with the backend people what /lastnotifications?block=1 means)
  • Processing is based on the limit\_reached and next attributes of the backend's /lastnotifications response
  • limit\_reached set to 1 means that the backend has more events to notify (> 1000) than it can serve in one request, so we need to request more from the backend. That is done in another iteration of the loop (sketched after this list).
  • sync=lost will be set if the notification id the job starts from is lower than the oldest number on record in the backend (probably not needed anymore, as concurrent processes are no longer possible)
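
A sketch of the polling loop as described above. The backend client call, the helpers and the exact getter/setter forms are assumptions; only the /lastnotifications route, the limit_reached and next attributes and BackendInfo.lastnotification_nr come from this review.

```ruby
loop do
  start = BackendInfo.lastnotification_nr
  response = fetch_last_notifications(start)         # GET /lastnotifications?start=...&block=1

  create_events_from(response)                        # one Event row per notification
  BackendInfo.lastnotification_nr = response['next']  # remember where this run stopped

  # limit_reached == 1: the backend has more than fits in one response (> 1000),
  # so keep looping; otherwise we are caught up.
  break unless response['limit_reached'].to_i == 1
end
```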

Concurrency control:

  • Cannot run simultaneously with another instance of itself.
  • [PROBLEM] We prevent this by using a semaphore/Mutex.