Event Subsystem Architecture Review
This review of Events / notifications and the respective delayed jobs was conducted in July 2017 by @hennevogel, @mdeniz and @evanrolfe.
- The relationship between events and subscriptions is handled by a complex service class, and the logic only works one way: you can find the subscriptions for an event, but not the events for a subscription
- An event's data is duplicated as the payload of Notification and ProjectLogEntry instances
- None of these jobs can track failures.
- It's assumed that every job will succeed.
- ActiveJob and DelayedJob use different default queues
- Jobs shouldn't expose methods besides `perform`
- Jobs of the same type do not run concurrently
- Failed procedures notify errbit and can be retried
- Jobs distinguish between procedures which have not yet started, completed or failed
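The "only expose `perform`" rule above can be sketched in plain Ruby; the class and helper names here are hypothetical, not the real OBS jobs:

```ruby
# Hypothetical sketch: a job whose only public entry point is #perform.
# All intermediate steps are private helper methods.
class BuildFailNotifyJob
  def initialize(payload)
    @payload = payload
  end

  # The single public method workers are allowed to call.
  def perform
    deliver_to_backend
    log_entry
  end

  private

  def deliver_to_backend
    # would POST @payload to the backend notification endpoint (stubbed)
    :delivered
  end

  def log_entry
    # would create a ProjectLogEntry from @payload (stubbed)
    :logged
  end
end
```

Keeping helpers private means no caller can skip half of the job's work by invoking an intermediate step directly.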
Possible dimensions for options:
- # of units of work (1 or many)
- # of purposes per job (1 or many)
- Table to store data in between (events or delayed_jobs)
Remove the Events table and queue jobs for the things that need to be processed. So let's say a package build fails; instead of creating an Event::BuildFail + an Event::NotifyBackends job + a ProjectLogRotate job (+ a copy of the Event) + a SendEventMail job (+ a copy of the Event):
- a) we create a BuildFailJob that notifies the backend, creates a ProjectLogEntry and sends an email
- b) we create a BuildFailBackend job + a BuildFailLogEntryJob + a BuildFailEmail job
At Event creation we create individual jobs to send the event email, create the RSS entry, project log entry etc.
- a) Data is stored in delayed job payload
- b) Data is stored in the events table, keeping track of jobs not finished yet (through associations? check if this is possible)
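Option a) above (data stored in the delayed job payload) could look roughly like this; the class name is made up, and the JSON round trip only stands in for DelayedJob's real YAML serialization of the handler:

```ruby
require 'json'

# Sketch of option a): the event payload travels inside the job's own
# serialized arguments, so no copy in an Events table is needed.
class SendEventMailJob
  def self.enqueue(payload)
    # DelayedJob would serialize the payload into delayed_jobs.handler;
    # we simulate that round trip with JSON here.
    new(JSON.parse(JSON.generate(payload)))
  end

  attr_reader :payload

  def initialize(payload)
    @payload = payload
  end

  def perform
    "mail about #{@payload['eventtype']}"
  end
end
```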
Everything stays as it is and we make sure each class of job runs in its own queue and never concurrently
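The "own queue per job class" option could be sketched like this, assuming an ActiveJob-style `queue_as` macro (the plain-Ruby stand-in and the queue names are hypothetical):

```ruby
# Plain-Ruby stand-in for ActiveJob's queue_as: each job class declares
# its own queue; running one worker per queue then guarantees jobs of
# the same class never run concurrently.
class QueuedJob
  def self.queue_as(name)
    @queue = name
  end

  def self.queue
    @queue
  end
end

class ProjectLogRotateJob < QueuedJob
  queue_as :project_log_rotate
end

class SendEventMailsJob < QueuedJob
  queue_as :send_event_mails
end
```

Concurrency control then falls out of deployment (one worker per queue) instead of ad-hoc locking in each job.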
NotifyBackends job
Requirements:
- Events need to be processed continuously
Target:
- Posts the event payload to the backend, for events that define a `raw_type` attribute.
- Only needed for the hermes and rabbitmq backend notification plugins.
Job Creation:
- Clock.rb creates and queues a delayed job every 30 seconds.
- [PROBLEM] This is using DelayedJob directly, not ActiveJob.
Processing control:
- Uses the boolean attribute `events.queued` to keep track of whether or not this has been processed.
- [PROBLEM] `queued` is set to true before the payload is posted.
- [PROBLEM] Does not handle failures.
- [PROBLEM] The `notify_backend` method is only defined on the Event::Base class.
Concurrency control:
- There is nothing to prevent this job from running simultaneously with itself, which is a problem because events can be processed more than once and sent to the backend multiple times.
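The ordering problem above (`queued` is flipped before the payload is posted) could be fixed by marking the event only after a successful post, so a crash mid-post leaves it eligible for retry. A minimal sketch with a stubbed backend call:

```ruby
# Sketch: flip the processed flag only *after* the backend accepted the
# payload. Event and the backend call are stubs, not the real models.
Event = Struct.new(:payload, :queued) do
  def post_to_backend!
    raise 'backend down' if payload == :bad
    true
  end

  def notify_backend
    post_to_backend!   # may raise; queued then stays false
    self.queued = true # mark processed only on success
  end
end
```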
ProjectLogRotate job
Target:
- Saves ProjectLogEntry records to the database, which are used to build the RSS feed for the last commits in projects/packages
- Entries should be created ASAP
- Project log entries only need to exist in the database for 10 days.
Job Creation:
- Clock.rb creates and enqueues a delayed job every 10 minutes
Processing control:
- Uses the project_logged column.
- [PROBLEM] Continuously retries events which raise an error when creating the ProjectLogEntry, or when anything else goes wrong (e.g. the project was already deleted).
- [PROBLEM] If we reach 10,000 unprocessable events, that would prevent valid events from being processed for 10 days.
- [PROBLEM] Events which don't descend from Event::Project or Event::Package hang around for 10 days before they get marked as logged, even though they are never used by ProjectLogRotate.
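One way the retry problem above could be avoided is to mark unprocessable events as logged on failure instead of retrying them for 10 days. A hedged plain-Ruby sketch (all method and key names are hypothetical, not the real ProjectLogRotate code):

```ruby
# Sketch: give up on events that raise while creating their log entry,
# so a backlog of broken events cannot block valid ones.
def create_log_entry(event)
  raise 'project deleted' unless event[:project]
  event[:project_logged] = true
  "entry for #{event[:project]}"
end

def rotate(events)
  processed = []
  events.each do |event|
    begin
      processed << create_log_entry(event)
    rescue StandardError
      event[:project_logged] = true # mark as handled instead of retrying forever
    end
  end
  processed
end
```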
Concurrency control:
- Cannot run simultaneously with another instance of itself.
- We prevent this by running all instances of this job in a single queue with a single worker.
CreateJob
Target:
- CreateJob is the base class; the subclasses called are:
  - UpdateBackendInfos - updates frontend data based on what comes from the backend
  - UpdateReleasedBinaries - updates BinaryRelease data in the frontend based on what comes from the backend
Job Creation:
- DelayedJobs are queued inside the `perform_create_jobs` callback in the Event::Base model
- Each job queued increments the undone_jobs counter
- [PROBLEM] This is using DelayedJob directly, not ActiveJob.
Processing control:
- Uses the undone_jobs (integer) column to keep track of how many delayed jobs still need to be completed
- undone_jobs == 0 means that either there were no jobs to be processed, or they have already been processed
- when a job completes it decrements undone_jobs counter by 1
- [PROBLEM] Neither job handles exceptions or failures
Concurrency control:
- CreateJob locks the event while updating undone_jobs after the job is completed
- UpdateReleasedBinaries runs in the 'releasetracking' queue, so it does not run concurrently
- UpdateBackendInfos runs in the 'quick' queue, so it does run concurrently
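The undone_jobs bookkeeping described above can be sketched as follows; the Mutex stands in for the database row lock CreateJob takes on the event, and the class is a stand-in, not the real model:

```ruby
# Sketch: undone_jobs starts at the number of queued jobs and each
# completed job decrements it under a lock (stand-in for the row lock).
class EventCounter
  attr_reader :undone_jobs

  def initialize(jobs)
    @undone_jobs = jobs
    @lock = Mutex.new
  end

  def job_finished
    @lock.synchronize { @undone_jobs -= 1 }
  end

  def done?
    # zero means either nothing was queued or everything finished -
    # the two cases are indistinguishable, as noted above.
    @undone_jobs.zero?
  end
end
```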
SendEventMail job
Target:
- Send emails ASAP for events to subscribers
- Create RSS notifications ASAP for events
Job Creation:
- Clock.rb creates and enqueues a delayed job every 30 seconds.
Processing control:
- Uses the boolean attribute `events.mails_sent` to keep track of whether or not this has been processed.
- [PROBLEM] `create_rss_notifications` fails silently.
- [PROBLEM] It cannot distinguish between individual failures in email sending and/or RSS notification creation.
- If either email sending or RSS creation fails:
  - Errbit is notified
  - [PROBLEM] mails_sent is set to true so that event is not re-processed
Concurrency control:
- Cannot run simultaneously with another instance of itself.
- We prevent this by running all instances of this job in a single queue with a single worker.
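The "cannot tell mail failures from RSS failures" problem above could be addressed by tracking each step's outcome separately instead of one boolean flag. A plain-Ruby sketch with stubbed steps (all names hypothetical):

```ruby
# Sketch: record the outcome of each step independently, so a failed
# step can be retried or reported on its own.
def send_mails(event)
  raise 'smtp down' if event[:mail_broken]
end

def create_rss_notifications(event)
  raise 'rss broken' if event[:rss_broken]
end

def process_event(event)
  results = {}
  begin
    send_mails(event)
    results[:mails] = :ok
  rescue StandardError => e
    results[:mails] = e.message # a mail failure no longer hides an RSS failure
  end
  begin
    create_rss_notifications(event)
    results[:rss] = :ok
  rescue StandardError => e
    results[:rss] = e.message
  end
  results
end
```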
Fetching /lastnotifications (clock.rb thread)
Target:
- Reads from the backend at /lastnotifications and creates events ASAP based on that response.
Job Creation:
- Clock.rb runs this every 17 seconds inside a thread (because it needs to run asynchronously).
- [PROBLEM] The use of threads complicates the processing; a Mutex is used to avoid running multiple threads at the same time.
Processing control:
- Every run of this job stores the last notification id it looked at into the database (BackendInfo.lastnotification_nr)
- Every run of this job fetches the notifications from BackendInfo.lastnotification_nr onwards
- Every run of this job blocks access to the backend call??? (Clarify with the backend people what /lastnotifications?block=1 means)
- Based on the `limit_reached` and `next` attributes of the backend /lastnotifications response:
  - `limit_reached` set to 1 means that the backend has more events to notify (> 1000) than it can serve in one request, so we need to request more from the backend. That is done in another iteration of the loop.
  - `sync=lost` will be set if the notification id the job starts off from is lower than the oldest number on record in the backend (probably not needed anymore as concurrent processes are not possible anymore)
Concurrency control:
- Cannot run simultaneously with another instance of itself.
- [PROBLEM] We prevent this by using a semaphore/Mutex.
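The Mutex guard described above can be sketched like this; `try_lock` skips a run when another fetch is still in flight instead of blocking the clock thread (constant and method names are made up for illustration):

```ruby
# Sketch of a clock.rb-style Mutex guard for the notification fetcher.
FETCH_LOCK = Mutex.new

def fetch_last_notifications
  # try_lock returns false if the lock is already held, so an
  # overlapping tick is simply skipped rather than queued up.
  return :skipped unless FETCH_LOCK.try_lock
  begin
    :fetched # would read /lastnotifications here (stubbed)
  ensure
    FETCH_LOCK.unlock
  end
end
```

This is exactly the fragility the review flags: the guard lives in process memory, so it only works while a single clock process exists.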