Allow supervisor to recover after crash #519
Open
This is a first pass at a fix for #512: allowing Solid Queue to recover if the database goes offline (or if it fails for any other reason).
In the case of the database going away, there are a few possible scenarios that can cause the supervisor to fail, but the most common is:
- ... after_shutdown in Registrable) and fails
- Process#deregister re-raises any exceptions that come up during deregistration, so worker crashes
- ... Process so when it fails, it calls deregister just like the worker did
- ... deregister so Solid Queue terminates completely

After a restart, the maintenance tasks performed by the supervisor do a good job of cleaning up the loose ends left behind, so it seemed like the cleanest approach was just to let the supervisor crash, then spin up a new instance. This is handled by a new
Launcher class that wraps Supervisor#start in a retry block with exponential backoff.

This also adds a new config parameter, max_restart_attempts, that allows the user to limit the number of restart attempts. If nil, it will retry forever, and if 0, it won't try at all. (I made 0 the default since that's the current behavior.)

I tested with Postgres and MySQL, but didn't really know how to test SQLite or if it even made sense to. Again, this is just a first attempt - happy to try a different approach if this doesn't seem quite right.
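For anyone skimming, here is a rough sketch of the retry behavior described above. It is not the code in this PR: the class name RetryingLauncher, the base_delay and max_delay options, and the injectable sleeper are all hypothetical, made up for illustration. Only the max_restart_attempts semantics (nil retries forever, 0 never restarts) come from the description.

```ruby
# Illustrative sketch only; names other than max_restart_attempts are assumptions.
class RetryingLauncher
  attr_reader :attempts

  # max_restart_attempts: nil = retry forever, 0 = never restart,
  # n = restart at most n times.
  def initialize(max_restart_attempts: 0, base_delay: 1.0, max_delay: 60.0,
                 sleeper: ->(seconds) { sleep(seconds) })
    @max_restart_attempts = max_restart_attempts
    @base_delay = base_delay
    @max_delay = max_delay
    @sleeper = sleeper
    @attempts = 0
  end

  # Runs the given block (standing in for Supervisor#start) and re-runs it
  # after a crash, waiting longer before each restart.
  def start
    @attempts = 0
    begin
      yield
    rescue StandardError
      # Out of restarts: let the original exception propagate.
      raise unless restart_allowed?

      @attempts += 1
      @sleeper.call(delay_for(@attempts))
      retry
    end
  end

  # Exponential backoff: base, 2x base, 4x base, ... capped at max_delay.
  def delay_for(attempt)
    [@base_delay * (2**(attempt - 1)), @max_delay].min
  end

  private

  def restart_allowed?
    @max_restart_attempts.nil? || @attempts < @max_restart_attempts
  end
end
```

With the default of 0 the block runs once and any exception propagates, which matches the current behavior. Passing nil keeps restarting indefinitely, with the wait capped at max_delay.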
Thanks!