Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(server): Terminate on service panic #4249

Open
wants to merge 22 commits into
base: master
Choose a base branch
from
Open

fix(server): Terminate on service panic #4249

wants to merge 22 commits into from

Conversation

jjbayer
Copy link
Member

@jjbayer jjbayer commented Nov 14, 2024

Terminate the process when a panic occurs in one of the services.

Instead of implementing a sync fn spawn_handler() interface, each service now implements an async fn run() interface. This simplifies service implementations in most cases (except where services need to spawn multiple tasks), and allows us to join on all service main tasks to detect a panic.

The PR introduces a ServiceRunner utility to easily start & join on running services.

See also #4026 for an earlier attempt.

Note: The issue requires reporting "unhealthy" instead of terminating. I will take care of that in a follow-up PR.

ref: #4037

@jjbayer jjbayer changed the title fix(server): Crash on panic fix(server): Terminate on service panic Nov 14, 2024
if let Err(e) = res {
if e.is_panic() {
// Re-trigger panic to terminate the process:
std::panic::resume_unwind(e.into_panic());
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I confirmed that this terminates the process by inserting an explicit panic!() into one of the service bodies.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mid-/Long-Term it would be nice if a service can specify it's desired behaviour, abort on panic/restart on panic.

@jjbayer jjbayer marked this pull request as ready for review November 14, 2024 12:33
@jjbayer jjbayer requested a review from a team as a code owner November 14, 2024 12:33
Copy link
Contributor

@loewenheim loewenheim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If my understanding is correct, this essentially separates the concern of how a service is spawned out of the trait by replacing the spawn_handler method, which contained a tokio::spawn call, with a run method which is just an async function. Then you can start the service using either a ServiceRunner which keeps tabs on the services under its control—specifically, whether they panic—or just plain tokio::spawn (in the form of the start_detached method).

My one gripe with this is one of uniformity. In many places, services are started with runner.start(service), but e.g. ProjectCacheService::start takes the runner as a parameter. It might be nicer to do that everywhere, i.e. have a start method on Service which takes a runner as a parameter, but it's ultimately a sort of aesthetic concern.

Comment on lines 1049 to 1056
pub fn start<S: Service>(&mut self, service: S) -> Addr<S::Interface> {
let (addr, rx) = channel(S::name());
self.spawn(service, rx);
addr
}

/// Starts a service and starts tracking its join handle, given a predefined receiver.
pub fn spawn<S: Service>(&mut self, service: S, rx: Receiver<S::Interface>) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit leery of the naming, start vs spawn isn't a very clear distinction. Can't think of anything better off the top of my head either, though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about start and start_with?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO something like start_with_rx or start_with_receiver would be even better.

relay-server/src/services/projects/source/mod.rs Outdated Show resolved Hide resolved
@jjbayer
Copy link
Member Author

jjbayer commented Nov 14, 2024

If my understanding is correct, this essentially separates the concern of how a service is spawned out of the trait by replacing the spawn_handler method, which contained a tokio::spawn call, with a run method which is just an async function. Then you can start the service using either a ServiceRunner which keeps tabs on the services under its control—specifically, whether they panic—or just plain tokio::spawn (in the form of the start_detached method).

Yes!

My one gripe with this is one of uniformity. In many places, services are started with runner.start(service), but e.g. ProjectCacheService::start takes the runner as a parameter. It might be nicer to do that everywhere, i.e. have a start method on Service which takes a runner as a parameter, but it's ultimately a sort of aesthetic concern.

@loewenheim this bugs me too, I will unify it to * Service::start(self, runner: &mut ServiceRunner)

@jjbayer
Copy link
Member Author

jjbayer commented Nov 14, 2024

In many places, services are started with runner.start(service), but e.g. ProjectCacheService::start takes the runner as a parameter. It might be nicer to do that everywhere, i.e. have a start method on Service which takes a runner as a parameter, but it's ultimately a sort of aesthetic concern.

@loewenheim this bugs me too, I will unify it to * Service::start(self, runner: &mut ServiceRunner)

After trying this out, I figured it's actually better to keep the interface of Service narrow and only give the types that implement Service a helper method when they need it. I know consistently named these helpers start_in to clarify that they invert control.

classDiagram
    class ServiceRunner {
        +start(service)
    }

    ServiceRunner --> "starts" Service
    
    <<Interface>> Service
    class Service {
        async +run()
    }

    class ServiceWithStartHelper {
        +start_in(runner)
    }

    Service "implements" <|-- ServiceWithStartHelper

    ServiceWithStartHelper --> "calls" ServiceRunner
Loading

This reverts commit d0b4af9.
///
/// Exposes information about crashed services.
#[derive(Debug, Default)]
pub struct ServiceRunner(FuturesUnordered<JoinHandle<()>>);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tokio has a JoinSet for awaiting/spawning tasks, can use that instead of FuturesUnordered.

if let Err(e) = res {
if e.is_panic() {
// Re-trigger panic to terminate the process:
std::panic::resume_unwind(e.into_panic());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mid-/Long-Term it would be nice if a service can specify it's desired behaviour, abort on panic/restart on panic.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants