fix(server): Terminate on service panic #4249

jjbayer · 2024-11-14T11:22:31Z

Terminate the process when a panic occurs in one of the services.

Instead of implementing a sync fn spawn_handler() interface, each service now implements an async fn run() interface. This simplifies service implementations in most cases (except where services need to spawn multiple tasks), and allows us to join on all service main tasks to detect a panic.

The PR introduces a ServiceRunner utility to easily start & join on running services.
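
To make the shape of the change concrete, here is a minimal sketch of the idea, not the actual relay-system code. It assumes plain `std::thread` in place of tokio tasks and a synchronous `run` in place of `async fn run()`, so that it compiles without dependencies. `Healthy`, `Faulty` and `count_panicked_services` are made-up names for illustration.

```rust
use std::thread::{self, JoinHandle};

/// Simplified stand-in for the `Service` trait: the service only describes
/// its main loop and no longer decides how it is spawned.
trait Service: Send + 'static {
    /// Runs the service to completion (the PR uses `async fn run` instead).
    fn run(self);
}

/// Simplified stand-in for `ServiceRunner`: it owns the spawn call and keeps
/// every join handle so that a panic cannot go unnoticed.
#[derive(Default)]
struct ServiceRunner(Vec<JoinHandle<()>>);

impl ServiceRunner {
    /// Spawns the service and starts tracking its join handle.
    fn start<S: Service>(&mut self, service: S) {
        self.0.push(thread::spawn(move || service.run()));
    }

    /// Waits for all services and returns how many of them panicked.
    fn join(self) -> usize {
        self.0
            .into_iter()
            .map(JoinHandle::join)
            .filter(Result::is_err)
            .count()
    }
}

/// Hypothetical service that returns normally.
struct Healthy;

impl Service for Healthy {
    fn run(self) {}
}

/// Hypothetical service that panics in its main loop.
struct Faulty;

impl Service for Faulty {
    fn run(self) {
        panic!("service crashed");
    }
}

/// Starts one healthy and one faulty service and reports the panic count.
fn count_panicked_services() -> usize {
    let mut runner = ServiceRunner::default();
    runner.start(Healthy);
    runner.start(Faulty);
    runner.join()
}
```

Because the trait no longer contains the spawn call, the runner is the single place that sees every join result, which is what makes panic detection possible.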

See also #4026 for an earlier attempt.

Note: The issue requires reporting "unhealthy" instead of terminating. I will take care of that in a follow-up PR.

ref: #4037

jjbayer · 2024-11-14T12:20:03Z

relay-system/src/service.rs

+            if let Err(e) = res {
+                if e.is_panic() {
+                    // Re-trigger panic to terminate the process:
+                    std::panic::resume_unwind(e.into_panic());


I confirmed that this terminates the process by inserting an explicit panic!() into one of the service bodies.
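
For readers unfamiliar with the pattern, a small sketch of the same re-raise logic, assuming std threads instead of tokio tasks: `JoinHandle::join` returns the panic payload as an `Err`, which plays the role of `JoinError::into_panic()` in the diff above. The helper names are hypothetical, and the `catch_unwind` call exists only so the effect can be observed without killing the process.

```rust
use std::panic;
use std::thread;

/// Joins a worker and re-raises its panic on the calling thread.
fn join_or_propagate(handle: thread::JoinHandle<()>) {
    if let Err(payload) = handle.join() {
        // Re-trigger the panic with the original payload.
        panic::resume_unwind(payload);
    }
}

/// Runs a panicking worker and returns the panic message that reaches the caller.
fn propagated_message() -> Option<String> {
    let handle = thread::spawn(|| {
        panic!("service failed");
    });

    // Catch the re-raised panic here only to inspect it. `JoinHandle` is not
    // `UnwindSafe`, hence the explicit `AssertUnwindSafe` wrapper.
    let result = panic::catch_unwind(panic::AssertUnwindSafe(|| join_or_propagate(handle)));
    let payload = result.err()?;

    // Panic payloads are usually either `&'static str` or `String`.
    payload
        .downcast_ref::<&str>()
        .map(|s| s.to_string())
        .or_else(|| payload.downcast_ref::<String>().cloned())
}
```

Without the surrounding `catch_unwind`, the re-raised panic unwinds the thread that joined the worker, which is how a panic in a background task can take down the main task.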


loewenheim

If my understanding is correct, this essentially separates the concern of how a service is spawned out of the trait by replacing the spawn_handler method, which contained a tokio::spawn call, with a run method which is just an async function. Then you can start the service using either a ServiceRunner which keeps tabs on the services under its control—specifically, whether they panic—or just plain tokio::spawn (in the form of the start_detached method).

My one gripe with this is one of uniformity. In many places, services are started with runner.start(service), but e.g. ProjectCacheService::start takes the runner as a parameter. It might be nicer to do that everywhere, i.e. have a start method on Service which takes a runner as a parameter, but it's ultimately a sort of aesthetic concern.

loewenheim · 2024-11-14T13:44:32Z

relay-system/src/service.rs

+    pub fn start<S: Service>(&mut self, service: S) -> Addr<S::Interface> {
+        let (addr, rx) = channel(S::name());
+        self.spawn(service, rx);
+        addr
+    }
+
+    /// Starts a service and starts tracking its join handle, given a predefined receiver.
+    pub fn spawn<S: Service>(&mut self, service: S, rx: Receiver<S::Interface>) {


I'm a bit leery of the naming, start vs spawn isn't a very clear distinction. Can't think of anything better off the top of my head either, though.

How about start and start_with?

IMO something like start_with_rx or start_with_receiver would be even better.
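
To illustrate the distinction being named here, a rough sketch under simplifying assumptions: std `mpsc` channels and threads replace relay's `Addr`/`Receiver` types and tokio tasks, and `Summer` and `sum_via_start_with` are invented for the example. `start` creates the channel itself, while `start_with` accepts a receiver that the caller created earlier, which allows messages to be queued before the service runs.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical service that sums `u32` messages until all senders are gone.
struct Summer {
    total: Sender<u32>,
}

impl Summer {
    fn run(self, rx: Receiver<u32>) {
        let sum: u32 = rx.iter().sum();
        self.total.send(sum).unwrap();
    }
}

#[derive(Default)]
struct ServiceRunner(Vec<JoinHandle<()>>);

impl ServiceRunner {
    /// Creates the channel, starts the service and returns the sending half.
    #[allow(dead_code)]
    fn start(&mut self, service: Summer) -> Sender<u32> {
        let (addr, rx) = channel();
        self.start_with(service, rx);
        addr
    }

    /// Starts the service with a receiver that the caller created up front.
    fn start_with(&mut self, service: Summer, rx: Receiver<u32>) {
        self.0.push(thread::spawn(move || service.run(rx)));
    }
}

/// Queues one message before the service exists and one after it started.
fn sum_via_start_with() -> u32 {
    let (total_tx, total_rx) = channel();
    let (addr, rx) = channel();
    let mut runner = ServiceRunner::default();

    addr.send(1).unwrap(); // sent before the service was started
    runner.start_with(Summer { total: total_tx }, rx);
    addr.send(2).unwrap();
    drop(addr); // closing the channel lets the service finish

    for handle in runner.0 {
        handle.join().unwrap();
    }
    total_rx.recv().unwrap()
}
```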

relay-server/src/services/projects/source/mod.rs

Co-authored-by: Sebastian Zivota <[email protected]>

jjbayer · 2024-11-14T15:43:57Z

> If my understanding is correct, this essentially separates the concern of how a service is spawned out of the trait by replacing the spawn_handler method, which contained a tokio::spawn call, with a run method which is just an async function. Then you can start the service using either a ServiceRunner which keeps tabs on the services under its control (specifically, whether they panic) or just plain tokio::spawn (in the form of the start_detached method).

Yes!

> My one gripe with this is one of uniformity. In many places, services are started with runner.start(service), but e.g. ProjectCacheService::start takes the runner as a parameter. It might be nicer to do that everywhere, i.e. have a start method on Service which takes a runner as a parameter, but it's ultimately a sort of aesthetic concern.

@loewenheim this bugs me too; I will unify it to Service::start(self, runner: &mut ServiceRunner).


jjbayer · 2024-11-14T16:46:53Z

> In many places, services are started with runner.start(service), but e.g. ProjectCacheService::start takes the runner as a parameter. It might be nicer to do that everywhere, i.e. have a start method on Service which takes a runner as a parameter, but it's ultimately a sort of aesthetic concern.

> @loewenheim this bugs me too; I will unify it to Service::start(self, runner: &mut ServiceRunner).

After trying this out, I figured it's actually better to keep the interface of Service narrow and only give the types that implement Service a helper method when they need it. I have now consistently named these helpers start_in to clarify that they invert control.

classDiagram
    class ServiceRunner {
        +start(service)
    }

    ServiceRunner --> "starts" Service
    
    <<Interface>> Service
    class Service {
        async +run()
    }

    class ServiceWithStartHelper {
        +start_in(runner)
    }

    Service "implements" <|-- ServiceWithStartHelper

    ServiceWithStartHelper --> "calls" ServiceRunner
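
The relationship in the diagram could look roughly like the following sketch. It is an illustration only: std threads and channels stand in for tokio, and `Cache`, `Source` and `started_services` are hypothetical names. It assumes that a service wants a helper because it needs to start a second service alongside itself.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread::{self, JoinHandle};

/// The trait stays narrow: it only knows how to run.
trait Service: Send + 'static {
    fn run(self);
}

#[derive(Default)]
struct ServiceRunner(Vec<JoinHandle<()>>);

impl ServiceRunner {
    fn start<S: Service>(&mut self, service: S) {
        self.0.push(thread::spawn(move || service.run()));
    }

    fn join(self) {
        for handle in self.0 {
            handle.join().expect("service panicked");
        }
    }
}

/// Hypothetical service that depends on a second service.
struct Cache {
    log: Sender<&'static str>,
}

/// Hypothetical dependency of `Cache`.
struct Source {
    log: Sender<&'static str>,
}

impl Service for Cache {
    fn run(self) {
        self.log.send("cache").unwrap();
    }
}

impl Service for Source {
    fn run(self) {
        self.log.send("source").unwrap();
    }
}

impl Cache {
    /// Helper on the concrete type, not on the trait. It inverts control:
    /// the service registers itself and its dependency in the runner.
    fn start_in(self, runner: &mut ServiceRunner) {
        runner.start(Source { log: self.log.clone() });
        runner.start(self);
    }
}

/// Starts `Cache` through its helper and lists which services ran.
fn started_services() -> Vec<&'static str> {
    let (tx, rx) = channel();
    let mut runner = ServiceRunner::default();
    Cache { log: tx }.start_in(&mut runner);
    runner.join();

    // All senders are dropped once the services finished, so `iter` ends.
    let mut names: Vec<_> = rx.iter().collect();
    names.sort();
    names
}
```

Services without such a dependency need no helper and are started with `runner.start(service)` directly.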


Dav1dde · 2024-11-15T12:12:03Z

relay-system/src/service.rs

+///
+/// Exposes information about crashed services.
+#[derive(Debug, Default)]
+pub struct ServiceRunner(FuturesUnordered<JoinHandle<()>>);


Tokio has a JoinSet for spawning and awaiting tasks; we could use that instead of FuturesUnordered.

Dav1dde · 2024-11-15T12:13:03Z

relay-system/src/service.rs

+            if let Err(e) = res {
+                if e.is_panic() {
+                    // Re-trigger panic to terminate the process:
+                    std::panic::resume_unwind(e.into_panic());


Mid- to long-term, it would be nice if a service could specify its desired behaviour: abort on panic or restart on panic.

jjbayer added 10 commits November 13, 2024 15:13

rm start_in

3fab72e

wip: easy cases

ff88466

spawn

45f580a

clean

4b2c4f9

wip: service runner

20e403c

update usage

d246af0

fix remaining 2

ffa83ca

lint

97cd873

Merge remote-tracking branch 'origin/master' into joris/join

ced514b

doc

a9a5b6d

jjbayer changed the title ~~fix(server): Crash on panic~~ fix(server): Terminate on service panic Nov 14, 2024


lint

0a28e44

jjbayer marked this pull request as ready for review November 14, 2024 12:33

jjbayer requested a review from a team as a code owner November 14, 2024 12:33

health check

d6bbdcc

jjbayer mentioned this pull request Nov 14, 2024

feat(server): Report unhealthy instead of terminating on panic #4250

Merged

changelog

d0b4af9

loewenheim reviewed Nov 14, 2024


Update relay-server/src/services/projects/source/mod.rs

e885bda

Co-authored-by: Sebastian Zivota <[email protected]>

jjbayer added 6 commits November 14, 2024 17:01

start_with

2260462

naming

61a8c14

Merge remote-tracking branch 'origin/master' into joris/panic-unhealthy

5803f9e

merge

ad28282

Revert "health check"

82729a8

This reverts commit d6bbdcc.

Merge remote-tracking branch 'origin/joris/join' into joris/join

f92a26d

push

740bc33

jjbayer requested a review from loewenheim November 15, 2024 10:21

jjbayer mentioned this pull request Nov 15, 2024

feat(server): Report unhealthy instead of terminating on panic #4255

Draft

Revert "changelog"

d46f370

This reverts commit d0b4af9.

Dav1dde reviewed Nov 15, 2024
