Enable container to terminate cleanly #125

Closed
vorburger opened this issue Nov 26, 2022 · 3 comments · Fixed by #168

Comments

@vorburger
Contributor

Could it make sense to improve on the current approach to gracefully stopping a node without receiving a penalty, by enabling the container to terminate cleanly?

As-is currently, it keeps running after SIGTERM (so you signal it using e.g. docker kill, wait a fixed amount of time, and then terminate it with docker stop - which will SIGTERM again, wait another 10s, and then SIGKILL it).

If there was any way for it to figure out by itself when it's "done" and then exit, I have a hunch that this, together with #124, could contribute to making #32, #109 and #123 easier and upgrades faster. Then you could simply docker stop --time=900, but those 15 minutes would then be an upper bound, not a hard-coded fixed duration anymore.
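To illustrate the "upper bound, not fixed duration" idea, here is a minimal Python sketch of what docker stop does with its timeout: send the stop signal, return as soon as the process exits, and only SIGKILL once the grace period runs out. The helper names and the stand-in child process are hypothetical and not part of this project; it assumes a POSIX system.

```python
import signal
import subprocess
import sys
import time


def stop_with_upper_bound(proc: subprocess.Popen, grace_seconds: float) -> float:
    """Send SIGTERM, wait at most grace_seconds, then SIGKILL.

    Returns the number of seconds actually spent waiting.
    """
    started = time.monotonic()
    proc.send_signal(signal.SIGTERM)
    try:
        # Returns as soon as the process exits; the timeout is only a ceiling.
        proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()
    return time.monotonic() - started


def spawn_child_that_exits_when_done() -> subprocess.Popen:
    """Hypothetical stand-in for a container that exits by itself once 'done'."""
    code = (
        "import signal, sys, time\n"
        "def on_term(signum, frame):\n"
        "    time.sleep(0.2)  # pretend to finish outstanding work\n"
        "    sys.exit(0)\n"
        "signal.signal(signal.SIGTERM, on_term)\n"
        "print('ready', flush=True)\n"
        "while True:\n"
        "    time.sleep(0.05)\n"
    )
    proc = subprocess.Popen([sys.executable, "-c", code], stdout=subprocess.PIPE)
    proc.stdout.readline()  # block until the child has installed its handler
    return proc
```

With a child that exits on its own, stop_with_upper_bound(proc, 900) would return after a fraction of a second rather than after 15 minutes.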

I don't know enough exact details about the architecture yet, but I suspect that at least one part of solving this includes "stopping / declining to accept new HTTP requests, but waiting for Nginx to finish serving the ones it currently has going".
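The "decline new requests, finish the in-flight ones" behaviour can be sketched generically as below. This is not how this project or Nginx implements it (Nginx has its own graceful shutdown); DrainGate and its methods are hypothetical names used only for illustration.

```python
import threading


class DrainGate:
    """Tracks in-flight requests and refuses new ones once draining starts."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._in_flight = 0
        self._draining = False

    def try_enter(self) -> bool:
        """Call before serving a request; False means 'decline it'."""
        with self._cond:
            if self._draining:
                return False
            self._in_flight += 1
            return True

    def leave(self) -> None:
        """Call after a request has been fully served."""
        with self._cond:
            self._in_flight -= 1
            if self._in_flight == 0:
                self._cond.notify_all()

    def drain(self, timeout: float) -> bool:
        """Stop admitting requests and wait for in-flight ones to finish.

        Returns True if everything finished within the timeout.
        """
        with self._cond:
            self._draining = True
            return self._cond.wait_for(lambda: self._in_flight == 0, timeout=timeout)
```

The process would call drain() from its stop-signal handling and exit once it returns, so the timeout again acts as an upper bound only.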

@DiegoRBaquero
Collaborator

There's no way to do this with only inside-the-container knowledge: DNS, even if set to a 2 minute TTL, might not be fully propagated after 15 mins, so new requests will still be coming in, in less amount, but non-zero.

@vorburger
Contributor Author

vorburger commented Dec 3, 2022

and then terminate it with docker stop - which will SIGTERM

This ^^^ technically isn't fully accurate: when we request to stop this project's container, whether with docker stop or some equivalent on some container orchestration platform, it (currently) actually immediately receives a SIGQUIT instead of the (default) SIGTERM - because the Nginx base image it uses changed the STOPSIGNAL.

new requests will still be coming in, in less amount, but non-zero.

Maybe this actually is less of an issue than I originally thought. I think the primary (interesting) overall goal probably is more "seamless (and fast) version upgrades" than "terminate cleanly for permanent shutdown" - and that may be possible by (reliably fully automated, TBD) "rolling" upgrades...

@vorburger
Contributor Author

After having given this much further thought, I now think what is actually primarily missing to enable clean Rolling Updates is a way to signal the container to let Nginx finish serving ongoing requests (just to avoid clients experiencing "Connection reset by peer"), but WITHOUT "Draining server" by deregistering from the Orchestrator (because for a seamless fully rolling update you would actually NOT want to do that).

Based on what I've learnt so far, I doubt that is possible as-is today, given that the Shim handles SIGQUIT by immediately calling exit(0), and probably doesn't "propagate" the signal to Nginx. I haven't actually fully tested it yet, but suggest that be the next step on this issue, followed by fixing that (if needed).
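For reference, "propagating" the signal would look roughly like the Python sketch below: instead of exiting right away, the supervising process forwards SIGQUIT to its child and only returns once the child has exited. This is an illustration under assumptions, not the project's Shim code; supervise is a hypothetical name, and it assumes a POSIX system and that it runs in the main thread.

```python
import signal
import subprocess


def supervise(child: subprocess.Popen, signum: int = signal.SIGQUIT) -> int:
    """Forward `signum` to the child instead of exiting, then wait for it.

    Must be called from the main thread, because Python only allows
    signal handlers to be installed there.
    """

    def forward(received_signum, frame):
        # Pass the signal on; the child decides when it is done.
        child.send_signal(received_signum)

    previous = signal.signal(signum, forward)
    try:
        return child.wait()  # the child's exit code becomes the result
    finally:
        signal.signal(signum, previous)  # restore the earlier handler
```

A supervisor built this way would typically end with sys.exit(supervise(child)), so the container stops exactly when the child has finished its own graceful shutdown.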

AnomalRoil added a commit to AnomalRoil/L1-node that referenced this issue Jan 11, 2023