Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Signal broadcasting #173

Closed
wants to merge 1 commit into from

Conversation

psi-4ward
Copy link

My proposal to broadcast SIGHUP, SIGINT and SIGTERM to crond and borgmatic

My proposal to broadcast SIGHUP, SIGINT and SIGTERM to `crond` and `borgmatic`
@psi-4ward
Copy link
Author

Looks good.

borgmatic    | crond: crond (busybox 1.35.0) started, log level 8
borgmatic    | crond: USER root pid  23 cmd PATH=$PATH:/usr/local/bin /usr/local/bin/borgmatic --stats -v 0 2>&1
borgmatic    | ------------------------------------------------------------------------------
....
borgmatic    | ------------------------------------------------------------------------------
borgmatic    | Exiting due to TERM signal

@psi-4ward psi-4ward marked this pull request as ready for review October 28, 2022 09:26
@toastie89
Copy link
Contributor

@psi-4ward, thanks for your PR. May I ask what is the desired effect of the wait?

@psi-4ward
Copy link
Author

psi-4ward commented Oct 30, 2022

@psi-4ward, thanks for your PR. May I ask what is the desired effect of the wait?

wait without PIDs blocks the execution until all child background jobs are completed. Without wait the entry.sh would start crond in the background end exists. Cause entry.sh is our PID 1 (we don't replace it with exec ...) the container terminates when PID 1 exists.

@toastie89
Copy link
Contributor

@psi-4ward, thanks a lot for your explanation! 👍

@psi-4ward
Copy link
Author

This is how open source works 😉

@modem7
Copy link
Member

modem7 commented Oct 30, 2022

@psi-4ward Would adding tini be a simpler solution for this as it would deal with PIDs and sending/receiving signals within the container processes.

@psi-4ward
Copy link
Author

@psi-4ward Would adding tini be a simpler solution for this as it would deal with PIDs and sending/receiving signals within the container processes.

Afair forwards tini signals only to child-process (entry.sh)

My Test:

  • Create a sleep.sh which outputs received signals
  • run sleep.sh as background process using entry.sh
  • create the container with init: true
  • stop the container
    => sleep.sh gets killed and does not receive SIGTERM

@modem7
Copy link
Member

modem7 commented Oct 31, 2022

Ah, this could be resolved by having tini installed directly into the container which'd allow us greater control, as from my understanding the init implementation of Docker is rather simple.

E.g.

ARG TINI_VERSION='v0.19.0'
ADD --link --chmod=755 https://github.com/krallin/tini/releases/download/${TINI_VERSION}/tini /tini
SHELL ["/bin/bash", "-c"]
CMD ["/tini", "-g", "--", "/entry.sh"]

Note the -g flag.

With the -g option, Tini kills the child process group , so that every process in the group gets the signal. This corresponds more closely to what happens when you do ctrl-C etc. in a terminal: The signal is sent to the foreground process group.

@grantbevis
Copy link
Collaborator

I like that solution @modem7

@toastie89
Copy link
Contributor

@modem7, just read that there is a Alpine package for tini. Is there any reason to not install it from there?
@psi-4ward, would tini cover all you cases?

@modem7
Copy link
Member

modem7 commented Oct 31, 2022

@modem7, just read that there is a Alpine package for tini. Is there any reason to not install it from there? @psi-4ward, would tini cover all you cases?

Mostly so that we can control where it installs. I've tried installing it via the package manager and things get.......weird.

This would also allow us to control + (Docker build) cache which version we use (granted, hasn't been updated in some time, so probably not really much of an issue).

However - my above statement needs testing vs your question - if the apk version works as expected, then let's use that, it's a cleaner solution.

I'll do a quick test now.

@modem7
Copy link
Member

modem7 commented Oct 31, 2022

@toastie89 - I've done a test and it seems that the apk installed one is absolutely fine.

# syntax = docker/dockerfile:latest

FROM python:3.11.0-alpine3.16
LABEL mainainer='modem7'
RUN <<EOF
    set -x
    apk add --no-cache -U \
        tzdata \
        sshfs \
        openssl \
        fuse \
        ca-certificates \
        logrotate \
        lz4-libs \
        libacl \
        postgresql-client \
        mariadb-client \
        mongodb-tools \
        curl \
        findmnt \
        tini \
        bash \
        bash-completion \
        bash-doc
EOF

COPY --link requirements.txt /

RUN --mount=type=cache,id=pip,target=/root/.cache,sharing=locked python3 -m pip install -Ur requirements.txt
RUN <<EOF
    set -x
    borgmatic --bash-completion > /usr/share/bash-completion/completions/borgmatic
    echo "source /etc/profile.d/bash_completion.sh" > /root/.bashrc
EOF

COPY --chmod=755 --link entry.sh /
VOLUME /mnt/source
VOLUME /mnt/borg-repository
VOLUME /root/.borgmatic
VOLUME /etc/borgmatic.d
VOLUME /root/.config/borg
VOLUME /root/.ssh
VOLUME /root/.cache/borg

ENTRYPOINT ["/sbin/tini", "-g", "--", "/entry.sh"]

Which gives us:

UID                 PID                 PPID                C                   STIME               TTY                 TIME                CMD
root                4026113             4026087             0                   19:24               pts/0               00:00:00            /sbin/tini -g -- /entry.sh
root                4026168             4026113             0                   19:24               pts/0               00:00:00            /bin/bash /entry.sh
root                4026333             4026168             0                   19:24               pts/0               00:00:00            crond -f -L /dev/stdout

@psi-4ward
Copy link
Author

Tested tini with -g:

  • Tini signals all processes in the group as expected
  • crond exists very quickly
  • sleep.sh (started by cron) receives the signal but needs another 5s to exit
    #!/bin/bash
    
    graceful_shutdown() {
     echo Sleeper received $1 ... sleeping another 5s
     sleep 5
     echo Sleeper will exit now
     exit 0
    }
    
    for s in SIGHUP SIGINT SIGTERM; do
     trap "graceful_shutdown $s" $s
    done
    
    while true
    do
       echo Sleeping
       sleep 10
    done
  • cause crond was the child of tini and exited with 0 tini exits (PID 1)
  • container exists and sleep.sh gets a SIGKILL

=> eventually running borgmatic process don't have time to do a graceful exit

See: krallin/tini#180

So tini seems not to be an option.

@modem7
Copy link
Member

modem7 commented Oct 31, 2022

See if https://github.com/Yelp/dumb-init would do it appropriately.

I haven't tried this solution myself, but apparently there's a few differences when it comes to stop signals.

But yes, the issue may be that the entrypoint script itself isn't sending suitable signals.

If we can have a repeatable testing method, we can test different potential solutions.

@psi-4ward
Copy link
Author

I think to remember that supervisord worked but it is long time ago since we played around with. At the end we used some bash traps cause it worked and we didn't have more dependencies

@modem7
Copy link
Member

modem7 commented Nov 1, 2022

A potentially dumb suggestion (as I don't quite have your test setup yet, so you'd be best to replicate):

In the entrypoint script, try replacing (alongside with having tini/init):

crond -f -L /dev/stdout with exec crond -f -L /dev/stdout - I'm wondering if because it's a shell script, just calling crond isn't doing what we expect.

Source: https://hynek.me/articles/docker-signals/ point number 2

The posts here: krallin/tini#72 and krallin/tini#78 seem to back this thought up.

The command docker top containername x --forest may also help diagnose a few things.


On another note @grantbevis, during testing, it seems that crond is using /bin/ash not /bin/bash.

We should investigate resolving this

(Simplest solution may be just to do RUN sed -i -e "s/bin\/ash/bin\/bash/" /etc/passwd in the Dockerfile seeing as how we've moved to a bash shell).

@psi-4ward
Copy link
Author

exec crond -f -L /dev/stdout replaces the current shell with executed process. So crond just would become PID 1 (and so would receive signals send by docker). I could not find anything on the web that crond would propagate signals to its childs.

@psi-4ward
Copy link
Author

On another note @grantbevis, during testing, it seems that crond is using /bin/ash not /bin/bash.

Think is doesn't matter. If one puts "scripts" in crontab he can still use /bin/bash -c "..." or just write an wrapper with correct shebang

@modem7
Copy link
Member

modem7 commented Nov 2, 2022

exec crond -f -L /dev/stdout replaces the current shell with executed process. So crond just would become PID 1 (and so would receive signals send by docker). I could not find anything on the web that crond would propagate signals to its childs.

From my brief test, it didn't seem to replace PID1 when execed with tini as tini called the shell, and the shell called cron, but I did a pretty basic test calling echo then sleep. Nothing overly fancy.

Now, the reason I suspect this is occurring on my end is that I'm calling tini as the entrypoint, then calling the script as the cmd which could potentially be a solution.

docker top borgtest x -o pid,command --forest
PID                 COMMAND
2057606             \_ /sbin/tini -g -- /entry.sh
2057743             \_ crond -f -L /dev/stdout
2060827             \_ sleep 10

@psi-4ward
Copy link
Author

Of course this is the expected result if u use tini.

exec replaces the shell and does not create a child process under the shell.

So case 1: no tini => entry.sh is the entrypoint (PID 1) and exec crond ... would replace entry.sh with crond which becomes PID 1

Case 2: tini => tini is the entrypoint (PID 1) and spawns entry.sh (PID 2). exec crond replaces the shell with PID 2 with crond

Hope I could make it clear now?

@modem7
Copy link
Member

modem7 commented Nov 2, 2022

Of course this is the expected result if u use tini.

exec replaces the shell and does not create a child process under the shell.

So case 1: no tini => entry.sh is the entrypoint (PID 1) and exec crond ... would replace entry.sh with crond which becomes PID 1

Case 2: tini => tini is the entrypoint (PID 1) and spawns entry.sh (PID 2). exec crond replaces the shell with PID 2 with crond

Hope I could make it clear now?

That's correct Yup.

So in the 2nd scenario, are you still getting the same behaviour or is it a potential solution?

Otherwise, do we need to investigate something such as calling a script the cron rather than calling the borgmatic executable directly via cron itself?

Docker by standard gives a SIGTERM signal as well, but there are ways we can change the signal if borgmatic requires as such.

@psi-4ward
Copy link
Author

So in the 2nd scenario, are you still getting the same behaviour or is it a potential solution?

Imho does not make any difference cause tini does not give the childs time to shutdown

Otherwise, do we need to investigate something such as calling a script the cron rather than calling the borgmatic executable directly via cron itself?

Does not make any difference cause cron does not forward signals to childs

@p-baum
Copy link

p-baum commented Nov 8, 2022

Is there a workaround we can implement now ourselves until this issue is resolved?

@modem7
Copy link
Member

modem7 commented Nov 8, 2022

Just an additional note, seems that this/a similar behaviour has also been picked up in the moby repo with init being admonished as a poor (read: non-working) solution: moby/moby#9098 (comment)

And a pending PR at: moby/moby#41548

@psi-4ward
Copy link
Author

Why do you dislike my solution? It's simple and works (for me 😄)

@modem7
Copy link
Member

modem7 commented Nov 9, 2022

Why do you dislike my solution? It's simple and works (for me 😄)

Heya, it's not that I dislike your particular solution, but it's more that it's a workaround (a good one at that, don't misunderstand).

If we can find a solution that actually deals with the issue at hand (preferably a process manager or Docker actually doing its job), then that would allow for additional work further down the line without having to refractor in the workarounds, especially as in many cases, it's not just borg(matic) that is running, but also database clients, filesystem packages, docker in docker etc (something that @toastie89 and @grantbevis would be better at elaborating on).

Ultimately, the final decision is up to the two chaps above as to what we do and how, as it's their baby. But I'm definitely not ignoring this as a potential solution to the problem.

Are you able to find a way to test if one solution or other works in a way that's reproduceable so that we can test multiple solutions? I'm somewhat swamped my end.

@grantbevis
Copy link
Collaborator

Can we simplify this and reiterate the exact problem we're trying to solve with this PR please?

I don't personally see the issue

@psi-4ward
Copy link
Author

Can we simplify this and reiterate the exact problem we're trying to solve with this PR please?

I don't personally see the issue

Okay, I'll try. If one stopps the container (docker stop borgmatic) the main-process (entry.sh) receives a SIGTERM signal.

In the current implementation entry.sh does not forward the signal to crond so crond does not know that he should exit. After grace-period docker kills (SIGTERM) entry.sh and reaps child processes in the cgroup => crond gets KILLED and has no chance to do some graceful shutdown.

Furthermore, crond could have started a borgmatic action so the process-list would look something like

(1)entry.sh
  +- (2)crond
    +- (3)borgmatic
       +(4)some-database-backup.sh

In this scenario not only crond gets killed but also borgmatic and some-database-backup.sh. borgmatic has no chance to do graceful actions like aborting the backup and release the lock.

Hopefully it gets clear?


Heya, it's not that I dislike your particular solution, but it's more that it's a workaround (a good one at that, don't misunderstand).

This is correct, it's a workaround but in fact that tini wont wait for child-process-exists tini is not an option.

Of course one could try s6-daemon-tools, initd, systemd or whatever bot I think it's over-bloated.

Your mention that we could/should also abort childs of borgmatic itself we could rewrite the signal-broadcasting to simply all process in the container and iterate (signal) it in descending order.


Reproduction setup:

Dockerfile

FROM python:3.10.6-alpine3.16
COPY *.sh /
RUN apk add --no-cache bash
CMD ["/entry.sh"]

entry.sh

#!/bin/bash
/parent.sh &
wait

parent.sh

#!/bin/bash
/sleeper.sh

sleeper.sh

#!/bin/bash

graceful_shutdown() {
  echo Sleeper received $1 ... sleeping another 5s
  sleep 5
  echo Sleeper will exit now
  exit 0
}

for s in SIGHUP SIGINT SIGTERM; do
  trap "graceful_shutdown $s" $s
done

while true
do
    echo Sleeping
    sleep 10
done

Without any further signal-broadcasting a docker stop of this container would result in a KILL and the exit-code 137

@toastie89
Copy link
Contributor

I would opt to go for the script prepared by @psi-4ward for now. We could still move to tini, in case it further devolops to solve the issue in future.

@modem7
Copy link
Member

modem7 commented Nov 9, 2022

As a side question, is Alpine cron really the best tool to use here I wonder?

*moved discussion to #177 as not to detract from/confuse this PR.

@psi-4ward
Copy link
Author

Tbh, usually one would use cron from the system or K8s jobs to trigger borgmatic inside the container. But I don't know how many ppl here use borg and don't have much possibilities (ie symbology Nas)

@psi-4ward
Copy link
Author

supercronic sounds like we should give it a try

@modem7 modem7 mentioned this pull request Nov 9, 2022
@grantbevis grantbevis added this to the supercronic milestone Dec 14, 2022
@grantbevis grantbevis linked an issue Dec 14, 2022 that may be closed by this pull request
@grantbevis
Copy link
Collaborator

Declining in favour of #186

@grantbevis grantbevis closed this Dec 14, 2022
@psi-4ward
Copy link
Author

So I'm sorry but for me this is not solved!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants