Description
TL;DR
When dealing with multiple subprocesses, tini may exit without waiting for all of its children to exit. The kernel then sends SIGKILL to the remaining processes in the same PID namespace, which goes against tini's design principle of allowing signal forwarding and graceful termination.
Steps to Reproduce
- Get a tini binary.
- Prepare a Python script as follows:
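```python
# ppp.py
import os
import signal
import sys
import time

def graceful_exit(signum, frame):
    time.sleep(1)  # simulate a slow cleanup step
    sys.exit(0)    # then actually exit, as a graceful shutdown would

pid = os.fork()
if pid == 0:
    # The first child (2840 in the tree below) installs the handler and its
    # own child (2841) inherits it; the parent (2810) keeps the default action.
    signal.signal(signal.SIGTERM, graceful_exit)
    cpid = os.fork()
time.sleep(1000)
```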
Note that I use `time.sleep(1)` to simulate a time-consuming operation during graceful termination.
- Make sure the script `ppp.py` and the tini binary `tini` are in `$(pwd)`, then run a Docker container as follows:
```sh
docker run -d --rm -v $(pwd):/src -w /src python /src/tini -g -s -- python /src/ppp.py
```
- Inspect the process tree:
```
root 335889 1 0 Feb24 ? 00:05:21 /usr/bin/containerd
root 2707 335889 0 23:02 ? 00:00:00 \_ containerd-shim -namespace moby -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/moby/7ea94ebea673367a6e31f136115dcbe5d0d18bc4c343ca5e834549f30ba7b189 -address /run/containerd/containerd.sock -containerd-binary /usr/bin/containerd -runtime-root /var/run/docker/runtime-runc
root 2747 2707 0 23:02 ? 00:00:00 \_ /src/tini -g -s -- python /src/ppp.py
root 2810 2747 3 23:02 ? 00:00:00 \_ python /src/ppp.py
root 2840 2810 0 23:02 ? 00:00:00 \_ python /src/ppp.py
root 2841 2840 0 23:02 ? 00:00:00 \_ python /src/ppp.py
```
- Use strace(1) to watch the last Python process, `2841` (`strace -fTp 2841`), then send SIGTERM to tini: `kill -15 2747`.
What We Expected
Since all the processes are in a single process group, and we started tini with `-s -g`, tini should have:
- forwarded SIGTERM to all the Python processes;
- waited for all the Python processes to exit (graceful termination);
- exited as the last process in the PID namespace.
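To make the expected sequence concrete, here is a minimal Python sketch of a toy init. It is not tini's C code, and it assumes Linux. `become_subreaper()` sets `PR_SET_CHILD_SUBREAPER` (value 36 in `<linux/prctl.h>`) through `ctypes` as a stand-in for `-s`, and `os.killpg()` stands in for `-g`. `spawn_ppp_like_tree()` is a hypothetical helper that rebuilds ppp.py's three-process tree with a shorter 0.3 s handler. All function names are illustrative.

```python
import ctypes
import os
import signal
import time

PR_SET_CHILD_SUBREAPER = 36  # from <linux/prctl.h>; needs Linux >= 3.4


def become_subreaper() -> None:
    """Roughly what `-s` asks for: orphaned descendants get reparented to us."""
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def _slow_sigterm_handler(signum, frame):
    time.sleep(0.3)  # stand-in for a time-consuming graceful shutdown
    os._exit(0)


def spawn_ppp_like_tree() -> int:
    """Hypothetical stand-in for ppp.py: direct child -> second -> third process,
    all in a new process group led by the direct child. Returns the direct child."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        os.setpgid(0, 0)  # own group, so killpg() below never hits the caller
        if os.fork() == 0:
            # Second process: slow handler, inherited by the third across fork().
            signal.signal(signal.SIGTERM, _slow_sigterm_handler)
            if os.fork() != 0:
                os.write(w, b"x")  # whole tree exists, handlers installed
        time.sleep(30)  # the direct child keeps the default SIGTERM action
        os._exit(1)
    os.close(w)
    os.read(r, 1)  # block until the tree is ready
    os.close(r)
    return pid


def toy_init_shutdown(child_pid: int) -> dict:
    """Expected behaviour: forward SIGTERM to the whole group (like -g), then
    reap every child, adopted orphans included, until waitpid reports ECHILD."""
    os.killpg(child_pid, signal.SIGTERM)
    reaped = {}
    while True:
        try:
            pid, status = os.waitpid(-1, 0)  # block until any child changes state
        except ChildProcessError:  # ECHILD: no children left, safe to exit now
            return reaped
        reaped[pid] = status
```

Calling `become_subreaper()`, then `toy_init_shutdown(spawn_ppp_like_tree())` reaps all three processes: the direct child killed by SIGTERM and the two others exiting with status 0 after their handler runs. Because the loop stops only on `ChildProcessError`, it also waits for the processes orphaned when the direct child died. A subreaper's `waitpid(-1, ...)` collects any descendant it adopts, so run this in a process with no unrelated children.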
What We Got
However, we observed that the second (2840) and third (2841) Python processes received SIGKILL as soon as SIGTERM arrived. This shows that tini did not wait for them to exit, so their graceful termination failed.
Root Cause and Suggestions
- Who sent the SIGKILL?
  The kernel. As pid_namespaces(7) puts it:
  > If the "init" process of a PID namespace terminates, the kernel terminates all of the processes in the namespace via a SIGKILL signal.
- Why didn't tini wait?
  See lines 546 to 560 in b9f42a0, specifically the second branch, `case 0`, which handles `waitpid` returning 0. waitpid(2) documents that return value as follows:
  > waitpid(): on success, returns the process ID of the child whose state has changed; if WNOHANG was specified and one or more child(ren) specified by pid exist, but have not yet changed state, then 0 is returned.
  So `waitpid(-1, &status, WNOHANG) == 0` means there is no "waitable" child, but there ARE still children. If the direct child has already exited, that is exactly when tini exits: with children still alive.
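The `waitpid` outcomes involved here can be observed from Python, whose `os.waitpid` wraps the same system call. With `os.WNOHANG` it returns `(0, 0)` when children exist but none has changed state. It raises `ChildProcessError` (errno ECHILD) when there are no children at all. `observe_waitpid_states` is a hypothetical helper for illustration.

```python
import os
import time
from typing import List


def observe_waitpid_states() -> List[str]:
    """Fork one short-lived child and record what waitpid(-1, ...) reports."""
    observations = []
    pid = os.fork()
    if pid == 0:
        time.sleep(0.5)  # alive, but not yet "waitable"
        os._exit(0)

    # A child exists but has not changed state: WNOHANG yields (0, 0).
    if os.waitpid(-1, os.WNOHANG) == (0, 0):
        observations.append("children exist, none waitable")

    # Blocking wait: returns the child's pid once it exits.
    reaped_pid, _ = os.waitpid(-1, 0)
    if reaped_pid == pid:
        observations.append("reaped child")

    # No children left at all: the syscall fails with ECHILD.
    try:
        os.waitpid(-1, os.WNOHANG)
    except ChildProcessError:
        observations.append("no children at all")
    return observations
```

Only the last outcome means it is safe for PID 1 to exit. The first one is the `case 0` situation.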
- Is this expected?
  You might argue that as long as the direct child handles SIGTERM properly (for example, if the first Python process, 2810, waited for the second, 2840, before quitting), tini is flawless.
  That is true, but if the direct child can already handle graceful termination on its own, what is the point of installing tini as PID 1 in the container? In that case, we could simply run the application as PID 1, without tini.
- Improvement Suggestions
  There are many ways to tackle the issue, and the principle is simple: do NOT exit until all children are gone.
  waitpid(2) already distinguishes "there are no children" (it fails with ECHILD) from "there are children, but none is waitable yet" (it returns 0 under WNOHANG). We can follow the documentation and change tini's exit condition accordingly.
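A minimal Python sketch of the suggested exit condition follows. It assumes a simple polling loop rather than tini's actual signal-driven main loop. `reap_pass` and `run_until_all_children_gone` are hypothetical names, not tini functions. The key change is that a `0` from `waitpid(..., WNOHANG)` means "keep going", and only ECHILD lets the loop finish.

```python
import os
import time
from typing import Optional, Tuple


def reap_pass(child_pid: int) -> Tuple[Optional[int], bool]:
    """One non-blocking reaping pass. Returns (raw wait status of the direct
    child if it was reaped in this pass, whether any children remain)."""
    child_status = None
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:  # -1/ECHILD: there are no children at all
            return child_status, False
        if pid == 0:  # children exist, but none has changed state yet
            return child_status, True
        if pid == child_pid:
            child_status = status  # other reaped pids are just cleaned-up zombies


def run_until_all_children_gone(child_pid: int, poll_interval: float = 0.05) -> int:
    """Suggested exit condition: leave only once the direct child has exited
    AND waitpid reports that no children are left."""
    exit_status = None
    while True:
        status, children_remain = reap_pass(child_pid)
        if status is not None:
            exit_status = status
        if exit_status is not None and not children_remain:
            return os.waitstatus_to_exitcode(exit_status)
        # The behaviour described in this issue amounts to returning as soon as
        # exit_status is set, even while children_remain is still True.
        time.sleep(poll_interval)  # a real init would block on SIGCHLD instead
```

For example, if the direct child exits immediately with code 3 while a sibling is still sleeping, `run_until_all_children_gone(child)` returns 3 only after the sibling has also exited and been reaped. It still propagates the direct child's exit code, as an init is expected to do.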