Use async pid fd instead of blocking waitid to wait for a child process to exit #745
Mossaka merged 6 commits into containerd:main.
I still need to figure out why CI is failing. I think the http_poxy container is not receiving the SIGINT for some reason, but I don't see how that could relate to this change. I'll keep digging tomorrow.
Mossaka left a comment:
Great! Thanks for working on this.
What I'd like to see, perhaps as a follow-up, are some unit tests for edge cases like:
- What would happen if the PID does not exist?
- What would happen if the original PID is reused by a new process after the first one exits?
- What would happen when calling `wait()` after containerd-shim has already reaped the process?
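Two of these edge cases can be probed directly against the kernel. The sketch below is std-only Rust and assumes Linux with glibc: it reaches `pidfd_open` through the raw `syscall` wrapper (434 is the syscall number on x86_64 and aarch64) and declares `waitpid` by hand. The helper names `pidfd_open_errno` and `waitpid_errno` are hypothetical and are not runwasi functions. PID reuse is left out because it cannot be reproduced deterministically in a unit test.

```rust
use std::io;
use std::os::raw::{c_int, c_long};

// Assumption: Linux errno values and the x86_64/aarch64 syscall number.
const ESRCH: i32 = 3;
const ECHILD: i32 = 10;
const SYS_PIDFD_OPEN: c_long = 434;

// Hand-written declarations for C library functions, so no crates are needed.
unsafe extern "C" {
    fn syscall(num: c_long, ...) -> c_long;
    fn waitpid(pid: c_int, status: *mut c_int, options: c_int) -> c_int;
    fn close(fd: c_int) -> c_int;
}

/// Hypothetical helper: the errno reported by `pidfd_open(pid, 0)`,
/// or `None` if a pidfd was obtained (it is closed immediately).
fn pidfd_open_errno(pid: c_int) -> Option<i32> {
    // SAFETY: integer-only arguments; the kernel validates `pid`.
    let ret = unsafe { syscall(SYS_PIDFD_OPEN, pid as c_long, 0 as c_long) };
    if ret >= 0 {
        unsafe { close(ret as c_int) };
        None
    } else {
        io::Error::last_os_error().raw_os_error()
    }
}

/// Hypothetical helper: the errno reported by a blocking `waitpid(pid)`,
/// or `None` if a status was collected. Do not call it on a running child,
/// or it blocks until that child exits.
fn waitpid_errno(pid: c_int) -> Option<i32> {
    let mut status: c_int = 0;
    // SAFETY: `status` is a valid, writable c_int.
    let ret = unsafe { waitpid(pid, &mut status, 0) };
    if ret >= 0 {
        None
    } else {
        io::Error::last_os_error().raw_os_error()
    }
}
```

On a kernel with `pidfd_open`, a PID above any possible `pid_max` (such as `i32::MAX`) should fail with `ESRCH`, and waiting again on a child that `std::process::Child::wait` already reaped should fail with `ECHILD`.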
Just to clarify: this PR does not resolve the race condition, right? If so, could you please raise an issue against the repo so that we can track it.
Signed-off-by: Jorge Prendes <jorge.prendes@gmail.com>
It doesn't resolve the underlying race condition, but it handles all possible outcomes, so it's not an issue for runwasi.
To be fair, we could just use the containerd_shim reaper alone, and then we wouldn't need the pidfd part.
During `wait`, it would hit the `ECHILD` branch and then wait forever in the c8d_shim reaper. I fixed it so that we wait with a timeout.
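The timeout fix can be illustrated with a small in-process model, assuming the reaper publishes exit statuses into a shared map. Waiting on a condition variable with no deadline never returns for a PID that never exits (for example, one that never existed), while a bounded wait gives up and reports "no status". `ReaperRecords`, `publish` and `wait_for_exit` are hypothetical names; this is not the containerd-shim crate's API.

```rust
use std::collections::HashMap;
use std::sync::{Condvar, Mutex};
use std::time::Duration;

/// Hypothetical model of the exit statuses published by the shim's reaper.
#[derive(Default)]
struct ReaperRecords {
    statuses: Mutex<HashMap<u32, i32>>,
    changed: Condvar,
}

impl ReaperRecords {
    /// Called by the (modeled) reaper after it reaps `pid`.
    fn publish(&self, pid: u32, code: i32) {
        self.statuses.lock().unwrap().insert(pid, code);
        self.changed.notify_all();
    }

    /// Waits for `pid`'s exit status, giving up after `timeout`.
    /// Returns `None` on timeout instead of blocking forever.
    fn wait_for_exit(&self, pid: u32, timeout: Duration) -> Option<i32> {
        let guard = self.statuses.lock().unwrap();
        // Re-checks the map after every wakeup until the status appears
        // or the overall timeout elapses.
        let (guard, _timed_out) = self
            .changed
            .wait_timeout_while(guard, timeout, |statuses| !statuses.contains_key(&pid))
            .unwrap();
        guard.get(&pid).copied()
    }
}
```

`Condvar::wait_timeout_while` bounds the total wait and handles spurious wakeups, which is why the predicate is re-evaluated rather than trusting a single notification.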
That's not a problem.
As long as the struct is created before the process exits, that's not a problem. That's why we create the struct BEFORE calling …
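Why creation order matters can be shown with a toy broadcast model, assuming exit notifications are fanned out only to subscribers that already exist and are never replayed. A subscriber created before the process exits sees that process's exit first; one created afterwards can only see later events, such as the exit of an unrelated process that reused the PID. `ExitBroadcaster` and `first_exit_for` are hypothetical names for illustration only.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::Mutex;

/// Hypothetical fan-out of (pid, exit code) events; late subscribers get no replay.
#[derive(Default)]
struct ExitBroadcaster {
    subscribers: Mutex<Vec<Sender<(u32, i32)>>>,
}

impl ExitBroadcaster {
    fn subscribe(&self) -> Receiver<(u32, i32)> {
        let (tx, rx) = channel();
        self.subscribers.lock().unwrap().push(tx);
        rx
    }

    fn publish(&self, pid: u32, code: i32) {
        // Deliver to every current subscriber, dropping those whose receiver is gone.
        self.subscribers
            .lock()
            .unwrap()
            .retain(|tx| tx.send((pid, code)).is_ok());
    }
}

/// Returns the exit code of the first queued event for `pid`, if any.
fn first_exit_for(rx: &Receiver<(u32, i32)>, pid: u32) -> Option<i32> {
    rx.try_iter().find(|&(p, _)| p == pid).map(|(_, code)| code)
}
```

Because events arrive in order, the early subscriber's first match for a PID is the original process's exit, even if that PID is reused later.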
This PR replaces the blocking waitid call with an async implementation based on pid fd.
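The core idea can be sketched without an async runtime: a pidfd becomes readable once the process it refers to exits, so waiting for exit turns into waiting for readiness on a file descriptor. This std-only sketch assumes Linux 5.3 or newer with glibc, declares `poll` and the raw `syscall` wrapper by hand, and uses hypothetical helper names (`pidfd_open`, `wait_exit_timeout`). It only observes the exit; it does not reap the child.

```rust
use std::io;
use std::os::raw::{c_int, c_long, c_short, c_ulong};
use std::time::Duration;

// Layout of `struct pollfd` from <poll.h>.
#[repr(C)]
struct PollFd {
    fd: c_int,
    events: c_short,
    revents: c_short,
}

const POLLIN: c_short = 0x001;
// Assumption: 434 is the pidfd_open syscall number (x86_64, aarch64).
const SYS_PIDFD_OPEN: c_long = 434;

unsafe extern "C" {
    fn syscall(num: c_long, ...) -> c_long;
    fn poll(fds: *mut PollFd, nfds: c_ulong, timeout: c_int) -> c_int;
    fn close(fd: c_int) -> c_int;
}

/// Hypothetical helper: opens a pidfd for `pid` (a live process or a zombie).
fn pidfd_open(pid: u32) -> io::Result<c_int> {
    // SAFETY: integer-only arguments.
    let ret = unsafe { syscall(SYS_PIDFD_OPEN, pid as c_long, 0 as c_long) };
    if ret < 0 {
        Err(io::Error::last_os_error())
    } else {
        Ok(ret as c_int)
    }
}

/// Hypothetical helper: blocks until the pidfd is readable (the process exited)
/// or `timeout` elapses. Returns Ok(true) on exit, Ok(false) on timeout.
fn wait_exit_timeout(pidfd: c_int, timeout: Duration) -> io::Result<bool> {
    let mut pfd = PollFd { fd: pidfd, events: POLLIN, revents: 0 };
    let ms = timeout.as_millis().min(c_int::MAX as u128) as c_int;
    loop {
        // SAFETY: `pfd` is a valid pollfd and nfds is 1.
        let n = unsafe { poll(&mut pfd, 1, ms) };
        if n >= 0 {
            return Ok(n > 0 && (pfd.revents & POLLIN) != 0);
        }
        let err = io::Error::last_os_error();
        // For simplicity, an interrupted poll restarts with the full timeout.
        if err.kind() != io::ErrorKind::Interrupted {
            return Err(err);
        }
    }
}
```

In an async runtime, the same fd would instead be registered with the reactor (for example via tokio's `tokio::io::unix::AsyncFd`), so the task yields rather than tying up a thread the way a blocking `waitid` does.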
There's a race condition: containerd-shim also reaps child processes, so it may reap the child before runwasi waits on it.
If the child process has already been reaped, we query containerd-shim for the process's exit status.
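A simplified sketch of that fallback, assuming Linux and glibc: try to reap the child with `waitpid`, and if the kernel answers `ECHILD` (someone else, here containerd-shim, already reaped it), look the status up in the shim's records instead. `ShimExitRecords`, `ExitSource` and `collect_exit` are hypothetical names; how the real shim exposes reaped statuses is not modeled here.

```rust
use std::collections::HashMap;
use std::io;
use std::os::raw::c_int;

const ECHILD: i32 = 10; // Linux errno value

unsafe extern "C" {
    fn waitpid(pid: c_int, status: *mut c_int, options: c_int) -> c_int;
}

/// Hypothetical stand-in for the exit statuses containerd-shim's reaper collected.
struct ShimExitRecords {
    statuses: HashMap<u32, i32>,
}

#[derive(Debug, PartialEq)]
enum ExitSource {
    /// We won the race and reaped the child ourselves.
    Reaped(i32),
    /// The shim reaped it first; the status came from its records.
    FromShim(i32),
    /// Already reaped, but the shim has no record of it (yet).
    Unknown,
}

/// Decodes a raw wait status into the exit code, or 128 + signal number.
/// (Stopped states are not reported because WUNTRACED is not passed.)
fn decode_status(status: c_int) -> i32 {
    let sig = status & 0x7f;
    if sig == 0 { (status >> 8) & 0xff } else { 128 + sig }
}

fn collect_exit(pid: u32, shim: &ShimExitRecords) -> io::Result<ExitSource> {
    let mut status: c_int = 0;
    loop {
        // SAFETY: `status` is a valid, writable c_int.
        let ret = unsafe { waitpid(pid as c_int, &mut status, 0) };
        if ret == pid as c_int {
            return Ok(ExitSource::Reaped(decode_status(status)));
        }
        let err = io::Error::last_os_error();
        match err.raw_os_error() {
            // Not our child (anymore): fall back to the shim's records.
            Some(ECHILD) => {
                return Ok(match shim.statuses.get(&pid) {
                    Some(&code) => ExitSource::FromShim(code),
                    None => ExitSource::Unknown,
                });
            }
            _ if err.kind() == io::ErrorKind::Interrupted => continue,
            _ => return Err(err),
        }
    }
}
```

The `Unknown` outcome is the case where a bounded wait on the shim's reaper is still needed before giving up.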
Note that this race condition is already present in the current implementation, but there runwasi's blocking `waitid` is very likely to win the race. Introducing async evens out the odds between runwasi and containerd-shim.
This PR is in preparation to move the whole shim implementation to async.