Use async pid fd instead of blocking waitid to wait for a child process to exit #745
LGTM!
I still need to figure out why CI is failing. I think the http_proxy container is not receiving the SIGINT for some reason, but I don't see how that could be related to this change. I'll keep digging tomorrow.
Great! Thanks for working on this.
What I'd like to see, perhaps as a follow-up, is some unit tests for edge cases like:
- what happens if the PID does not exist
- what happens if the original PID is reused by a new process after the first one exits
- what happens if `wait()` is called when containerd-shim has already reaped the process
Just to clarify: this PR does not resolve this issue, right? If so, could you please open an issue against the repo so that we can track it?
Signed-off-by: Jorge Prendes <[email protected]>
It doesn't resolve the underlying race condition, but it covers all possible outcomes, so it's not an issue for runwasi.
To be fair, we could just use the containerd_shim reaper alone, and we wouldn't need the pidfd part.
During wait, it would hit the ECHILD branch, and then wait forever in the c8d_shim reaper. Fixed it so that we wait with a timeout.
That's not a problem.
As long as the struct is created before the process exits, that's not a problem. That's why we create the struct BEFORE calling |
LGTM!
The merge-base changed after approval.
This PR replaces the blocking waitid call with an async implementation based on pid fd.
containerd-shim also reaps child processes, which creates a race: whichever side waits first collects the exit status.
If the child process has already been reaped, runwasi queries containerd-shim to get the process status.
Note that this race condition is already present in the current implementation, but there runwasi's blocking waitid is very likely to win the race. Introducing async evens out the odds between runwasi and containerd-shim.
This PR is in preparation for moving the whole shim implementation to async.