-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
restore fails when bash exec present #11439
Comments
Yeah checkpointing a container which has live |
Yep! Maybe behind a flag. |
For even more fun here: $ runsc exec x bash
:/# dtach -n /tmp/z watch ps
:/# exit Then checkpoint and restore. The checkpoint and restore succeed. But the detached processes show up in ps as kernel processes, but are not actually executing.
|
Exec'ed processes are killed after the restore, because they cannot be properly stitched back to the session that created them in the first place. I believe what you're seeing are zombie processes, because there is no parent to reap them. As for the FD, the code needs to ignore errors to restore host mapped FDs that belong to exec'd tasks ( |
Fixes #11439 PiperOrigin-RevId: 727091752
The patch #11478 fixes the problem in my tests. Please double-check that it fixes the issue for you. |
Super exciting! I'm off next week, but will test it out on my return. Yay! |
Description
I am experimenting with gvisor's save/restore functionality to decide whether to build atop it. I immediately hit a stumbling block in which restore fails, without an apparent workaround.
If you exec bash, checkpoint, then restore, the restore fails due to FD issues.
I would hope that it would successfully restore, and close the FDs associated with the exec'd bash process that could not be restored. Or that there is some path to bringing the container back up, even if it is in a slightly broken state. (Shortcomings can be documented, worked around, built around; simple failure to restore cannot.)
(I am also left wondering what other surprises might be in store for me, given that I hit this one almost immediately, i.e. #3281)
Thanks!
Steps to reproduce
Terminal 1:
Terminal 2:
runsc exec x bash
Terminal 1:
This results in this failure:
The container is pretty vanilla alpine, running
sleep infinity
.runsc version
# runsc -version runsc version release-20250127.0 spec: 1.1.0-rc.1
uname
The text was updated successfully, but these errors were encountered: