-
Notifications
You must be signed in to change notification settings - Fork 373
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Forward signals to child process with micromamba run #3152
base: main
Are you sure you want to change the base?
Conversation
Do have more understanding of why signal forwarding works on Linux and macOS but not Windows? Should this patch be Windows-only instead? |
My changes don't touch windows at all (almost all changes are inside a |
The context where I encountered this as an issue is a setup involving SLURM and pytorch-lightning. There I needed the python interpreter to receive the SIGUSR1 signal, but, whenever I ran it as While working on the issue I got some inspiration from the source code of the Singularity/Apptainer containerization tool, since |
Thanks a lot for the context! We definitely need to add tests for this feature. I guess it could be as simple as using That could also help understanding the Windows status quo :) |
We save and restore the signal mask and the signal handlers, as well as stop and restart our `default_signal_handler`.
Hi @jonashaag! I made the code more robust and added unit tests for signal forwarding and process suspension (with CTRL-Z). Unfortunately the code is using a lot of Linux syscalls, so not even the unit tests are usable for windows. Could you take a look at it and maybe tag other people who have more knowledge of signal handling or linux processes? In particular, I'm not sure if the way I'm dealing with the previous signal handler and the signal mask is correct, and if I'm waiting correctly for the child process. Also, it might make sense to move some of the new code to separate functions to keep things tidy. On the plus side, I already tested the binary in my setup and it works for my use case! 😀 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work <3
libmamba/src/core/run.cpp
Outdated
} | ||
} | ||
|
||
// Compute the return status mimicking the private logic in reproc++ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a link to that logic as a comment please
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -442,24 +477,38 @@ namespace mamba | |||
} | |||
|
|||
#ifndef _WIN32 | |||
MainExecutor::instance().schedule( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I forgot this change from one of the experiments I did. I'll revert it.
Do you know why the signal handler was being set with the MainExecutor
instead of directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No 🤷
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I traced this back to 60d4b3a, but still don't know why the signal was being set in a separate thread.
With the new changes, it seems more difficult to do the modifications correctly in a separate thread. The MainExecutor doesn't seem to have functionality to check if a scheduled task has finished, meaning that it wouldn't be possible to check that modifying and restoring the signal masks happen before the end of the run_in_environment
call without using some sort of synchronization primitives. At the same time, the signal mask is set on a per-thread basis, so the calls to pthread_sigmask
would have to anyhow be done on the main thread. What do you think about leaving this as is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wolfv do you recall why?
@@ -470,12 +519,23 @@ namespace mamba | |||
|
|||
ec = reproc::drain(proc, reproc::sink::null, reproc::sink::null); | |||
|
|||
std::tie(status, ec) = proc.stop(opt.stop); | |||
#ifndef _WIN32 | |||
status = wait_for_process(pid); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Between the place where we installed the signal handlers up until after this line, is it possible to get any exceptions? If so, should we put the cleanup/restoring code into a finally
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not check, but that sounds like a good idea. Unfortunately I'm not sure how to do that in C++
- add link to relevant reproc code - move Unix-only code into `#ifndef _WIN32` block - fix comment - extend suspension unit test sleep time
// Compute the return status mimicing the private logic in reproc++ | ||
// https://github.com/DaanDeMeyer/reproc/blob/1c07bdbec3f2ecba7125b9499b9a8a77bf9aa8c7/reproc/src/process.posix.c#L455 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Compute the return status mimicing the private logic in reproc++ | |
// https://github.com/DaanDeMeyer/reproc/blob/1c07bdbec3f2ecba7125b9499b9a8a77bf9aa8c7/reproc/src/process.posix.c#L455 | |
// Compute the return status mimicing the private logic in reproc++: | |
// https://github.com/DaanDeMeyer/reproc/blob/1c07bdbec3f2ecba7125b9499b9a8a77bf9aa8c7/reproc/src/process.posix.c#L471 | |
// https://github.com/DaanDeMeyer/reproc/blob/1c07bdbec3f2ecba7125b9499b9a8a77bf9aa8c7/reproc/src/process.posix.c#L455 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or
// Compute the return status mimicing the private logic in reproc++ | |
// https://github.com/DaanDeMeyer/reproc/blob/1c07bdbec3f2ecba7125b9499b9a8a77bf9aa8c7/reproc/src/process.posix.c#L455 | |
// Compute the return status mimicing the private logic in reproc++ | |
// https://github.com/DaanDeMeyer/reproc/blob/1c07bdbec3f2ecba7125b9499b9a8a77bf9aa8c7/reproc/src/process.posix.c#L465-L478 |
What signal handler do we have installed in this loop?
Ie. what are the previous handlers? If this always the fixed list from |
When used in our code, I think only the so-called default handler is set. If, on the other hand, Also, there's actually one corner case which I don't handle and which seems impossible to handle without reworking other parts of the code. If someone uses The so-called signal handlers from |
It seems that |
I don't know much about the signal handlers but I know they must be forwarded to python when used in |
Hi! This MR tries to fix issue #1820, in which signals are not forwarded to the child process. I think this should count as a first draft - in the final version it's probably a good idea to save the state of the signal handlers and restore it once the call is over, instead of just overwriting them. Looking forward to your feedback!
P.S.: I don't know much about Windows, so I didn't even try to handle that case.