[bug] moon run doesn't wait for process to exit on Ctrl-C #1678
Comments
Also applies to Linux.
We abort tasks in tokio: https://docs.rs/tokio/latest/tokio/task/index.html#cancellation I'm pretty sure this just cancels the task and doesn't actually pass SIGINT through.
I assume it would be possible to attach a different behaviour on drop here: moon/crates/process/src/command.rs, lines 97 to 100 in c6ea612.
I'll look into this a little more. The main issue may be that drop is sync and you shouldn't block the runtime.
tokio-rs/tokio#2504 talks about this issue. It also proposes some solutions, though in this case the correct solution may require awaiting exit rather than relying on drop to do that for you.
Trying to fix this myself. The behaviour I am going for is that it sends SIGTERM and explicitly waits for all processes to exit, while keeping the SIGKILL on drop in case of a panic unwind or any other unexpected drop. This will lead to it being stuck if the process doesn't respect SIGTERM. Adding a timeout of an arbitrary duration doesn't feel right either, though. I'll worry about that later.
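The "explicit wait, with a kill on drop as the fallback" idea from the comment above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not moon's code: `KillOnDrop` is a hypothetical name, it uses blocking `std::process` rather than tokio, and it does not send SIGTERM itself (std has no portable API for that; `Child::kill` sends SIGKILL on Unix). The `sleep` commands in `main` assume a Unix-like system.

```rust
use std::io;
use std::process::{Child, Command, ExitStatus};

/// Hypothetical wrapper (not moon's actual type): owns a child process and
/// force-kills it if the wrapper is dropped without an explicit wait.
struct KillOnDrop {
    child: Option<Child>,
}

impl KillOnDrop {
    fn spawn(cmd: &mut Command) -> io::Result<Self> {
        Ok(Self { child: Some(cmd.spawn()?) })
    }

    /// Explicit path: block until the child exits on its own.
    /// Taking the child out of the Option disarms the Drop fallback.
    fn wait(mut self) -> io::Result<ExitStatus> {
        let mut child = self.child.take().expect("child already taken");
        child.wait()
    }
}

impl Drop for KillOnDrop {
    fn drop(&mut self) {
        // Fallback path (panic unwind or any other unexpected drop):
        // force-kill, then reap so no zombie process is left behind.
        if let Some(mut child) = self.child.take() {
            let _ = child.kill();
            let _ = child.wait();
        }
    }
}

fn main() -> io::Result<()> {
    // Explicit wait: the child is allowed to finish by itself.
    let status = KillOnDrop::spawn(Command::new("sleep").arg("0.1"))?.wait()?;
    println!("explicit wait finished: {status}");

    // Unexpected drop: the child is killed instead of being left running.
    let guard = KillOnDrop::spawn(Command::new("sleep").arg("30"))?;
    drop(guard);
    Ok(())
}
```

The blocking `wait` inside `drop` is exactly the part that is awkward in an async runtime, which is the concern raised earlier in the thread about drop being sync.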
This will take a while. There are no crates that do this in a way that would work on both Windows and Unix. I think I'll move my current logic into external crates since it's too generic to fit here.
Yeah, I've noticed the Rust ecosystem doesn't have really good support for cross-platform signals, or ways to achieve this easily.
I reworked this a bit in v1.31, so it should now wait for them to be cancelled.
Testing on v1.31.1 (Linux), it still fails for all signals.
I used these options:
tasks: interactive_signal:
I think I'll just use signal_hook despite it only somewhat handling Windows. For Linux I am aiming to get all signals working; for Windows I'll settle for SIGINT.
I'll send the reproduction in a bit.
https://github.com/JeremyMoeglich/moon_signal_reproduction
I still don't pass the signals down, so that still won't work. But it should wait for tasks to finish shutting down.
That doesn't seem to work, at least not for me (small reproduction).
I suspect this is the main piece of code that handles this:
    // Since these tasks are persistent and never complete,
    // we need to continually check if they've been aborted or
    // cancelled, otherwise we will end up with zombie processes
    loop {
        sleep(Duration::from_millis(150)).await;
        // No tasks running, so don't hang forever
        if job_context.result_sender.is_closed() {
            break;
        }
        if job_context.is_aborted_or_cancelled() {
            debug!("Shutting down {} persistent jobs", job_handles.len());
            job_handles.shutdown().await;
            break;
        }
    }
.shutdown() simply aborts all futures in job_handles. There is no built-in way to gracefully shut down a future.
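Because an aborted future never runs the code after its current await point, a graceful shutdown has to be cooperative: the job is told to stop, and the caller then waits for it to finish instead of aborting it. A rough sketch of that pattern follows. It uses std threads and an `AtomicBool` as a stand-in for tokio tasks, which is an assumption made only to keep the example dependency-free; `worker` and `run_and_stop` are hypothetical names, not moon functions.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Hypothetical persistent job: runs until `shutdown` is set, then performs
/// its cleanup and reports back.
fn worker(shutdown: Arc<AtomicBool>) -> &'static str {
    while !shutdown.load(Ordering::SeqCst) {
        // Simulated unit of work between shutdown checks.
        thread::sleep(Duration::from_millis(10));
    }
    // Cleanup runs here because the job was asked to stop, not aborted.
    "cleaned up"
}

/// Starts the job, requests shutdown, then waits for the job to finish.
fn run_and_stop() -> &'static str {
    let shutdown = Arc::new(AtomicBool::new(false));
    let flag = Arc::clone(&shutdown);
    let handle = thread::spawn(move || worker(flag));

    thread::sleep(Duration::from_millis(50));
    shutdown.store(true, Ordering::SeqCst); // request shutdown
    handle.join().expect("worker panicked") // wait for it instead of aborting
}

fn main() {
    println!("{}", run_and_stop());
}
```

In an async setting the flag would typically be replaced by a channel or cancellation token that the task awaits, but the shape is the same: signal first, then wait for completion.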
Yah that's the end goal. I need to find a way to share the There's this: https://crates.io/crates/shared_child but its Windows support seems non-existent.
Looking into this here: #1790
Testing the branch you've been working on, it still seems to be killing the child process rather than waiting.
SIGINT | Moon Persistent
That's odd, since we explicitly wait so that process reaping can function correctly: https://github.com/moonrepo/moon/pull/1790/files#diff-7d3e0aaa6ba5077700059589b360158272c16327362b17ab251740ae9b9dfd27R76 I wonder if something else is killing it outside of this. I'll dig further.
Looking at the code, it does what I would do too. My testing code literally does the same thing to send the signals.
I suspect I might have used the wrong branch. I can't really test it since my reproduction only works on Linux and I am on Windows right now. But I can at least test Windows then.
OK, there is some interesting behavior on Windows: it properly forwards the signal, but moon does end early.
Before, it would just exit without a status, like this:
PS C:\Users\moeglich\dev\moon_signal_reproduction> moon run root:wait
Now it does print a status and doesn't kill, but does end early:
./moon.exe run root:wait
Tasks: 1 skipped
PS C:\Users\moeglich\dev\moon_signal_reproduction> SIGINT
That means the SIGINT appears after moon has already exited. Code:
I have retested the version and I was indeed on the wrong branch, sorry for that. The two signals that matter most, SIGINT and SIGTERM, are forwarded correctly. You could use https://docs.rs/signal-hook-tokio/latest/signal_hook_tokio/ to handle the others on Linux.
I think I found part of the problem. I've been using this: https://docs.rs/tokio/latest/tokio/task/struct.JoinSet.html#method.shutdown It force-aborts the tasks, so I'm rewriting parts of that code.
Looks like process groups are stable, at least on Unix. Going to give that a try. Should make this much easier.
Just stumbled upon this crate: https://crates.io/crates/process-wrap It should solve most of these problems.
Figured out why things are abruptly cancelled: it's because of https://docs.rs/tokio/latest/tokio/macro.select.html
Give v1.32 a try.
I don't think this works. From what I remember, an older version of the 1.32-process branch did work. Right now I am on Linux.
Here is the full log
It's working according to the logs, but there's also a threshold (2 seconds) right now that will kill the processes if they take too long to shut down. I can probably make this configurable.
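A shutdown threshold like the one described above can be sketched as "poll for exit until a deadline, then force-kill". The following is an illustrative std-only version, not moon's implementation: `wait_or_kill` and `Outcome` are hypothetical names, the polling interval is arbitrary, and in a real flow the shutdown signal would already have been forwarded to the child before this wait begins. The `sleep` commands in `main` assume a Unix-like system.

```rust
use std::io;
use std::process::{Child, Command};
use std::thread;
use std::time::{Duration, Instant};

/// How the child ended (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum Outcome {
    Exited,
    Killed,
}

/// Waits up to `threshold` for the child to exit on its own,
/// then force-kills and reaps it if it is still running.
fn wait_or_kill(child: &mut Child, threshold: Duration) -> io::Result<Outcome> {
    let deadline = Instant::now() + threshold;
    loop {
        // Non-blocking check: Some(status) means the child already exited.
        if child.try_wait()?.is_some() {
            return Ok(Outcome::Exited);
        }
        if Instant::now() >= deadline {
            child.kill()?; // SIGKILL on Unix, TerminateProcess on Windows
            child.wait()?; // reap so no zombie is left behind
            return Ok(Outcome::Killed);
        }
        thread::sleep(Duration::from_millis(20));
    }
}

fn main() -> io::Result<()> {
    // Finishes well within the 2 second threshold.
    let mut quick = Command::new("sleep").arg("0.1").spawn()?;
    println!("{:?}", wait_or_kill(&mut quick, Duration::from_secs(2))?);

    // Takes too long, so it is killed once the threshold passes.
    let mut slow = Command::new("sleep").arg("30").spawn()?;
    println!("{:?}", wait_or_kill(&mut slow, Duration::from_millis(200))?);
    Ok(())
}
```

Making the threshold configurable would amount to passing a different `threshold` value per task.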
2 seconds should be fine for most things. I'll test Windows in a bit.
Describe the bug
I have a persistent task which runs a program. This program has some exit logic that runs after
tokio::signal::ctrl_c().await.unwrap();
or in general after Ctrl-C.
When I run it directly this logic runs; if I run it via
moon run
it sometimes executes a little, but is always stopped before finishing.
Steps to reproduce
Exits immediately
Expected behavior
Should wait before exiting like it does when running directly
Environment
  System:
    OS: Windows 11 10.0.22631
    CPU: (20) x64 13th Gen Intel(R) Core(TM) i5-13500T
    Memory: 5.18 GB / 31.70 GB
  Binaries:
    Node: 20.15.1 - ~\AppData\Local\fnm_multishells\120760_1728461189958\node.EXE
    npm: 10.7.0 - ~\AppData\Local\fnm_multishells\120760_1728461189958\npm.CMD
    bun: 1.1.29 - ~\.proto\shims\bun.EXE
  Managers:
    Cargo: 1.80.1 - ~\.cargo\bin\cargo.EXE
    pip3: 24.1.2 - ~\.proto\shims\pip3.EXE
  Utilities:
    Git: 2.44.0. - C:\Program Files\Git\cmd\git.EXE
    Clang: 19.1.0 - C:\Program Files\LLVM\bin\clang.EXE
    Curl: 8.7.1 - C:\Windows\system32\curl.EXE
  Virtualization:
    Docker: 27.2.0 - C:\Program Files\Docker\Docker\resources\bin\docker.EXE
    Docker Compose: 2.28.1 - C:\Program Files\Docker\Docker\resources\bin\docker-compose.EXE
  IDEs:
    VSCode: 0.41.3 - c:\Users\moeglich\AppData\Local\Programs\cursor\resources\app\bin\code.CMD
  Languages:
    Java: 21.0.4 - C:\Program Files\Eclipse Adoptium\jdk-21.0.4.7-hotspot\bin\javac.EXE
    Python: 3.12.5 - C:\Users\moeglich\.proto\shims\python.EXE
    Python3: 3.12.5 - C:\Users\moeglich\.proto\shims\python3.EXE
    Rust: 1.80.1 - C:\Users\moeglich\.cargo\bin\rustc.EXE
  Databases:
    PostgreSQL: 16.3 - C:\Program Files\PostgreSQL\16\bin\postgres.EXE
  Browsers:
    Edge: Chromium (128.0.2739.79)
    Internet Explorer: 11.0.22621.3527
Additional context
I'll test this on Linux later; right now I've only tested Windows.