Mystery error on exit #12607
Comments
I'm unaware of any limitation on number of concurrent mpiruns, but I don't really understand what you are trying to do. A far cleaner way of doing this would be to start the PRRTE DVM (just Setting that aside, all the output is telling you is that one of your processes didn't exit properly - likely failed to call
I am not experienced in any of this, so it's not something that I know much about, but I can look into it.
Yes, I can see that. The issue here is that, from my point of view, it shouldn't be happening, and it only happens sometimes, with no clear indication of what's happening or why. Anyway, I haven't seen it since upgrading from Fedora 39 to 40, so hopefully it's transient.
Might this be related to #10117?
No - totally unrelated unless you see your procs are crashing, which isn't what you report. It sounds to me like the issue is something in your integration with the OS if upgrading Fedora solves the problem. I very much doubt it is something in OMPI causing you to exit improperly - that would almost always show as a segfault.
Hi there,
I'm trying to execute a number of
mpirun -n 2 --bind-to none ...
processes at the same time on a machine with a fairly high CPU count. I do not believe I'm running more processes than there are cores available, yet in some instances, which I can't reproduce locally, there seems to be an issue on process shutdown.
To me it looks like the process has finished without any problem, but an issue is still reported, along with a non-zero exit code, which is the annoying part here.
Is there the potential for some issue from running a large number of concurrent but unrelated mpirun processes?
I'm open to the possibility that there is an issue in the shutdown code we have, but we do also have some MPI barriers in place to encourage process lock-stepping.
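For anyone trying to narrow this down, here is a rough sketch of the kind of launcher described above: it starts several independent commands at once and records each exit code plus the tail of stderr, so an intermittent non-zero exit can be tied to a specific run. This is an assumption about how the runs might be driven, not the actual harness. The helper names (`run_one`, `run_concurrently`, `failures`, `mpirun_jobs`) and the application path are hypothetical; only the `mpirun -n 2 --bind-to none` arguments come from the report.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_one(cmd, timeout=None):
    """Run one command; return (cmd, returncode, last 20 lines of stderr)."""
    proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    stderr_tail = "\n".join(proc.stderr.splitlines()[-20:])
    return cmd, proc.returncode, stderr_tail


def run_concurrently(cmds, max_workers=None):
    """Start every command at the same time; results keep submission order."""
    workers = max_workers or max(1, len(cmds))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_one, cmds))


def failures(results):
    """Keep only the runs that finished with a non-zero exit code."""
    return [r for r in results if r[1] != 0]


def mpirun_jobs(app, count):
    """Build `count` identical mpirun command lines.

    `app` is a placeholder for the real MPI executable (hypothetical here).
    """
    return [["mpirun", "-n", "2", "--bind-to", "none", app] for _ in range(count)]
```

Something like `failures(run_concurrently(mpirun_jobs("./my_app", 8)))` would then list only the runs that reported a problem, with the stderr tail for each, which may help show whether the failures cluster under load.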
Error log.