Hang on loading meshing on INL HPC with multiple nodes #646
-
Have you tried other Modular Component Architecture (MCA) components? On my laptop (and on some other Ubuntu 22.04 servers and a Mac), the same location triggers an MPI_Win_Create error, but I can bypass it via either ucx or rdma.
-
I have not. I'll take a look. I'm not sure this is an option for automated testing either, but I'll ask.
-
You simply add it to the submit script. Alternatively, you can set it via an environment variable, e.g.
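As a rough sketch of both routes: Open MPI reads MCA parameters either from `--mca <param> <value>` on the `mpirun` command line or from environment variables named `OMPI_MCA_<param>`. The snippet below assumes the one-sided communication (`osc`) framework is the one to switch, because the error mentioned above is in MPI_Win_Create. That choice, the rank count, and the `./my_app` placeholder are assumptions for illustration, not the settings used in this thread.

```shell
#!/usr/bin/env bash
# Assumption: the "osc" (one-sided communication) framework is the one to
# override; substitute whichever framework/component you actually need.

# Route 1: environment variable. Open MPI picks up any variable named
# OMPI_MCA_<param>, so this applies to every mpirun launched afterwards
# from this shell or submit script.
export OMPI_MCA_osc=ucx

# Route 2: the equivalent command-line form. It is only printed here, not
# executed; "./my_app" and "-np 4" are hypothetical placeholders.
echo "mpirun --mca osc ucx -np 4 ./my_app"
```

In a submit script, the `export` line would go before the `mpirun` call, which keeps the launch command itself unchanged.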
-
No luck with ucx either; the output is littered with:
This is UCX 1.18.1.
-
Does your desired OMPI module support ucx? check I'd also suggest testing a smaller example regarding
where $ cat t.cpp
-
Can you please try to run the code with:
-
Describe the bug
The simulation of the PB-FHR reflector using Nek, through Cardinal, hangs on INL HPC when reading the mesh. This is not the only failing input, but it's the one I'm working on. David Reger (@regerdavid) might have more.
This happens with this version of OpenMPI (other packages listed for completeness), which is the one used by MOOSE for all applications
It runs successfully with these other modules, but we don't want to maintain a Cardinal-specific OpenMPI version; that would be too costly maintenance-wise.
To Reproduce
Run this case from the NRIC virtual test bed on one of INL's clusters (bitterroot, for example):
https://github.com/idaholab/virtual_test_bed/tree/devel/pbfhr/mark1/reflector
Ping me for access to HPC if needed.
Expected behavior
Load the mesh and run the case in ~20 minutes with 240 CPUs.
Desktop (please complete the following information):
https://github.com/neams-th-coe/nekRS/tree/48e408eb9a1f4de674efd243a982c84300c792e4
Additional context
@loganharbour @aprilnovak for awareness.
Needed for VTB HPC testing. You can use the VTB charge number with @eshemon's permission at ANL.