Description
Thank you for taking the time to submit an issue!
Background information
What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)
I'm compiling from the main branch; this is the commit that appears in the configure log:
*** Checking versions
checking for repo version... cfb5505
checking Open MPI version... 5.1.0a1
checking Open MPI release date... Unreleased developer copy
checking Open MPI repository version... cfb5505
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
I installed it with spack v0.23:
spack install openmpi@main
configure '--disable-option-checking' '--prefix=/home/gavaf/spack/opt/linux-rocky8-x86_64/gcc-11.3.0/openmpi/main-6ksrvxfwdiezibx65yoxwbkw5xuywryw' --enable-prte-ft --with-proxy-version-string=5.1.0a1 --with-proxy-package-name="Open MPI" --with-proxy-bugreport="https://www.open-mpi.org/community/help/" --disable-devel-check --enable-prte-prefix-by-default --disable-pmix-lib-checks --with-pmix-extra-libs="/cfdbuild/gavaf/spack-stage/spack-stage-openmpi-main-6ksrvxfwdiezibx65yoxwbkw5xuywryw/spack-src/3rd-party/openpmix/src/libpmix.la" '--enable-shared' '--disable-silent-rules' '--disable-sphinx' '--enable-builtin-atomics' '--disable-static' '--enable-mpi1-compatibility' '--with-ucx=/usr' '--without-psm2' '--without-xpmem' '--without-cma' '--without-ofi' '--without-verbs' '--without-mxm' '--without-psm' '--without-ucc' '--without-knem' '--without-hcoll' '--without-fca' '--without-cray-xpmem' '--without-loadleveler' '--without-lsf' '--without-alps' '--with-slurm' '--without-tm' '--without-sge' '--disable-memchecker' '--with-libevent=/home/gavaf/spack/opt/linux-rocky8-x86_64/gcc-11.3.0/libevent/2.1.12-rbkwyej3j3ubh65p3qb3fpk7feut767l' '--with-zlib=/home/gavaf/spack/opt/linux-rocky8-x86_64/gcc-11.3.0/zlib-ng/2.2.1-2juy7xtmbpubtb6al6q2h6grlpbltgxa' '--with-hwloc=/usr' '--disable-java' '--disable-mpi-java' '--disable-io-romio' '--with-gpfs=no' '--without-cuda' '--enable-wrapper-rpath' '--disable-wrapper-runpath' '--disable-debug'
If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.
08e41ed5629b51832f5708181af6d89218c7a74e 3rd-party/openpmix (v1.1.3-4067-g08e41ed5)
30cadc6746ebddd69ea42ca78b964398f782e4e3 3rd-party/prrte (psrvr-v2.0.0rc1-4839-g30cadc6746)
6032f68dd9636b48977f59e986acc01a746593a6 3rd-party/pympistandard (remotes/origin/main-23-g6032f68)
dfff67569fb72dbf8d73a1dcf74d091dad93f71b config/oac (dfff675)
Please describe the system on which you are running
- Operating system/version: Rocky Linux 8.8
- Computer hardware: AMD EPYC Processor
- Network type:
Details of the problem
The prrte commit that the 3rd-party/prrte submodule points to has a bug.
When running with --map-by method:mod1,mod2, only mod1 is taken into account.
There was a long discussion about this in issue #12967, which stemmed from a different issue.
As @rhc54 pointed out, this has been fixed in prrte v3 (I've tested 3.0.11 myself and the issue is indeed solved there).
However, the output of git submodule status shows that 3rd-party/prrte points to psrvr-v2.0.0rc1-4839-g30cadc6746, which still has the bug.
Here is an example:
mpirun -np 4 --verbose --bind-to hwt --map-by pe-list=0,3,128,131:ordered:hwtcpus --report-bindings --mca mca_base_verbose debug --mca rmaps_base_verbose 100 hostname
...
rmaps:base set policy with pe-list=0,3,128,131:ordered:hwtcpus
rmaps:base policy pe-list=0,3,128,131 modifiers ordered provided
rmaps:base check modifiers with ordered
mca:rmaps: mapping job prterun-mrl-pldevbld88-3796956@1
mca:rmaps: setting mapping policies for job prterun-mrl-pldevbld88-3796956@1 inherit TRUE hwtcpus FALSE
This shows that hwtcpus has not been taken into account in this case.
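To make this check repeatable, here is a rough Python sketch that compares the modifiers requested on the command line with the ones the rmaps verbose output reports. It is only a diagnostic helper under stated assumptions: the function names are hypothetical, the --map-by value is assumed to have the simple policy:mod:mod form used above, and the log line format is taken from the output pasted in this issue. I have not checked what a fixed prrte prints, so the expectation that all modifiers appear on that line is also an assumption.

```python
import re

# Sample taken from the verbose output shown in this issue.
SAMPLE_LOG = """\
rmaps:base set policy with pe-list=0,3,128,131:ordered:hwtcpus
rmaps:base policy pe-list=0,3,128,131 modifiers ordered provided
rmaps:base check modifiers with ordered
"""


def requested_modifiers(spec: str) -> list[str]:
    """Return the modifiers in a --map-by value.

    Assumption: the value looks like 'policy:mod1:mod2', so everything after
    the first ':' is a modifier. Policies that contain ':' themselves
    (for example ppr:N:object) are not handled by this sketch.
    """
    _policy, _, rest = spec.partition(":")
    return [m for m in rest.split(":") if m]


def reported_modifiers(log: str) -> list[str]:
    """Return the modifiers listed on the 'rmaps:base policy ...' log line."""
    match = re.search(r"rmaps:base policy \S+ modifiers (\S+) provided", log)
    if match is None:
        return []
    # Accept either ':' or ',' as separator, since the exact format
    # printed by other prrte versions is not known here.
    return [m for m in re.split(r"[:,]", match.group(1)) if m]


def dropped_modifiers(spec: str, log: str) -> list[str]:
    """Return requested modifiers that the log does not report."""
    seen = set(reported_modifiers(log))
    return [m for m in requested_modifiers(spec) if m not in seen]
```

Calling dropped_modifiers("pe-list=0,3,128,131:ordered:hwtcpus", SAMPLE_LOG) on the output above returns ['hwtcpus'], which matches what I see by reading the log.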
I think the solution would be to update the commit reference in 3rd-party/prrte to a prrte v3 commit (prrte v4 would also require updating pmix to v6).