
ODR violation with libopen-palcommon_sm #13469

@devreal

I built Open MPI with AddressSanitizer enabled and get the following error when launching an application:

=================================================================
==473823==ERROR: AddressSanitizer: odr-violation (0x7ffff156e4a0):
  [1] size=64 'mca_common_sm_module_t_class' ../../../../../opal/mca/common/sm/common_sm.c:43:1
  [2] size=64 'mca_common_sm_module_t_class' ../../../../../opal/mca/common/sm/common_sm.c:43:1
These globals were registered at these points:
  [1]:
    #0 0x7ffff7762b28 in __asan_register_globals ../../../../libsanitizer/asan/asan_globals.cpp:346
    #1 0x7ffff15603f4 in _sub_I_00099_1 (/gpfs/home/jschuchart/opt/ompi-big-datatypes/lib/openmpi/mca_btl_smcuda.so+0x203f4)
    #2 0x7ffff7fcc51d in call_init /usr/src/debug/glibc-2.34-168.el9_6.23.x86_64/elf/dl-init.c:70
    #3 0x7ffff7fcc51d in call_init /usr/src/debug/glibc-2.34-168.el9_6.23.x86_64/elf/dl-init.c:26

  [2]:
    #0 0x7ffff7762b28 in __asan_register_globals ../../../../libsanitizer/asan/asan_globals.cpp:346
    #1 0x7fffe1c1857f in _sub_I_00099_1 (/gpfs/home/jschuchart/opt/ompi-big-datatypes/lib/libopen-pal.so.0+0x25457f)
    #2 0x7ffff7fcc51d in call_init /usr/src/debug/glibc-2.34-168.el9_6.23.x86_64/elf/dl-init.c:70
    #3 0x7ffff7fcc51d in call_init /usr/src/debug/glibc-2.34-168.el9_6.23.x86_64/elf/dl-init.c:26

==473823==HINT: if you don't care about these errors you may set ASAN_OPTIONS=detect_odr_violation=0
SUMMARY: AddressSanitizer: odr-violation: global 'mca_common_sm_module_t_class' at ../../../../../opal/mca/common/sm/common_sm.c:43:1
==473823==ABORTING

It seems that libopen-palmca_common_sm_noinst.a (which contains mca_common_sm_module_t_class) is built statically and linked into both libopen-pal.so and mca_btl_smcuda.so. As a result, two instances of the same global variable are loaded into the process, and the two registrations in the report above come from exactly those two objects ([1] mca_btl_smcuda.so, [2] libopen-pal.so.0).

In common/sm/Makefile.am I found this comment:

# Note that building this common component statically and linking
# against other dynamic components is *not* supported!

I think that by building smcuda dynamically while linking common_sm statically, we are violating that note. Should we force common_sm to be built dynamically whenever mca_btl_smcuda.so is built?