opal_lifo test fails on FreeBSD amd64 #13134

Open
LaurentChardon opened this issue Mar 11, 2025 · 6 comments

@LaurentChardon

OMPI 5.0.7 tests fail at the opal_lifo test on amd64 platforms running FreeBSD. This is true for all currently supported versions of FreeBSD and for every 5.x release of OMPI that I have tested. I haven't tried version 4, but I can if it's useful.

For FreeBSD 14.2 on aarch64 with clang 18.1.6, all tests pass except for a few that are skipped. opal_lifo is not skipped; it passes.

For FreeBSD 14.2 on amd64 with clang 18.1.6, opal_lifo fails:

❯ cat work/openmpi-5.0.7/test/class/test-suite.log
===============================================
   Open MPI 5.0.7: test/class/test-suite.log
===============================================

# TOTAL: 10
# PASS:  9
# SKIP:  0
# XFAIL: 0
# FAIL:  1
# XPASS: 0
# ERROR: 0

.. contents:: :depth: 2

FAIL: opal_lifo
===============

 Failure :  lifo push/pop multi-threaded with atomics
 Failure :  list pop all items
SUPPORT: OMPI Test failed: opal_lifo_t (2 of 7 failed)
Single thread test. Time: 0 s 2883 us 2 nsec/poppush
Atomics thread finished. Time: 0 s 27179 us 27 nsec/poppush
Atomics thread finished. Time: 0 s 9381 us 9 nsec/poppush
Atomics thread finished. Time: 0 s 9839 us 9 nsec/poppush
Atomics thread finished. Time: 0 s 10220 us 10 nsec/poppush
Atomics thread finished. Time: 0 s 10721 us 10 nsec/poppush
Atomics thread finished. Time: 0 s 10926 us 10 nsec/poppush
Atomics thread finished. Time: 0 s 18153 us 18 nsec/poppush
Atomics thread finished. Time: 0 s 21205 us 21 nsec/poppush
Atomics thread finished. Time: 0 s 22446 us 22 nsec/poppush
All threads finished. Thread count: 8 Time: 0 s 22504 us 2 nsec/poppush
FAIL opal_lifo (exit status: 1)

The issue is not unique to this version of the compiler. I have the same failure with FreeBSD 15.0 on amd64 and clang 19.1.5, for example.

This issue may be related to #10988

@bosilca
Member

bosilca commented Mar 11, 2025

According to Godbolt, clang 18-20 does not support 16-byte atomic operations on x86_64 without the -mcx16 flag. With that flag, however, the generated code is very similar to the gcc code, which works (judging by the absence of pending issues on a major platform).

We need to confirm what the OMPI configure script detected, and which implementation of the 16-byte atomic operations it selected. This info is in config.log.
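To see what a given compiler and flag set offers, a small probe along the following lines can help. This is only a sketch, not the test that the OMPI configure script runs: the function names `have_inline_cas16` and `try_cas16` are made up for illustration, and the probe relies on the assumption that the compiler defines the predefined macro `__GCC_HAVE_SYNC_COMPARE_AND_SWAP_16` when it can emit an inline 16-byte compare-and-swap (as gcc and clang are expected to do on x86_64 when -mcx16 is given).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: reports whether the compiler advertises an inline
 * 16-byte compare-and-swap for the current target and flags. */
static bool have_inline_cas16(void)
{
#if defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_16)
    return true;
#else
    return false;
#endif
}

/* Hypothetical helper: performs one 16-byte CAS on a local variable.
 * Returns 1 if the swap took effect, 0 if it did not, and -1 if the
 * compiler does not advertise inline 16-byte CAS at all. */
static int try_cas16(void)
{
#if defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_16)
    unsigned __int128 head = 1;
    unsigned __int128 expected = 1;
    /* Upper 64 bits and lower 64 bits both change in a single operation. */
    unsigned __int128 desired = ((unsigned __int128)42 << 64) | 2;

    bool swapped = __sync_bool_compare_and_swap(&head, expected, desired);
    if (swapped) {
        assert(head == desired);
    }
    return swapped ? 1 : 0;
#else
    return -1;
#endif
}
```

Calling `try_cas16()` from a small `main` and building it once with and once without -mcx16 in CFLAGS should show whether the flag changes what the compiler reports on a given platform.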

@LaurentChardon
Author

@bosilca you nailed it. Adding the -mcx16 flag to CFLAGS fixed the issue. Thank you very much!

@jsquyres
Member

@bosilca Good catch. Do we need to add a test into configure?

@jsquyres jsquyres added this to the v5.0.8 milestone Mar 11, 2025
@devreal
Contributor

devreal commented Mar 11, 2025

Does that mean the non-16B lifo is broken?

@bosilca
Member

bosilca commented Mar 11, 2025

That's kind of good: we have a solution. But it's also bad because 1) we already have that test, but it is apparently not picking the pieces correctly, 2) the non-16B part of the code seems broken, and 3) all hell broke loose, as we have had a broken piece of code for years.
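For readers wondering why the width of the atomic operation matters for a lock-free LIFO at all, here is a generic, single-threaded illustration of the ABA hazard. It is not OMPI code and makes no claim about what is actually wrong in the non-16B path. It assumes the common technique of pairing the head with a modification counter, and it scales that idea down to a 32-bit index plus a 32-bit counter packed into 64 bits so that it runs without any 16-byte support. All names (`pack`, `aba_plain`, `aba_counted`, `demo`) are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Pack a modification counter and an item index into one 64-bit word. */
static uint64_t pack(uint32_t counter, uint32_t index)
{
    return ((uint64_t)counter << 32) | index;
}

/* Head holds only the item index. A stale CAS succeeds even though the
 * stack was modified in between (A -> B -> A). Returns the CAS result. */
static bool aba_plain(void)
{
    _Atomic uint32_t head = 1;              /* item A is on top */
    uint32_t seen = atomic_load(&head);     /* "thread 1" reads A */

    atomic_store(&head, 2);                 /* "thread 2" changes the top to B */
    atomic_store(&head, 1);                 /* ... and later puts A back */

    /* "Thread 1" resumes with stale knowledge; the CAS cannot tell. */
    return atomic_compare_exchange_strong(&head, &seen, 2);
}

/* Head holds index plus counter. Every update bumps the counter, so the
 * stale CAS is rejected. Returns the CAS result. */
static bool aba_counted(void)
{
    _Atomic uint64_t head = pack(0, 1);     /* item A, counter 0 */
    uint64_t seen = atomic_load(&head);     /* "thread 1" reads (0, A) */

    atomic_store(&head, pack(1, 2));        /* "thread 2": top is B, counter 1 */
    atomic_store(&head, pack(2, 1));        /* A is back, but counter is 2 */

    return atomic_compare_exchange_strong(&head, &seen, pack(1, 2));
}

/* Shows both outcomes side by side. */
static void demo(void)
{
    assert(aba_plain());      /* stale update slips through */
    assert(!aba_counted());   /* stale update is detected */
}
```

With full 64-bit pointers, the same pairing needs a 16-byte compare-and-swap, which is where the -mcx16 discussion above comes in.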

@edgargabriel
Member

edgargabriel commented Mar 11, 2025

This is the potentially related issue that I mentioned on the call: #12979

freebsd-git pushed a commit to freebsd/freebsd-ports that referenced this issue Mar 12, 2025
Clang does not support 16 byte atomic operations without -mcx16 on amd64
Upstream issue: open-mpi/ompi#13134

PR:	285341
MFH:	2025Q1