-
Notifications
You must be signed in to change notification settings - Fork 1.6k
Possible data races detected with ThreadSanitizer in ger_thread, ... #5153
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Thanks @vfdev-5, this is useful. I noticed that a number of the issues are present in |
@martin-frbg for context: in Python there's a significant effort happening to remove the GIL (Global Interpreter Lock), so a free-threading CPython interpreter now allows calling Python functions in parallel threads at the same time. Meaning |
Thanks - I'm looking into level 2 BLAS thread safety in conjunction with #5104 - most attention had been on level 3 and the thread server code in the past |
The tsan log does not make much sense to me. Can't say I'm much happier with helgrind though, as it flags a couple of atomic operations in blas_server.c and "possible races" between Fortran and C accessing the work array |
I'm not familiar with your code so my diagnosis may be wrong, but the tsan race looks right to me. The read here OpenBLAS/driver/others/blas_server.c Line 1056 in ef9e3f7
and the write here: OpenBLAS/driver/level2/ger_thread.c Line 180 in ef9e3f7
are racy because I don't think we've done anything to order the reader and the writer. The only synchronization is via the |
I have an impression that the race reported as
between: OpenBLAS/driver/level2/ger_thread.c Line 61 in ef9e3f7
and OpenBLAS/driver/level2/ger_thread.c Line 148 in ef9e3f7
looks real and in the level3 I saw a use of static pthread_mutex_t level3_lock for a similar situation:OpenBLAS/driver/level3/level3_gemm3m_thread.c Line 886 in ef9e3f7
I'm also not familiar with OpenBLAS code, so may be wrong as well. |
@vfdev-5 thanks. Unfortunately I'm struggling to get anything at all OpenBLAS-wise done right now, so please allow for some delays and premature comments. Also most of that is legacy GotoBLAS code that maybe got more trust than it deserves. @hawkinsp indeed I appear to have overlooked the ATOMIC_RELAXED qualifier - or trusted the submitter to know what he was doing - when I merged the relevant PR, which would seem to have (re)introduced a race wherever C11 semantics are available (kind of tracks with newer compiler versions exposing this, I guess). Changing to ATOMIC_ACQ_REL seems to have plugged the ARMV6 hole of #5104, not sure if there is a safe middle ground performance-wise. |
And indeed it would make perfect sense that such a problem only show up on ARM, which has weaker memory model than x86, say. |
I suspect it's enough to use |
I built TSAN instrumented OpenBLAS with clang-18, here is the build command:
and then I run utests with few threads:
ThreadSanitizer reports various data races:
https://gist.github.com/vfdev-5/7e57d305f100003aa4b9ada11efbcf0e#file-make_utest-nt-2-log
Environment info:
The text was updated successfully, but these errors were encountered: