@j-xiong j-xiong commented Nov 25, 2025

The OFI threading mode was set to FI_THREAD_SAFE, but the UCX threading mode was set to UCS_THREAD_MODE_SINGLE, which allows only a single thread to access UCX objects. Accessing an EP (which hosts the UCX worker) from multiple threads may trigger the following assertion failure:

`ucp_ep.c:1808 Assertion 'ucs_async_check_owner_thread(&(worker)->async)' failed`

Changes made:

(1) Allow the threading mode to be set based on application hints.

(2) Use the proper UCX threading mode based on the OFI threading mode.

(3) Add a runtime parameter to force single-thread mode. This makes it possible to evaluate the performance impact of supporting multiple threads, since the default setting has changed.
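
The selection logic described in (2) and (3) can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the enums are local stand-ins that only mirror the names of libfabric's `enum fi_threading` and UCX's `ucs_thread_mode_t` so the snippet compiles without either library, and the helper `sketch_select_ucx_thread_mode`, its `force_single` argument, and the specific mapping (including which OFI modes map to the serialized mode) are all assumptions. In a real provider the result would presumably be stored in `ucp_worker_params_t.thread_mode` (with `UCP_WORKER_PARAM_FIELD_THREAD_MODE` set in `field_mask`) before calling `ucp_worker_create()`.

```c
#include <stdbool.h>

/* Local stand-ins so the sketch builds without libfabric or UCX headers.
 * The names mirror enum fi_threading and ucs_thread_mode_t; the numeric
 * values are arbitrary and do NOT match the real headers. */
enum sketch_fi_threading {
    SKETCH_FI_THREAD_UNSPEC,
    SKETCH_FI_THREAD_SAFE,
    SKETCH_FI_THREAD_DOMAIN,
    SKETCH_FI_THREAD_COMPLETION
};

enum sketch_ucs_thread_mode {
    SKETCH_UCS_THREAD_MODE_SINGLE,
    SKETCH_UCS_THREAD_MODE_SERIALIZED,
    SKETCH_UCS_THREAD_MODE_MULTI
};

/* Hypothetical helper: choose the UCX thread mode for a worker from the
 * OFI threading mode requested by the application, with an optional
 * runtime override that forces single-thread mode. */
static enum sketch_ucs_thread_mode
sketch_select_ucx_thread_mode(enum sketch_fi_threading ofi_mode,
                              bool force_single)
{
    /* The runtime override wins over whatever the application asked for. */
    if (force_single)
        return SKETCH_UCS_THREAD_MODE_SINGLE;

    switch (ofi_mode) {
    case SKETCH_FI_THREAD_DOMAIN:
    case SKETCH_FI_THREAD_COMPLETION:
        /* Assumption: the application serializes its own calls, so UCX
         * only needs to tolerate one thread at a time. */
        return SKETCH_UCS_THREAD_MODE_SERIALIZED;
    case SKETCH_FI_THREAD_SAFE:
    case SKETCH_FI_THREAD_UNSPEC:
    default:
        /* Assumption: anything else gets full multi-thread support. */
        return SKETCH_UCS_THREAD_MODE_MULTI;
    }
}
```

With this mapping, a request for FI_THREAD_SAFE yields the multi-thread UCX mode unless the override is set, which is the mismatch the original code had in the other direction.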

Fix #11651

Signed-off-by: Jianxin Xiong

j-xiong commented Nov 26, 2025

The Intel CI failure is a known, unrelated issue (tcp: Bad file descriptor).

@j-xiong j-xiong merged commit 3a9362c into ofiwg:main Nov 26, 2025
19 of 20 checks passed
@j-xiong j-xiong deleted the fix-ucx-threading branch November 26, 2025 18:47

Development

Successfully merging this pull request may close these issues.

prov/ucx: Crashes when multiple threads do fabric calls even with FI_THREAD_SAFE
