Very slow __pthread_mutex_unlock on writemsg on EFA #11191
Replies: 6 comments
efa-direct acquires the ep lock (https://github.com/ofiwg/libfabric/blob/main/prov/efa/src/efa_rma.c#L217), which is a mutex lock for
Thanks for the fast response! Setting BTW, do you know what the hardware spec is for small-packet rate? At my current rate, it's roughly 152 Gbps (76% of 200 Gbps) in terms of bandwidth. I guess the hardware could reach full bandwidth at MTU-sized messages (8928 bytes)?
You can run
Like
Those commands are super helpful! Thank you! I made a small change (increased iters, specified the EFA device and GPU device): Here's the result: Looks like perftest is slightly slower than my program. I guess this may be very close to the hardware limit? I'll probably experiment with

Thanks for reaching out. The slowness of the mutex unlock is worth looking at, but for a single thread, can you specify
`domain_attr.threading = FI_THREAD_DOMAIN` in your hints? I believe the mutex lock you refer to is https://github.com/ofiwg/libfabric/blob/main/prov/efa/src/rdm/efa_rdm_rma.c#L493. It becomes a no-op at the domain threading level.