Question on verbs;ofi_rxm rendezvous serialization due to duplex QP #11051
Replies: 1 comment
I opened an issue instead.
Hi,
I'm running IMB-MPI [1] benchmarks using Open MPI [2] with the verbs;ofi_rxm provider in Libfabric. While analysing performance, I encountered a serialisation issue affecting certain collective operations, particularly with large messages. Here's the scenario I'm observing:
Consider two hosts, h1 and h2, each performing MPI_Send operations in parallel. From what I understand, for large messages, ofi_rxm uses a rendezvous protocol. For example, on h1, the following sequence of verbs operations occurs:
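(Reconstructing this from the control messages involved; the names are my shorthand, not necessarily the wire-level ones.)

```
h1 -> h2 : rndv_ctrl_req    (announce the large message, request rendezvous)
h2 -> h1 : rndv_ctrl_write  (reply: the target buffer is ready, go ahead)
h1 -> h2 : ibv_write()      (RDMA write of the payload into h2's buffer)
```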
Because QPs are used as duplex channels, if the rndv_ctrl_req from h2 arrives at h1 while its ibv_write() is in progress, the response (rndv_ctrl_write) is delayed. This causes h2 to stall waiting for the signal to proceed with its own ibv_write(), serialising what should be parallel transfers.

One potential solution I'm considering is using two QPs per connection, where each direction uses its own QP (i.e. treating QPs as simplex channels).
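To make the simplex idea concrete, here is a minimal verbs-level sketch (hypothetical code, not rxm internals; simplex_conn and simplex_conn_init are made-up names, and connection establishment is omitted):

```c
#include <infiniband/verbs.h>

/*
 * Hypothetical sketch, not rxm internals: one RC QP per transfer
 * direction, so that a small control reply is never ordered behind a
 * large RDMA write on the same send queue.  Connection establishment
 * (INIT -> RTR -> RTS, address exchange) is omitted for brevity.
 */
struct simplex_conn {
	struct ibv_qp *out_qp; /* our transfers to the peer:
	                        * rndv_ctrl_req + the bulk ibv_write() */
	struct ibv_qp *in_qp;  /* the peer's transfers to us: our
	                        * rndv_ctrl_write replies go here, so they
	                        * bypass whatever out_qp is busy writing */
};

static int simplex_conn_init(struct ibv_pd *pd, struct ibv_cq *cq,
                             struct simplex_conn *c)
{
	struct ibv_qp_init_attr attr = {
		.send_cq = cq,
		.recv_cq = cq,
		.qp_type = IBV_QPT_RC,
		.cap = {
			.max_send_wr  = 256,
			.max_recv_wr  = 256,
			.max_send_sge = 1,
			.max_recv_sge = 1,
		},
	};

	/* QP dedicated to our outgoing payloads. */
	c->out_qp = ibv_create_qp(pd, &attr);
	if (!c->out_qp)
		return -1;

	/* QP dedicated to the peer's incoming transfers and our
	 * control replies for them. */
	c->in_qp = ibv_create_qp(pd, &attr);
	if (!c->in_qp) {
		ibv_destroy_qp(c->out_qp);
		return -1;
	}
	return 0;
}
```

As I understand it, an RC QP processes its send queue in order, so posting the small rndv_ctrl_write reply on a QP of its own means it no longer queues behind a large write in flight, which is exactly the stall described above.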
Is there a way to configure RXM (or Libfabric more generally) to use multiple QPs per connection in this manner?
Thanks,
Dragos
[1] https://github.com/intel/mpi-benchmarks
[2] https://github.com/open-mpi/ompi