Utility providers #10346

piotrchmiel · 2024-08-28T14:53:33Z

Is there a description somewhere of how the rxm and rxd utility providers work? The description in the man page, https://ofiwg.github.io/libfabric/v1.22.0/man/fi_rxm.7.html, is too laconic for me.

Consider the Verbs provider and the utility RXM provider. How does RXM work with the Verbs provider under the hood? How are QPs (Queue Pairs) handled underneath? Is there a general description of the underlying mechanism? I would like to understand how it works under the hood to get an intuition of its impact on performance.
ibv_devices shows me the device mlx5_0. From the libfabric perspective, I can use mlx5_0 with the RXM utility provider in RDM mode and mlx5_0_dgram with the RXD utility provider. What are the specific differences in behavior between these two cases? Should I use mlx5_0 or mlx5_0_dgram?

aingerson · 2024-09-03T17:49:44Z

Collaborator

I gave a presentation (video) about 5 years ago with a general overview of the utility providers (rxm, rxd, shm). It is a bit out of date and doesn't go into the hardware-specific details you're asking about (queue pairs, mlx, etc.), but it might help a little?

The big benefit of using mlx5_0_dgram with rxd would come into effect for large-scale workloads. The idea is that you would use mlx5_0 with rxm up to a certain size, until the resources get strained, and then switch to mlx5_0_dgram with rxd. Right now, the ability to switch internally doesn't exist in OFI, but we're looking at adding it as part of the peer provider enhancements. We're currently using those to target intranode offload for shm, but they could be expanded to handle rxm + rxd + shm integrated all together. As it is right now, however, there isn't a real-use implementation of it, and rxd in reality is not optimized (or maintained) well enough to be used performantly. I would recommend just sticking with mlx5_0 + rxm, as we have tested that for fairly large-scale jobs without hitting the limitation, but stay tuned for rxd improvements and offload!

0 replies

piotrchmiel · 2025-04-08T09:00:37Z

Author

@aingerson Thanks so much for sharing the presentation – even if it's a few years old, there were a lot of interesting insights that would actually be great to capture on the RXM and RXD pages in the documentation. It would definitely help clarify a lot of concepts for people getting into the details of the different utility providers.

I was especially interested in your point about the benefit of switching to mlx5_0_dgram with rxd for large-scale workloads. Just to get a better idea – what would you consider “large scale” in this context? Are we talking hundreds of nodes? Thousands? Just trying to understand where that threshold might lie in practice.

Also, you mentioned that rxd isn’t currently optimized or maintained to perform well – are there any concrete plans to pick that work back up, or is it more of a “maybe someday” kind of thing?

Thanks again for all the info – really helpful.

4 replies

aingerson Apr 8, 2025
Collaborator

We've validated the rxm model to be working pretty well up to around 32k ranks (so like a thousand nodes, 32 ranks each). It depends on your workload of course - if you fully connect 32k ranks, you're obviously going to run into issues. But somewhere in the vague ballpark of hundreds to thousands :)
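
To put the "fully connect 32k ranks" caveat in rough numbers, here is a back-of-the-envelope sketch. It assumes that rxm over verbs ends up holding about one connected endpoint (queue pair) per peer that a rank actually talks to. That one-per-peer figure, the 26-peer stencil example, and the function names are simplifying assumptions for illustration, not numbers taken from libfabric.

```python
def fully_connected_peers(total_ranks: int) -> int:
    """Peers a rank talks to when every rank talks to every other rank."""
    return total_ranks - 1


def endpoints_per_node(ranks_per_node: int, peers_per_rank: int) -> int:
    """Connected endpoints a node holds, ASSUMING one endpoint per peer per rank."""
    return ranks_per_node * peers_per_rank


if __name__ == "__main__":
    nodes, ranks_per_node = 1000, 32          # the scale mentioned above
    total_ranks = nodes * ranks_per_node      # 32,000 ranks

    # Worst case: every rank talks to every other rank.
    all_to_all = endpoints_per_node(ranks_per_node, fully_connected_peers(total_ranks))

    # Hypothetical sparse pattern: each rank talks to 26 neighbours (3D stencil).
    stencil = endpoints_per_node(ranks_per_node, 26)

    print(f"fully connected: {all_to_all:,} endpoints per node")
    print(f"26-peer stencil: {stencil:,} endpoints per node")
```

Under these assumptions the fully connected case needs on the order of a million endpoints per node, while a sparse pattern stays in the hundreds, which is why the answer depends so much on the workload.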

There are no concrete plans to pick it back up. Not quite a "maybe someday" - more like a "definitely someday whenever we have the bandwidth and resources"

I'll make a note to add more information to the man pages. Thanks for the suggestion!

piotrchmiel Apr 9, 2025
Author

What is the current maintenance status of the RxM provider? Is it actively maintained, and are there any new features planned for its development?

aingerson Apr 10, 2025
Collaborator

RxM is actively maintained. One of the big things we are working on for it is integrated shared memory offload for tcp and verbs through RxM. That should be coming in the next release (June)

piotrchmiel Apr 17, 2025
Author

@aingerson Thank you for all your answers :-) One more thing. Do you happen to have any publicly available performance results for the RXM provider over verbs?

piotrchmiel · 2025-04-08T09:01:48Z

Author

One more thing I wanted to ask about – there's a note in the RxM documentation:

When sending large messages, an app doing an sread or waiting on the CQ file descriptor may not get a completion when reading the CQ after being woken up from the wait. The app has to do sread or wait on the file descriptor again. This is needed because RxM uses a rendezvous protocol for large message sends. An app would get woken up from waiting on CQ fd when rendezvous protocol request completes but it would have to wait again to get an ACK from the receiver indicating completion of large message transfer by remote RMA read.

Do you happen to have any example or test case where this behavior can actually be observed in practice? I'm curious how to detect when this situation occurs and how to distinguish whether a CQ entry corresponds to a rendezvous protocol step versus the actual application-level message completion.
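
For reference, the handling the quoted note asks for amounts to treating "woken up, but nothing to read" as normal and waiting again. Below is a toy Python model of that loop, not libfabric code: `ToyCQ`, its `wait` and `read` methods, and the event names are all hypothetical stand-ins for a blocking wait on the CQ fd followed by a non-blocking read that may come back empty (the real `fi_cq_read` returns `-FI_EAGAIN` in that case). It follows the note's description, in which the internal rendezvous step causes a wakeup but produces no application-visible CQ entry.

```python
from collections import deque


class ToyCQ:
    """Hypothetical stand-in for a completion queue; NOT a libfabric API."""

    def __init__(self, events):
        # Each event is ("internal", None) or ("completion", payload).
        self._events = deque(events)
        self._ready = deque()
        self.wakeups = 0

    def wait(self):
        """Simulate a blocking wait: return once any event has happened."""
        if not self._events:
            raise TimeoutError("no more events")
        kind, payload = self._events.popleft()
        self.wakeups += 1
        # Only application-level completions become readable entries.
        if kind == "completion":
            self._ready.append(payload)

    def read(self):
        """Non-blocking read: a completion, or None if nothing is queued."""
        return self._ready.popleft() if self._ready else None


def wait_for_completion(cq):
    """Wait and read in a loop until an application-visible completion appears."""
    while True:
        cq.wait()
        entry = cq.read()
        if entry is not None:
            return entry
        # Woken up with nothing to read (internal protocol step): wait again.


if __name__ == "__main__":
    # One internal rendezvous step, then the real completion.
    cq = ToyCQ([("internal", None), ("completion", "large send done")])
    print(wait_for_completion(cq), "after", cq.wakeups, "wakeups")
```

In this model there is nothing to distinguish in the entries themselves: the internal step never shows up as an entry, only as an extra wakeup, so the loop simply waits again.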

2 replies

aingerson Apr 8, 2025
Collaborator

Hmm I'm not entirely sure. I think the code should handle this and retry for the application so I'm not sure what this is referring to. Maybe it's outdated and was written before the code was updated to handle it? I can take a look, though. Is this related to #10945 at all?

piotrchmiel Apr 9, 2025
Author

I don't think it's related to #10945, as I don't see any redundant confirmation in the CQ. While debugging the issue and searching for information, I came across this entry in the documentation and became curious about its meaning and how I should handle similar cases.
