Is your feature request related to a problem? Please describe.
During multi-GPU, single-node workflows, it is common for one GPU to hit OOM while others on the system are partially free or even completely idle.
As of 25.10, the best option is for the oversubscribed GPU to spill to host, either in the application layer or by using a CUDA managed memory resource.
In most cases, we expect it would be more efficient to make those allocations on a peer GPU on the same node and access the data over NVLink. For example, GPU0 would launch kernels that access data resident on GPU1, as sketched below.
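As a concrete illustration of the access pattern (not cuDF code), here is a minimal CUDA C++ sketch, assuming peer access is supported between GPU0 and GPU1: memory is allocated on GPU1 with plain `cudaMalloc`, GPU0 enables peer access, and a kernel launched on GPU0 reads and writes that peer memory. Error checking is omitted for brevity.

```cpp
#include <cstddef>
#include <cstdio>

#include <cuda_runtime.h>

// Trivial kernel: doubles each element in place.
__global__ void scale(double* data, std::size_t n)
{
  std::size_t const i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) { data[i] *= 2.0; }
}

int main()
{
  int can_access = 0;
  cudaDeviceCanAccessPeer(&can_access, /*device=*/0, /*peerDevice=*/1);
  if (!can_access) {
    std::printf("GPU0 cannot access GPU1 memory directly\n");
    return 1;
  }

  // Allocate on GPU1.
  cudaSetDevice(1);
  std::size_t const n = 1 << 20;
  double* peer_data{};
  cudaMalloc(reinterpret_cast<void**>(&peer_data), n * sizeof(double));
  cudaMemset(peer_data, 0, n * sizeof(double));

  // Launch on GPU0, reading and writing GPU1's memory over NVLink/PCIe.
  cudaSetDevice(0);
  cudaDeviceEnablePeerAccess(/*peerDevice=*/1, 0);
  auto const blocks = static_cast<unsigned int>((n + 255) / 256);
  scale<<<blocks, 256>>>(peer_data, n);
  cudaDeviceSynchronize();

  cudaSetDevice(1);
  cudaFree(peer_data);
  return 0;
}
```

If this pattern proves correct and fast for the kernels cuDF cares about, the remaining work is mostly about how allocations end up on the peer device, which is what the items below scope out.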
Describe the solution you'd like
This is a large topic, and the main need right now is scoping. The table below summarizes the open workstreams.
Topic | Summary | Status |
---|---|---|
Confirm correctness for launching kernels on peer data | We haven't tested launching cuDF kernels on GPU0 that read or write data resident on GPU1. We may need to provide more information at kernel launch, which could require significant changes in cuDF or other RMM users. | |
Confirm performance for launching kernels on peer data | We haven't measured the performance impact of launching kernels on peer data. The data may move efficiently without additional work, but it seems likely that we will need cudaMemAdvise or other hints to achieve good performance when peer data is accessed. | |
Compose an MR that could be used with peer data | Most multi-GPU applications of cuDF use a process-per-GPU model. The MR for each process could be composed of an MR for the primary GPU and child MRs for peer GPUs. The composed MR would live in applications that use RMM and would not necessarily change RMM code directly (see the sketch after this table). | |
Scope and add any necessary resource adaptors to RMM | If new resource adaptors turn out to be necessary or preferable, they should be added to RMM. We may need a "fallback" resource adaptor (#2074) to trigger moving from the primary GPU to a peer GPU, and a "round robin" resource adaptor to select the peer (also covered by the sketch after this table). | |
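To make the "compose an MR" and "fallback" rows above concrete, here is a minimal application-side sketch against RMM's `device_memory_resource` interface. The `peer_fallback_resource` name and behavior are hypothetical, not an existing RMM API: it allocates from the primary GPU's resource and retries on a peer GPU's resource when the primary throws `rmm::bad_alloc`, tracking which pointers came from the peer so they are returned to the right upstream.

```cpp
#include <rmm/cuda_stream_view.hpp>
#include <rmm/error.hpp>
#include <rmm/mr/device/device_memory_resource.hpp>

#include <cuda_runtime_api.h>

#include <cstddef>
#include <mutex>
#include <unordered_set>

// RAII helper used by the sketch: make `device` current for the enclosing scope.
struct scoped_device {
  explicit scoped_device(int device) { cudaGetDevice(&old_); cudaSetDevice(device); }
  ~scoped_device() { cudaSetDevice(old_); }
  int old_{};
};

// Hypothetical adaptor: allocate from the primary GPU's resource, and fall back
// to a resource whose memory lives on a peer GPU when the primary is exhausted.
class peer_fallback_resource final : public rmm::mr::device_memory_resource {
 public:
  peer_fallback_resource(rmm::mr::device_memory_resource* primary,
                         rmm::mr::device_memory_resource* peer,
                         int peer_device)
    : primary_{primary}, peer_{peer}, peer_device_{peer_device}
  {
  }

 private:
  void* do_allocate(std::size_t bytes, rmm::cuda_stream_view stream) override
  {
    try {
      return primary_->allocate(bytes, stream);
    } catch (rmm::bad_alloc const&) {
      // Allocate with the peer device current so any underlying cudaMalloc
      // (e.g. pool growth) lands in the peer GPU's memory.
      scoped_device guard{peer_device_};
      void* ptr = peer_->allocate(bytes, stream);
      std::lock_guard<std::mutex> lock{mtx_};
      peer_ptrs_.insert(ptr);
      return ptr;
    }
  }

  void do_deallocate(void* ptr, std::size_t bytes, rmm::cuda_stream_view stream) override
  {
    bool is_peer = false;
    {
      std::lock_guard<std::mutex> lock{mtx_};
      is_peer = peer_ptrs_.erase(ptr) > 0;
    }
    if (is_peer) {
      scoped_device guard{peer_device_};
      peer_->deallocate(ptr, bytes, stream);
    } else {
      primary_->deallocate(ptr, bytes, stream);
    }
  }

  rmm::mr::device_memory_resource* primary_;
  rmm::mr::device_memory_resource* peer_;
  int peer_device_;
  std::unordered_set<void*> peer_ptrs_;
  std::mutex mtx_;
};
```

An application would construct a `pool_memory_resource` per device, wrap them in an adaptor like this, and install the result with `rmm::mr::set_current_device_resource()`. A "round robin" variant would rotate `peer_device_` across several peers instead of falling back to a single one.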
Describe alternatives you've considered
Use spilling to host, either implicitly through CUDA managed memory or explicitly in the application layer.
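For comparison, here is a minimal sketch of the managed-memory variant of this alternative, assuming a process-per-GPU setup: a pool built on `rmm::mr::managed_memory_resource` is installed as the current device resource so that oversubscription pages to host instead of failing. The 1 GiB initial pool size is an arbitrary placeholder.

```cpp
#include <rmm/mr/device/managed_memory_resource.hpp>
#include <rmm/mr/device/per_device_resource.hpp>
#include <rmm/mr/device/pool_memory_resource.hpp>

#include <cstddef>

int main()
{
  // Managed (unified) memory may oversubscribe the GPU; pages migrate to host
  // under memory pressure instead of the allocation failing.
  rmm::mr::managed_memory_resource managed_mr;
  rmm::mr::pool_memory_resource<rmm::mr::managed_memory_resource> pool_mr{
    &managed_mr, std::size_t{1} << 30 /* placeholder initial pool size */};
  rmm::mr::set_current_device_resource(&pool_mr);

  // ... run the workload; allocations beyond device capacity spill to host ...
  return 0;
}
```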
Additional context
Clearly, peer-to-peer memory throughput will be lower than primary GPU memory throughput. However, many important kernels in data processing show low DRAM utilization (decompression, decoding, atomics-bound cuco count and retrieve). If peer access yields kernel runtimes similar to primary access, we might see benefits from treating the sum of a node's GPU memory as a single pool.