[FEA] Expose CUDA 13 async pools for managed and pinned memory

**Is your feature request related to a problem? Please describe.**
CUDA 13 added APIs like this:

```
cudaError_t cudaMemGetDefaultMemPool(cudaMemPool_t* memPool, cudaMemLocation* location, cudaMemAllocationType type)
```

This API ([docs](https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__MEMORY__POOLS.html#group__CUDART__MEMORY__POOLS_1g9b7881aefbbbbc253f39f187b37639fe)) returns a `cudaMemPool_t` that is backed by managed or pinned memory. Coupling it with `cudaMallocFromPoolAsync` gives an async (stream-ordered) allocator for managed or pinned memory.
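
To illustrate that coupling, here is a minimal sketch of a stream-ordered managed allocation. It assumes a CUDA 13 toolkit and a device that supports managed memory pools. The helper names (`check`, `default_managed_pool`, `allocate_managed_async`) and the `kHasManagedPools` flag are hypothetical, and using `cudaMemLocationTypeDevice` as the pool location is an assumption, not a recommendation. The `__has_include` and `CUDART_VERSION` guards are only there so the snippet still compiles when CUDA 13 headers are missing.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

#if __has_include(<cuda_runtime_api.h>)
#include <cuda_runtime_api.h>
#endif

#if defined(CUDART_VERSION) && CUDART_VERSION >= 13000
// CUDA 13 or newer: cudaMemGetDefaultMemPool is declared.
constexpr bool kHasManagedPools = true;

// Hypothetical helper: turn a CUDA runtime error into an exception.
inline void check(cudaError_t status, const char* what) {
  if (status != cudaSuccess) {
    throw std::runtime_error(std::string(what) + ": " + cudaGetErrorString(status));
  }
}

// Fetch the default managed-memory pool associated with `device`.
// Assumption: a device location is an acceptable choice for this pool.
inline cudaMemPool_t default_managed_pool(int device) {
  cudaMemLocation location{};
  location.type = cudaMemLocationTypeDevice;
  location.id   = device;

  cudaMemPool_t pool{};
  check(cudaMemGetDefaultMemPool(&pool, &location, cudaMemAllocationTypeManaged),
        "cudaMemGetDefaultMemPool");
  return pool;
}

// Stream-ordered allocation of `bytes` from `pool` on `stream`.
inline void* allocate_managed_async(std::size_t bytes, cudaMemPool_t pool, cudaStream_t stream) {
  void* ptr = nullptr;
  check(cudaMallocFromPoolAsync(&ptr, bytes, pool, stream), "cudaMallocFromPoolAsync");
  return ptr;
}
#else
// Older toolkit or no CUDA headers: the API is unavailable.
constexpr bool kHasManagedPools = false;
#endif
```

A caller would then write something like `void* p = allocate_managed_async(bytes, default_managed_pool(0), stream);` and later release the memory in stream order with `cudaFreeAsync(p, stream)`.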

This would allow applications to stop using RMM's pool adaptor around a managed MR, a combination that has lots of awkward behavior. In general, the driver pool should perform better and defragment better than RMM's own pool adaptor implementation.

**Describe the solution you'd like**
There are several ways this API could look. The options below use the `managed` MR as the example, but the same approach applies to `pinned`.

- (default choice) We could add a new `async_managed_memory_resource`
  - Provides a clear way to say "this is a new feature! use this!"
- We could refactor `managed_memory_resource` to use the async allocator in CUDA 13
  - Provides some immediate benefit to existing users with no code change
  - This could be done in addition to a new `async_managed_memory_resource` class? No obvious downsides that I can see.

**Describe alternatives you've considered**
- We could add a parameter to the `async_memory_resource` constructor, like `type="managed"`
  - This would probably be overlooked in favor of the `managed_memory_resource`, even if we document it extensively
- We could encourage users to get their own `cudaMemPool_t` handle and use `async_view_memory_resource`
  - Awkward because it requires direct CUDA runtime interactions, and it's more code than most RMM users are used to.

**Additional context**
`async_managed_memory_resource` should raise an error on construction with CUDA 12, since it's not supported there.
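
One way the construction-time check could look is sketched below. This is not RMM code: the class `async_managed_memory_resource_sketch`, the constant `kMinManagedPoolVersion`, and the helper `can_construct` are hypothetical, and the version is passed in as a plain integer so the logic can be read in isolation. In practice the value would presumably come from `cudaRuntimeGetVersion` (and possibly `cudaDriverGetVersion`).

```cpp
#include <stdexcept>
#include <string>

// CUDA encodes versions as 1000 * major + 10 * minor, so 13.0 is 13000.
constexpr int kMinManagedPoolVersion = 13000;

// Hypothetical stand-in for the proposed resource; only the version gate is shown.
class async_managed_memory_resource_sketch {
 public:
  explicit async_managed_memory_resource_sketch(int runtime_version) {
    if (runtime_version < kMinManagedPoolVersion) {
      throw std::runtime_error(
        "async_managed_memory_resource requires CUDA 13 or newer (runtime reports " +
        std::to_string(runtime_version) + ")");
    }
    // A real implementation would obtain the managed cudaMemPool_t here.
  }
};

// Returns true if construction succeeds for the given runtime version.
inline bool can_construct(int runtime_version) {
  try {
    async_managed_memory_resource_sketch mr{runtime_version};
    (void)mr;
    return true;
  } catch (const std::runtime_error&) {
    return false;
  }
}
```

With this gate, `can_construct(12090)` (CUDA 12.9) fails and `can_construct(13000)` succeeds, so users on CUDA 12 get an immediate, descriptive error instead of a failure at first allocation.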

For `async_memory_resource` today, we construct a special `cudaMemPool_t` with flags to enable the decompression engine. I think the decompression engine isn't supported with managed memory, so the default managed memory pool should be enough here? This should be checked.
