[FEA] Expose CUDA 13 async pools for managed and pinned memory #2054

@bdice

Is your feature request related to a problem? Please describe.
CUDA 13 added APIs like this:

cudaError_t cudaMemGetDefaultMemPool(cudaMemPool_t* memPool, cudaMemLocation* location, cudaMemAllocationType type)

(docs) This API returns a cudaMemPool_t that is backed by managed or pinned memory. Coupled with cudaMallocFromPoolAsync, it gives an async (stream-ordered) allocator for managed or pinned memory.

This would allow applications to stop using RMM's pool adaptor around a managed MR, which has lots of awkward behavior. In general, the driver pool should have better performance and better defragmentation than RMM's own pool adaptor implementation.

Describe the solution you'd like
There are several ways this API could look. I am showing the managed MR for now but a similar thing can be done for pinned.

  • (default choice) We could add a new async_managed_memory_resource
    • Provides a clear way to say "this is a new feature! use this!"
  • We could refactor managed_memory_resource to use the async allocator in CUDA 13
    • Provides some immediate benefit to existing users with no code change
    • This could be done in addition to a new async_managed_memory_resource class? No obvious downsides that I can see.

Describe alternatives you've considered

  • We could add a parameter to the async_memory_resource constructor, like type="managed"
    • This would probably be overlooked in favor of the managed_memory_resource, even if we document it extensively
  • We could encourage users to get their own cudaMemPool_t handle and use async_view_memory_resource
    • Awkward because it requires direct CUDA runtime interactions, and it's more code than most RMM users are used to.

Additional context
async_managed_memory_resource should raise an error on construction with CUDA 12, since it's not supported there.
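
A minimal sketch of that construction-time check, assuming the resource learns the CUDA version as an integer in the usual `1000 * major + 10 * minor` encoding (the value that `cudaRuntimeGetVersion` reports). The names `require_async_managed_support` and `async_managed_memory_resource_sketch` are hypothetical and not part of RMM. The version is passed in as a plain argument so the check can be exercised without a GPU.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical constant, not part of RMM. CUDA encodes versions as
// 1000 * major + 10 * minor, so 13.0 is 13000 and 12.9 is 12090.
constexpr int min_async_managed_cuda_version = 13000;

// Throws std::runtime_error if the given CUDA version predates the
// managed/pinned default memory pools described in this issue.
inline void require_async_managed_support(int cuda_version)
{
  if (cuda_version < min_async_managed_cuda_version) {
    throw std::runtime_error(
      "async_managed_memory_resource requires CUDA 13.0 or newer, found " +
      std::to_string(cuda_version / 1000) + "." +
      std::to_string((cuda_version % 1000) / 10));
  }
}

// Hypothetical stand-in for the proposed class. Only the version gate is
// shown; a real implementation would obtain the version from the CUDA
// runtime and then fetch the pool with cudaMemGetDefaultMemPool.
class async_managed_memory_resource_sketch {
 public:
  explicit async_managed_memory_resource_sketch(int cuda_version)
  {
    require_async_managed_support(cuda_version);
  }
};
```

Whether the real check should look at the runtime version, the driver version, or both is left open here.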

For async_memory_resource today, we construct a special cudaMemPool_t with flags to enable the decompression engine. I don't think the decompression engine is supported with managed memory, so the default managed memory pool should be enough here. This should be checked.
