Description
The RMM async memory resource changes behavior in 25.12: it no longer primes by default (#1931, #2051). Based on PDS-H workflow benchmarks and new microbenchmarks, I expect this to have no negative impact. I would like to discourage passing a non-default `initial_pool_size` argument (the default is still `std::nullopt`, but priming no longer occurs in the default case).
Looking at downstream uses of the async memory resource constructor, I see applications treating the async MR's pool size as if it meant the same thing as the RMM pool adaptor's. It does not (and the docs should make this clearer). The async MR's "pool size" is just a priming size (which we now skip by default and may remove in the future): priming allocates that "pool" size and immediately deallocates it. The pool adaptor's "pool size", by contrast, is held and managed by RMM's pool for suballocation.
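To make the distinction concrete, here is a minimal toy model in plain Python. It does not use RMM or touch a GPU; `ToyAsyncMR`, `ToyPoolAdaptor`, and their attributes are hypothetical names invented for illustration, and the model only captures the one point made above: a priming size is allocated and freed right away, while a pool adaptor's size stays held.

```python
from typing import Optional


class ToyAsyncMR:
    """Toy model (not RMM): `initial_pool_size` is only a priming size."""

    def __init__(self, initial_pool_size: Optional[int] = None):
        self.outstanding_bytes = 0  # bytes currently allocated through this resource
        self.primed_bytes = 0       # size of the one-off priming allocation, if any
        if initial_pool_size is not None:
            # Priming: allocate the requested size, then free it immediately.
            self.allocate(initial_pool_size)
            self.deallocate(initial_pool_size)
            self.primed_bytes = initial_pool_size

    def allocate(self, nbytes: int) -> None:
        self.outstanding_bytes += nbytes

    def deallocate(self, nbytes: int) -> None:
        self.outstanding_bytes -= nbytes


class ToyPoolAdaptor:
    """Toy model (not RMM): `initial_pool_size` is reserved up front and kept."""

    def __init__(self, initial_pool_size: int):
        self.pool_bytes = initial_pool_size  # held for the lifetime of the pool
        self.suballocated_bytes = 0          # portion of the pool handed to callers

    def allocate(self, nbytes: int) -> None:
        # Simplification: this toy pool never grows, so it just refuses.
        if self.suballocated_bytes + nbytes > self.pool_bytes:
            raise MemoryError("toy pool exhausted")
        self.suballocated_bytes += nbytes

    def deallocate(self, nbytes: int) -> None:
        self.suballocated_bytes -= nbytes


GiB = 1 << 30
async_mr = ToyAsyncMR(initial_pool_size=GiB)
pool_mr = ToyPoolAdaptor(initial_pool_size=GiB)
print(async_mr.outstanding_bytes, pool_mr.pool_bytes)
```

In this sketch the same argument value leaves nothing held by the async resource after construction, whereas the pool adaptor keeps the full size reserved for suballocation.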
Examples where we should encourage a change:
Spark:
- https://github.com/rapidsai/cudf/blob/45def677104329adcaa245d1c210c90ba487c8af/java/src/main/java/ai/rapids/cudf/RmmCudaAsyncMemoryResource.java#L32
  - Spark should pass through `std::nullopt` by default, rather than along
- https://github.com/rapidsai/cudf/blob/45def677104329adcaa245d1c210c90ba487c8af/java/src/main/java/ai/rapids/cudf/RmmCudaAsyncMemoryResource.java#L54-L56
  - `getSize` isn't properly defined here because the value is the pool priming size, not a ceiling on the pool size
- https://github.com/rapidsai/cudf/blob/45def677104329adcaa245d1c210c90ba487c8af/java/src/main/java/ai/rapids/cudf/Rmm.java#L187-L190
  - As above, this pool doesn't have a fixed size
dask-cuda:
- https://github.com/rapidsai/dask-cuda/blob/bb92f2c1fd18b8349ca4f64217301942d3d0223e/dask_cuda/plugins.py#L93
  - For the async MR, we may always want to pass `None` here. The initial pool size argument is also used for the pool adaptor, and that has a different meaning (as discussed above).
  - Deprecate rmm_pool_size and rmm_async dask-cuda#1563
- https://github.com/rapidsai/dask-cuda/blob/bb92f2c1fd18b8349ca4f64217301942d3d0223e/dask_cuda/benchmarks/utils.py#L499
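One way a downstream project could separate the two meanings is sketched below. This is an illustration only, not dask-cuda's actual code: `memory_resource_kwargs` and its parameters are hypothetical names, and the sketch assumes the resulting dictionary would be passed on to a constructor that accepts an `initial_pool_size` keyword.

```python
from typing import Any, Dict, Optional


def memory_resource_kwargs(use_async: bool, pool_size: Optional[int]) -> Dict[str, Any]:
    """Hypothetical helper: build constructor keyword arguments for the chosen resource."""
    if use_async:
        # For the async MR, a size would only act as a priming size, so the
        # user's pool size is deliberately not forwarded.
        return {"initial_pool_size": None}
    # For the pool adaptor, the size is reserved and held for suballocation.
    return {"initial_pool_size": pool_size}


print(memory_resource_kwargs(use_async=True, pool_size=1 << 30))
print(memory_resource_kwargs(use_async=False, pool_size=1 << 30))
```

The design choice here is that a single user-facing pool size option only ever reaches the pool adaptor, so it cannot silently turn into a priming request for the async MR.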
cudf.pandas:
- https://github.com/rapidsai/cudf/blob/45def677104329adcaa245d1c210c90ba487c8af/python/cudf/cudf/pandas/__init__.py#L81
  - We should pass `None`, as above.
  - Disable async MR priming in cudf.pandas cudf#20133
velox: