perftest should support benchmarking of these new kinds of memory.
There are two basic variants of CUDA Unified Memory:
- managed memory, as allocated via cudaMallocManaged()
- system allocated memory, as allocated via malloc()
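
As an illustration only, the two variants above could be exercised from a small standalone CUDA program like the sketch below. This is not perftest code: the helper names `fill`, `touch_from_gpu`, `run_managed` and `run_system` are hypothetical. The sketch assumes a CUDA-capable device and treats the `cudaDevAttrPageableMemoryAccess` attribute as the indicator of whether the GPU can access malloc()ed memory directly.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Writes `value` into every byte of `buf` from the GPU.
__global__ void fill(unsigned char *buf, size_t n, unsigned char value)
{
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n)
        buf[i] = value;
}

// Hypothetical helper: launches fill() on `buf`, then checks the result
// from the CPU. Returns 0 on success, -1 on failure.
static int touch_from_gpu(unsigned char *buf, size_t n)
{
    const unsigned int threads = 256;
    const unsigned int blocks = (unsigned int)((n + threads - 1) / threads);

    fill<<<blocks, threads>>>(buf, n, 0xAB);
    if (cudaDeviceSynchronize() != cudaSuccess)
        return -1;
    return (buf[0] == 0xAB && buf[n - 1] == 0xAB) ? 0 : -1;
}

// Variant 1: managed memory, allocated via cudaMallocManaged().
int run_managed(size_t n)
{
    unsigned char *buf = NULL;

    if (cudaMallocManaged(&buf, n) != cudaSuccess)
        return -1;
    int rc = touch_from_gpu(buf, n);
    cudaFree(buf);
    return rc;
}

// Variant 2: system allocated memory, allocated via malloc().
// Returns 1 if the device reports no pageable memory access.
int run_system(size_t n)
{
    int device = 0, pageable = 0;

    if (cudaGetDevice(&device) != cudaSuccess)
        return -1;
    if (cudaDeviceGetAttribute(&pageable, cudaDevAttrPageableMemoryAccess,
                               device) != cudaSuccess)
        return -1;
    if (!pageable)
        return 1; // GPU cannot dereference malloc()ed pointers here

    unsigned char *buf = (unsigned char *)malloc(n);
    if (buf == NULL)
        return -1;
    int rc = touch_from_gpu(buf, n);
    free(buf);
    return rc;
}

int main(void)
{
    const size_t n = 1 << 20;

    printf("managed memory:          rc=%d\n", run_managed(n));
    printf("system allocated memory: rc=%d (1 = not supported)\n",
           run_system(n));
    return 0;
}
```

The attribute check is there so that the sketch skips the malloc() variant on platforms where the GPU cannot access pageable host memory, instead of faulting in the kernel.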
On IBM POWER9-based machines where the GPU is attached to the CPU via NVLink, e.g. AC922 servers, the CUDA runtime supports GPUDirect RDMA on both variants.
For that to work, ODP (On-Demand Paging) must be enabled.
On other systems, such as x86_64, this support is still missing.