ch4/ofi: Convert CUDA device id to handle for fi_mr_regattr
The libfabric docs say that the cuda field in the fi_mr_attr struct
passed to fi_mr_regattr must hold the device handle obtained from
cuDeviceGet, not the device ordinal. Fixes #7148.
raffenet committed Oct 2, 2024
1 parent 12834e5 commit f8c92b8
Showing 1 changed file with 10 additions and 2 deletions.
12 changes: 10 additions & 2 deletions src/mpid/ch4/netmod/ofi/ofi_impl.h
@@ -707,8 +707,16 @@ MPL_STATIC_INLINE_PREFIX int MPIDI_OFI_register_memory(char *send_buf, size_t da
mr_attr.context = NULL;
#ifdef MPL_HAVE_CUDA
mr_attr.iface = (attr->type != MPL_GPU_POINTER_DEV) ? FI_HMEM_SYSTEM : FI_HMEM_CUDA;
-mr_attr.device.cuda =
-    (attr->type != MPL_GPU_POINTER_DEV) ? 0 : MPL_gpu_get_dev_id_from_attr(attr);
+if (attr->type == MPL_GPU_POINTER_DEV) {
+    MPL_gpu_device_handle_t dev_h;
+    int dev_id;
+
+    dev_id = MPL_gpu_get_dev_id_from_attr(attr);
+    MPL_gpu_device_id_to_handle(&dev_h, dev_id);
+    mr_attr.device.cuda = dev_h;
+} else {
+    mr_attr.device.cuda = 0;
+}
#elif defined MPL_HAVE_ZE
/* OFI does not support tiles yet, need to pass the root device. */
mr_attr.iface = (attr->type != MPL_GPU_POINTER_DEV) ? FI_HMEM_SYSTEM : FI_HMEM_ZE;
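
The CUDA branch of this change can be sketched in isolation as follows. This is a minimal illustration, not the MPICH code: every `sketch_` and `toy_` name is hypothetical, the struct is a stand-in for the two registration fields the patch sets, and the converter callback plays the role that `MPL_gpu_device_id_to_handle` plays in the patch. The toy converter deliberately returns a value different from the ordinal so the difference is visible. Unlike the patch, the sketch also checks the converter's return code, which is an assumption about how errors might be handled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a GPU device handle type. */
typedef int sketch_dev_handle_t;

/* Hypothetical stand-in for the registration fields the patch sets. */
struct sketch_mr_attr {
    int is_cuda;     /* 1 for device memory, 0 for system memory */
    int cuda_device; /* must hold a device handle, not an ordinal */
};

/* Converter callback: writes the handle for dev_id, returns 0 on success. */
typedef int (*sketch_id_to_handle_fn)(sketch_dev_handle_t *handle, int dev_id);

/* Fill the device field: convert the ordinal to a handle for device
 * pointers, and store 0 for everything else. */
static int sketch_fill_device(struct sketch_mr_attr *attr, int is_dev_ptr,
                              int dev_id, sketch_id_to_handle_fn to_handle)
{
    sketch_dev_handle_t handle;
    int rc;

    assert(attr != NULL && to_handle != NULL);

    attr->is_cuda = is_dev_ptr;
    attr->cuda_device = 0;
    if (!is_dev_ptr)
        return 0;

    rc = to_handle(&handle, dev_id);
    if (rc != 0)
        return rc; /* leave the field at 0 if conversion failed */

    attr->cuda_device = handle;
    return 0;
}

/* Toy converter: offsets the ordinal so handle and ordinal differ. */
static int toy_id_to_handle(sketch_dev_handle_t *handle, int dev_id)
{
    if (dev_id < 0)
        return -1;
    *handle = 100 + dev_id;
    return 0;
}
```

Calling `sketch_fill_device(&attr, 1, 2, toy_id_to_handle)` stores the converted handle rather than the ordinal 2, while a call with `is_dev_ptr` set to 0 leaves the device field at 0, mirroring the `else` branch of the patch.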