In the context of NCCL_PROTO=LL128, which requires atomicity guarantees for received data within 128-byte boundaries, I've identified a potential issue in the current implementation.
In the function efa_rdm_pke_copy_payload_to_ope(), when handling non-CUDA and non-HMEM memory types (specifically system memory), the code uses ofi_copy_to_iov for data copying:
bytes_copied = ofi_copy_to_iov(ope->iov, ope->iov_count, segment_offset + ep->msg_prefix_size, pke->payload, pke->payload_size);
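The ofi_copy_to_iov function ultimately calls memcpy, which makes no guarantee about the width or order of the stores it issues, so a 128-byte region of the destination may be visible to a concurrent reader in a partially written state. This could violate the atomicity requirement of the LL128 protocol.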
Questions:
- Is this a genuine concern for LL128 protocol compliance?
- Should copies into system memory also guarantee atomicity within each 128-byte boundary?
- If this is indeed an issue, what would be the recommended approach to ensure atomicity for system memory copies in LL128 scenarios?
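To make the last question concrete, here is a minimal sketch of one possible direction, not a proposed patch. It assumes that each 128-byte LL128 line carries its flag in the trailing 8 bytes, that the destination is 128-byte aligned, and that the payload covers whole lines. The helper name copy_lines_flag_last and both constants are hypothetical and do not exist in libfabric. __atomic_store_n is a GCC/Clang builtin. The sketch only orders the stores so that the flag of each line is published after the rest of that line. It does not make the 128-byte write itself atomic, and whether this ordering is what a GPU reader actually observes is something I cannot confirm.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LINE_SIZE 128u /* assumed LL128 line size */
#define FLAG_SIZE 8u   /* assumption: the flag is the last 8 bytes of a line */

/*
 * Hypothetical helper, not part of libfabric.
 * Copies whole 128-byte lines from src to dst. For each line, the first
 * 120 bytes are written with memcpy, then the trailing 8 bytes are written
 * with a single release store, so a reader that acquires the flag also
 * sees the data that precedes it.
 * Assumes dst is at least 8-byte aligned. A trailing partial line is not
 * copied. Returns the number of bytes copied.
 */
static size_t copy_lines_flag_last(void *dst, const void *src, size_t len)
{
	unsigned char *d = dst;
	const unsigned char *s = src;
	size_t off;

	for (off = 0; off + LINE_SIZE <= len; off += LINE_SIZE) {
		uint64_t flag;

		/* Data portion of the line: plain copy, no ordering needed yet. */
		memcpy(d + off, s + off, LINE_SIZE - FLAG_SIZE);

		/* Read the flag word from the (possibly unaligned) source. */
		memcpy(&flag, s + off + LINE_SIZE - FLAG_SIZE, FLAG_SIZE);

		/* Publish the flag last, as one 8-byte release store. */
		__atomic_store_n((uint64_t *)(d + off + LINE_SIZE - FLAG_SIZE),
				 flag, __ATOMIC_RELEASE);
	}
	return off;
}
```

A real fix would also have to handle payloads that start or end in the middle of a line (segment_offset is not necessarily a multiple of 128) and destinations that span several iov entries, both of which this sketch ignores.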
Thank you for your attention to this issue.