Common OFI Mistakes to Avoid
Jianxin Xiong edited this page Sep 17, 2025 · 4 revisions
This document highlights commonly misunderstood aspects of the OFI documentation to help application developers avoid potential problems. It is not a substitute for a thorough reading of the manpages.
- All endpoints that issue asynchronous operations must be bound to a relevant completion queue (CQ), even if they don't report completions.
  - This is for error reporting purposes. Take, for example, an endpoint configured with `FI_SELECTIVE_COMPLETION` and bound to a counter, where the application reads only the counter. If the counter read returns an error, the application may then read a more detailed error entry from the completion queue.
  - The endpoint needs to be bound to a CQ only for the operation types it will initiate. For example, if an endpoint will only issue receive operations, it only needs to be bound to a CQ using the `FI_RECV` flag.
- When the `FI_CONTEXT` (or `FI_CONTEXT2`) mode bit is specified, the application must pass a valid `struct fi_context` (or `struct fi_context2`) when initiating an operation such as a send, receive, or RMA write. The memory occupied by the structure must remain valid for the entire duration of the operation, that is, until the application receives a valid completion by reading the completion queue. Additionally, no other operation may reuse the `struct fi_context` while the original operation is outstanding. If the application releases or reuses the `struct fi_context` too early, the behavior is undefined. In some cases the error appears as stack corruption (for example, when the structure was a local variable in a function that has already returned), which can be hard to debug!
  - The exception to this rule: when `FI_SELECTIVE_COMPLETION` is enabled to suppress completion entries and an operation is initiated without the `FI_COMPLETION` flag set, the context parameter is ignored.
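
One way to follow the context lifetime rule above is to take contexts from a long-lived pool rather than from the stack, and to recycle a slot only after its completion has been read. The following is a minimal sketch of that bookkeeping, not libfabric code: `demo_context` is a stand-in for `struct fi_context` so the example compiles without `<rdma/fabric.h>`, and `ctx_pool`, `ctx_acquire`, `ctx_release`, and `pool_demo` are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for struct fi_context (assumption: real code would embed the
 * library's own definition from <rdma/fabric.h> instead). */
struct demo_context {
    void *internal[4];
};

#define POOL_SIZE 4

/* Hypothetical slot: one context plus a flag marking it as outstanding.
 * The context is the first member so its address maps back to the slot. */
struct ctx_slot {
    struct demo_context ctx;
    bool in_flight;
};

struct ctx_pool {
    struct ctx_slot slots[POOL_SIZE];
};

static void ctx_pool_init(struct ctx_pool *pool)
{
    memset(pool, 0, sizeof(*pool));
}

/* Hand out a context that no outstanding operation is using.
 * Returns NULL when every slot is still in flight. */
static struct demo_context *ctx_acquire(struct ctx_pool *pool)
{
    for (size_t i = 0; i < POOL_SIZE; i++) {
        if (!pool->slots[i].in_flight) {
            pool->slots[i].in_flight = true;
            return &pool->slots[i].ctx;
        }
    }
    return NULL;
}

/* Call only after the completion for this context has been read. */
static void ctx_release(struct demo_context *ctx)
{
    struct ctx_slot *slot = (struct ctx_slot *)ctx;
    assert(slot->in_flight);   /* releasing twice indicates a bookkeeping bug */
    slot->in_flight = false;
}

/* Usage: returns 0 if the pool behaves as described, -1 otherwise. */
int pool_demo(void)
{
    struct ctx_pool pool;
    struct demo_context *held[POOL_SIZE];

    ctx_pool_init(&pool);
    for (size_t i = 0; i < POOL_SIZE; i++) {
        held[i] = ctx_acquire(&pool);   /* one context per outstanding op */
        if (held[i] == NULL)
            return -1;
    }
    if (ctx_acquire(&pool) != NULL)     /* all in flight: nothing to reuse */
        return -1;

    ctx_release(held[0]);               /* completion for op 0 was read */
    return ctx_acquire(&pool) == held[0] ? 0 : -1;
}
```

In a real application, the pointer returned by `ctx_acquire` would presumably be passed as the context argument of the call that posts the operation, and `ctx_release` would be called with the context pointer reported in the completion entry. The pool must outlive every outstanding operation, which is why it should not be a local variable of a function that returns before the completions are read.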