Background

Currently, we have three levels of I/O consistency:

1. Sequential consistency among all accesses using a single file handle
2. Sequential consistency among all accesses using file handles created from a single collective open with atomic mode enabled
3. User-imposed consistency among accesses other than the above
For conflicting accesses, the default MPI semantics do not guarantee sequential consistency (i.e., POSIX consistency).
If two accesses conflict, sequential consistency can be guaranteed by:

1. Enabling atomic mode via the MPI_FILE_SET_ATOMICITY routine
2. Using "sync-barrier-sync", which guarantees that the conflicting accesses are not concurrent (sketched below)
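As an illustration of the second mechanism, here is a minimal sketch of the sync-barrier-sync recipe, assuming rank 0 writes and rank 1 reads a region of a collectively opened file:

```c
#include <mpi.h>

/* "sync-barrier-sync": rank 0 writes, rank 1 reads the same region.
 * MPI_File_sync is collective, so every process of the communicator
 * that opened fh must make both calls. */
void sync_barrier_sync(MPI_File fh, MPI_Comm comm, char *buf, int n)
{
    int rank;
    MPI_Status st;
    MPI_Comm_rank(comm, &rank);

    if (rank == 0)
        MPI_File_write_at(fh, 0, buf, n, MPI_BYTE, &st);
    MPI_File_sync(fh);   /* transfer rank 0's writes to the storage device */
    MPI_Barrier(comm);   /* ensure the conflicting accesses are not concurrent */
    MPI_File_sync(fh);   /* make the updates visible to subsequent reads */
    if (rank == 1)
        MPI_File_read_at(fh, 0, buf, n, MPI_BYTE, &st);
}
```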
Problem
Definition of MPI_FILE_SYNC:
Calling MPI_FILE_SYNC with fh causes all previous writes to fh by the calling process to be transferred to the storage device. If other processes have made updates to the storage device, then all such updates become visible to subsequent reads of fh by the calling process. MPI_FILE_SYNC may be necessary to ensure sequential consistency in certain cases.
MPI_FILE_SYNC is a collective operation.
MPI_FILE_SYNC is not guaranteed to be a temporally synchronizing function, which is why the barrier in the sync-barrier-sync recipe is required.
Issues with relying solely on MPI_FILE_SYNC for consistency:

1. Conflated consistency and persistency: MPI_FILE_SYNC enforces both consistency (visibility of writes to other processes) and persistency (transfer of data to durable storage) in a single call. However, in many cases only consistency is required, and persistency adds unnecessary overhead.
2. Avoidance by high-level I/O libraries: To avoid the performance penalty of disk flushes, high-level I/O libraries may skip MPI_FILE_SYNC. They often rely on MPI_Barrier for synchronization, which works on POSIX-compliant file systems (e.g., Lustre, GPFS) but can lead to data corruption on file systems with weaker consistency models (we observed this during the development of our non-POSIX file system).
3. Unnecessary collective synchronization: MPI_FILE_SYNC requires synchronization among all processes that collectively opened the file, regardless of whether they participate in the current I/O phase. This all-to-all nature prevents optimizations that distinguish between "producers" and "consumers."
Proposal
Introduce two new non-collective, consistency-only routines to synchronize concurrent access to shared files:

- MPI_File_flush: makes all previous writes to fh by the calling process visible to subsequent reads by other processes, without requiring the data to be transferred to the storage device.
- MPI_File_refresh: makes updates made by other processes visible to subsequent reads of fh by the calling process.
These functions enable sequential consistency without requiring a flush to the storage device. They are intended for synchronizing concurrent read/write access among processes and are not collective.
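Example usage: a minimal sketch of the flush-barrier-refresh pattern (the C signatures of the proposed routines are an assumption here, mirroring MPI_File_sync):

```c
#include <mpi.h>

/* Proposed flush-barrier-refresh pattern: the writer exports its
 * changes, a barrier orders the conflicting accesses, and the reader
 * imports other processes' updates. No transfer to durable storage
 * is forced at any point. */
void flush_barrier_refresh(MPI_File fh, MPI_Comm comm, char *buf, int n)
{
    int rank;
    MPI_Status st;
    MPI_Comm_rank(comm, &rank);

    if (rank == 0) {
        MPI_File_write_at(fh, 0, buf, n, MPI_BYTE, &st);
        MPI_File_flush(fh);      /* proposed routine: consistency only */
    }
    MPI_Barrier(comm);           /* order the conflicting accesses */
    if (rank == 1) {
        MPI_File_refresh(fh);    /* proposed routine: import updates */
        MPI_File_read_at(fh, 0, buf, n, MPI_BYTE, &st);
    }
}
```

Because neither routine is collective, the barrier can also be replaced by any weaker synchronization that orders the two accesses, such as a point-to-point message from the writer to the reader.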
Changes to the Standard Text

- Add MPI_File_flush and MPI_File_refresh to Chapter 14.6: Consistency and Semantics.
- Update the discussion in Section 14.6.1 to reflect the new consistency mechanisms.
- Add a new example in Section 14.9 demonstrating the flush-barrier-refresh pattern.
Impact on Implementations
For POSIX-based file systems (e.g., GPFS, Lustre, UFS), these routines may be implemented as no-ops since the system already ensures sequential consistency.
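As an illustration, a conforming no-op implementation on such a file system could be as simple as the following sketch (hypothetical MPIX_-prefixed names, not the prototype's actual code):

```c
#include <mpi.h>

/* Hypothetical no-op bindings for a POSIX-compliant backend, where
 * writes by one process are already visible to reads by others. */
int MPIX_File_flush(MPI_File fh)   { (void)fh; return MPI_SUCCESS; }
int MPIX_File_refresh(MPI_File fh) { (void)fh; return MPI_SUCCESS; }
```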
A prototype implementation has been developed and tested in MPICH (ROMIO):
18 file changes, approximately 500 lines of code added.
Impact on Users
No changes are required for existing applications using MPI_FILE_SYNC. However, for users (e.g., I/O library developers) who require sequential consistency but do not need data to be flushed to disk, the flush-barrier-refresh model provides a more efficient alternative.
One example scenario:
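For instance (an illustrative sketch, with the same assumed signatures): a high-level I/O library in which one process writes a file header that a single other process later reads back. Since the proposed routines are non-collective, only the two participating processes need to synchronize:

```c
#include <mpi.h>

/* Illustrative producer/consumer scenario: only the two participating
 * processes synchronize; all other processes that opened fh are
 * uninvolved and can keep computing. */
void header_exchange(MPI_File fh, MPI_Comm comm, int producer, int consumer,
                     char *hdr, int n)
{
    int rank;
    MPI_Status st;
    MPI_Comm_rank(comm, &rank);

    if (rank == producer) {
        MPI_File_write_at(fh, 0, hdr, n, MPI_BYTE, &st);
        MPI_File_flush(fh);                              /* proposed: export writes */
        MPI_Send(NULL, 0, MPI_BYTE, consumer, 0, comm);  /* order the accesses */
    } else if (rank == consumer) {
        MPI_Recv(NULL, 0, MPI_BYTE, producer, 0, comm, MPI_STATUS_IGNORE);
        MPI_File_refresh(fh);                            /* proposed: import updates */
        MPI_File_read_at(fh, 0, hdr, n, MPI_BYTE, &st);
    }
}
```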