
Commit 9a6a048

Merge branch 'openshmem-org:master' into master
2 parents a2d9daa + 1d6f40e commit 9a6a048

29 files changed: 684 additions & 49 deletions

.github/issue_template.md

Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
+---
+name: Issue Template
+about: Template for OpenSHMEM Issues
+title: ''
+labels: ''
+assignees: ''
+
+---
+
+# Problem Statement
+
+<!-- Describe the problem solved by this proposal. -->
+
+# Proposed Changes
+
+<!-- Describe the high level idea and proposed changes. -->
+
+# Impact on Implementations
+
+<!-- Describe changes that implementations will be required to make here. -->
+
+# Impact on Users
+
+<!-- Describe the changes that will impact users here. -->
+
+# References and Pull Requests
+
+<!-- References to other pull requests or issues, papers, websites, etc. Please keep this updated. -->

.github/pull_request_template.md

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+# Summary of changes
+
+# Proposal Checklist
+- [ ] Link to issue(s)
+- [ ] Changelog entry
+- [ ] Reviewed for changes to front matter
+- [ ] Reviewed for changes to back matter

content/backmatter.tex

Lines changed: 28 additions & 6 deletions
@@ -143,12 +143,6 @@ \chapter{Undefined Behavior in OpenSHMEM}\label{sec:undefined}
 immediately upon an \openshmem call into the uninitialized library.
 \tabularnewline
 \hline
-Multiple calls to initialization routines & In an \openshmem program where
-the initialization routines \FUNC{shmem\_init} or \FUNC{shmem\_init\_thread}
-have already been called, any subsequent calls to these initialization routines
-result in undefined behavior.
-\tabularnewline
-\hline
 Specifying invalid \ac{PE} numbers & For \openshmem routines that accept a
 \ac{PE} number as an argument, if the \ac{PE} number is invalid for the
 team associated with the operation (either implicitly or explicitly), the
@@ -661,6 +655,11 @@ \section{Version 1.6}
 The following list describes the specific changes in \openshmem[1.6]:
 \begin{itemize}
 %
+\item Added support for initialization and finalization routines to be called
+multiple times, and added an initialization status query API
+\FUNC{shmem\_query\_initialized}.
+\ChangelogRef{subsec:shmem_init, subsec:shmem_finalize, subsec:shmem_query_initialized}%
+%
 \item Added interleaved block transfer APIs \FUNC{shmem\_ibget} and
 \FUNC{shmem\_ibput}.
 \ChangelogRef{subsec:shmem_ibget, subsec:shmem_ibput}%
@@ -687,19 +686,42 @@ \section{Version 1.6}
 operations for team-based reductions.
 \ChangelogRef{teamreducetypes}%
 %
+\item Added the session routines, \FUNC{shmem\_ctx\_session\_start} and
+\FUNC{shmem\_ctx\_session\_stop}, which allow users to pass hints to the
+\openshmem library to apply runtime optimizations.
+\ChangelogRef{subsec:sessions}%
 \item Added fine grained completion routine: \FUNC{shmem\_pe\_quiet}.
 \ChangelogRef{subsec:shmem_pe_quiet}%
 %
 \item Split the listings for the \FUNC{shmem\_\{malloc, free, realloc, align\}}
 functions from a single entry in \openshmem[1.5] into separate entries.
 \ChangelogRef{subsec:shmem_malloc, subsec:shmem_free, subsec:shmem_realloc,
 subsec:shmem_align}%
+%
+\item Clarified that the \FUNC{shmem\_\{malloc, free, realloc, align,
+malloc\_with\_hints, calloc\}} functions are collective operations on
+the world team.
+\ChangelogRef{subsec:shmem_malloc, subsec:shmem_free, subsec:shmem_realloc,
+subsec:shmem_align, subsec:shmmallochint, subsec:shmem_calloc}%
 \item Corrected the level argument's recommended value in API notes for
 \FUNC{shmem\_pcontrol} to indicate that the value should be greater than
 2 to enable profiling with profile library defined effects and
 additional arguments.
 \ChangelogRef{subsec:shmem_pcontrol}
 %
+\item Clarified that \FUNC{shmem\_team\_get\_config} returns the current
+configuration values, which may differ from the values assigned at the
+time of the team's creation.
+\ChangelogRef{subsec:shmem_team_get_config}
+%
+\item Clarified the behavior of \FUNC{shmem\_team\_get\_config} when the
+\VAR{config\_mask} is 0 and/or the \VAR{config} argument is a null pointer.
+\ChangelogRef{subsec:shmem_team_get_config}
+%
+\item Clarified the behavior of \FUNC{shmem\_team\_split\_strided} when the
+stride argument is 0 or negative.
+\ChangelogRef{subsec:shmem_team_split_strided}
+%
 \end{itemize}
 
 \section{Version 1.5}

content/collective_intro.tex

Lines changed: 1 addition & 1 deletion
@@ -22,7 +22,7 @@
 \end{enumerate}
 
 Concurrent accesses to symmetric memory by an \openshmem collective
-routine and any other means of access---where at least one updates the
+routine and any other means of access---where at least one \ac{PE} updates the
 symmetric memory---results in undefined behavior.
 Since \acp{PE} can enter and exit collectives at different times,
 accessing such memory remotely may require additional synchronization.

content/execution_model.tex

Lines changed: 6 additions & 8 deletions
@@ -8,17 +8,15 @@
 
 \ac{PE} execution is loosely coupled, relying on \openshmem operations to
 communicate and synchronize among executing \acp{PE}. The \openshmem phase in
-a program begins with a call to the initialization routine \FUNC{shmem\_init}
+a program begins with the first call to the initialization routine \FUNC{shmem\_init}
 or \FUNC{shmem\_init\_thread}, which must be performed before using any of the
 other \openshmem library routines.
-An \openshmem program concludes its use of the \openshmem library when all \acp{PE} call
+An \openshmem program concludes its use of the \openshmem library when all \acp{PE}
+make their final call to
 \FUNC{shmem\_finalize} or any \ac{PE} calls \FUNC{shmem\_global\_exit}.
-During a call to \FUNC{shmem\_finalize}, the \openshmem library must
-complete all pending communication and release all the resources associated to
-the library using an implicit collective synchronization across \acp{PE}.
-Calling any \openshmem routine before initialization or after
-\FUNC{shmem\_finalize} leads to undefined behavior. After finalization, a
-subsequent initialization call also leads to undefined behavior.
+During the last call to \FUNC{shmem\_finalize}, the \openshmem library synchronizes
+all \acp{PE}, completes all pending communication, and releases all the resources
+associated with the library.
 
 The \acp{PE} of the \openshmem program are identified by unique integers. The
 identifiers are integers assigned in a monotonically increasing manner from zero
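With this change, the \openshmem phase is bounded by the first initialization call and the last call to `shmem_finalize`. The bookkeeping this implies can be pictured as a reference count. The sketch below is a hypothetical model, not library code: `model_init`, `model_finalize`, and `init_model_t` are invented names, and the sketch assumes (this diff does not state it) that each initialization call is matched by exactly one finalization call.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the "first init / last finalize" rule.
 * Assumption: every initialization call is matched by exactly one
 * finalization call. This is not an OpenSHMEM API. */
typedef struct {
    int  refcount;     /* init calls not yet matched by a finalize */
    bool active;       /* true while the OpenSHMEM phase is in progress */
    int  setup_count;  /* how many times library setup actually ran */
} init_model_t;

/* Returns true if this call begins the OpenSHMEM phase. */
bool model_init(init_model_t *m) {
    m->refcount++;
    if (m->refcount == 1) {
        m->active = true;
        m->setup_count++;   /* only the first call performs setup */
        return true;
    }
    return false;
}

/* Returns true if this call ends the OpenSHMEM phase, i.e., the point at
 * which the library would synchronize PEs, complete pending communication,
 * and release its resources. */
bool model_finalize(init_model_t *m) {
    assert(m->refcount > 0 && "finalize without a matching init");
    m->refcount--;
    if (m->refcount == 0) {
        m->active = false;
        return true;
    }
    return false;
}
```

In this model, only the transition to a count of zero corresponds to the synchronizing, resource-releasing finalization described above. Earlier finalization calls only decrement the count.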

content/memmgmt_intro.tex

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@
 symmetric data objects in the symmetric heap.
 
 The symmetric memory allocation routines differ from the private heap
-allocation routines in that they must be called by all \acp{PE} in a
+allocation routines in that they must be called by all \acp{PE} in
 the world team. When specified, each of these routines includes at
 least one call to a procedure that is semantically equivalent to
 \FUNC{shmem\_barrier\_all}. This ensures that all \acp{PE}

content/sessions_intro.tex

Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
+\openshmem \emph{sessions} provide a mechanism for applications to inform the
+\openshmem library of an upcoming sequence of communication routines that
+exhibit suitable patterns for runtime optimizations.
+A session is associated with a specific \openshmem communication context
+(Section~\ref{sec:ctx}), and it indicates the beginning and ending of
+communication phases on that context.
+The \FUNC{shmem\_ctx\_session\_start} routine indicates the beginning of a session,
+and the \FUNC{shmem\_ctx\_session\_stop} routine indicates the end of a session.
+The \LibConstRef{SHMEM\_CTX\_SESSION\_*} options (Table~\ref{session_opts}) indicate
+which patterns of \openshmem RMA and AMO routines will occur within a session.
+These options serve only as \textit{hints} to the library; it is up to the
+implementation whether or not to apply any optimizations within a session.
+A session may be provided a configuration argument that specifies attributes
+associated with the session. This configuration argument is of type
+\CTYPE{shmem\_ctx\_session\_config\_t}, which is detailed further in
+Section~\ref{subsec:shmem_ctx_session_config_t}.
+
+Usage of the \openshmem session APIs on a particular context must comply with
+the requirements of all options set on that context.
+Starting and stopping \openshmem sessions should not affect the completion or
+ordering semantics of any \openshmem routines in the program.
+For these reasons, multi-threaded \openshmem programs may require additional
+thread synchronization to ensure session hints are correctly applied to
+shareable contexts.
+Because sessions are associated with an \openshmem communication context,
+routines not performed on a communication context (such as collective routines)
+are ineligible for session hints.
+
+The \CTYPE{shmem\_ctx\_session\_config\_t} object requires the \CONST{SIZE\_MAX}
+macro defined in \HEADER{stdint.h} by \Cstd[99]~\S7.18.3 and
+\Cstd[11]~\S7.20.3.
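Sessions only tell the library which pattern to expect. What the library does with a hint is up to the implementation. As a hypothetical illustration of one optimization a library *might* apply under the batching hint (buffering and coalescing consecutive small puts, as described for `SHMEM_CTX_SESSION_BATCH`), the sketch below merges puts to the same PE whose destination ranges are contiguous into a single transfer. `batch_model_t`, `batch_put`, and `batch_flush` are invented for illustration and do not correspond to any OpenSHMEM implementation. The sketch models only the destination layout and transfer count; a real library would also stage the source data and preserve completion and ordering semantics.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of put coalescing inside a batch session. */
typedef struct {
    int    pe;         /* target PE of the pending transfer (-1 = none) */
    size_t offset;     /* symmetric-address offset of the pending range */
    size_t len;        /* bytes accumulated in the pending range */
    int    transfers;  /* number of network transfers issued so far */
} batch_model_t;

/* Issue the pending range (if any) as one transfer. */
void batch_flush(batch_model_t *b) {
    if (b->pe >= 0) {
        b->transfers++;
        b->pe = -1;
        b->len = 0;
    }
}

/* Record a put of len bytes to (pe, offset). The put is merged into the
 * pending range when it targets the same PE and starts exactly where the
 * pending range ends; otherwise the pending range is flushed first. */
void batch_put(batch_model_t *b, int pe, size_t offset, size_t len) {
    assert(pe >= 0 && len > 0);
    if (b->pe == pe && b->offset + b->len == offset) {
        b->len += len;
        return;
    }
    batch_flush(b);
    b->pe = pe;
    b->offset = offset;
    b->len = len;
}
```

In this model, eight contiguous `long`-sized puts to PE 1 followed by one put to PE 2 cost two transfers instead of nine. A flush would also be forced by anything that needs completion (for example, a quiet), which is why memory-ordering routines reduce the benefit of batching.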

content/shmem_align.tex

Lines changed: 2 additions & 1 deletion
@@ -17,7 +17,8 @@
 
 
 \apidescription{
-The \FUNC{shmem\_align} routine allocates a block in the symmetric
+The \FUNC{shmem\_align} routine is a collective operation on the
+world team that allocates a block in the symmetric
 heap that has a byte alignment specified by the \VAR{alignment}
 argument. The value of \VAR{alignment} shall be a multiple of
 \CONST{sizeof(void *)} that is also a power of two; otherwise, the
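The `shmem_align` requirement on `alignment` (a multiple of `sizeof(void *)` that is also a power of two) can be checked with plain integer arithmetic. The helper below is a hypothetical pre-check an application might run before calling `shmem_align`; `valid_shmem_alignment` is an invented name and is not part of the OpenSHMEM API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if `alignment` is a power of two and a
 * multiple of sizeof(void *), as shmem_align requires. */
bool valid_shmem_alignment(size_t alignment) {
    /* A nonzero power of two has exactly one bit set. */
    bool power_of_two = alignment != 0 && (alignment & (alignment - 1)) == 0;
    return power_of_two && alignment % sizeof(void *) == 0;
}
```

Because `sizeof(void *)` is itself a power of two on common platforms, any power of two at least that large passes. Smaller powers of two, and values such as `3 * sizeof(void *)`, fail.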
Lines changed: 79 additions & 0 deletions
@@ -0,0 +1,79 @@
+\apisummary{
+A structure type representing communication session configuration arguments
+}
+
+\begin{apidefinition}
+
+\begin{Csynopsis}
+typedef struct {
+    size_t total_ops;
+} shmem_ctx_session_config_t;
+\end{Csynopsis}
+
+\begin{apiarguments}
+None.
+\end{apiarguments}
+
+
+\apidescription{
+A communication session configuration object is provided as an argument to
+the \FUNC{shmem\_ctx\_session\_start} routine.
+The \CTYPE{shmem\_ctx\_session\_config\_t} object contains optional parameters
+that are associated with the options of a communication session.
+These parameters serve only as \textit{hints} to the library; it is up to
+the implementation whether or not to use the parameter values within
+a session.
+
+The \VAR{total\_ops} member indicates the expected maximum number of all
+calls to \openshmem RMA routines within the session (i.e., after a call to
+\FUNC{shmem\_ctx\_session\_start} and before a corresponding call to
+\FUNC{shmem\_ctx\_session\_stop}).
+If \VAR{total\_ops} differs from the \textit{actual} number of calls to
+\openshmem RMA routines within the session, then application performance
+might be suboptimal; however, the results of any data transfers,
+completions, or memory ordering operations are unaffected by the value of
+\VAR{total\_ops}.
+
+When passing a configuration structure to \FUNC{shmem\_ctx\_session\_start},
+the \VAR{config\_mask} parameter specifies which fields the application requests to
+associate with the session.
+Any configuration parameter value that is not indicated in the mask will be
+ignored, and the default value will be used instead.
+Therefore, a program need only set the fields for which it does not want
+the default value.
+
+A configuration mask is created through a bitwise OR operation of the
+following library constants.
+A configuration mask value of \CONST{0} indicates that the session
+should be started with the default values for all configuration
+parameters.
+
+\widetablerow{\LibConstRef{SHMEM\_CTX\_SESSION\_TOTAL\_OPS}}{
+The value of the \VAR{total\_ops} member of the \VAR{config} structure is
+unmasked within the session and applied as a hint.
+}
+
+The default values for configuration parameters are:
+
+\widetablerow{\VAR{total\_ops} = \CONST{SIZE\_MAX}}{
+By default, the expected maximum number of calls to \openshmem RMA routines
+in the session is set to the maximum value of a \VAR{size\_t} variable,
+\CONST{SIZE\_MAX}. This default setting indicates that the \openshmem
+application chooses not to specify a value for \VAR{total\_ops}.
+}
+}
+
+\apinotes{
+Users are discouraged from calling \FUNC{shmem\_fence},
+\FUNC{shmem\_ctx\_fence}, \FUNC{shmem\_quiet}, or \FUNC{shmem\_ctx\_quiet}
+routines within a session whenever possible, because the library must
+impose strict completions to comply with ordering semantics.
+However, hints provided by \CTYPE{shmem\_ctx\_session\_config\_t} do not imply
+the occurrence of any completion or memory ordering operations.
+The requirements on buffers provided to \openshmem routines that are
+\textit{in-use} (as described in Section
+\ref{subsec:invoking_openshmem_operations}) apply regardless of any
+\CTYPE{shmem\_ctx\_session\_config\_t} hints.
+}
+
+\end{apidefinition}
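The mask rules above (fields not selected by the mask are ignored and take their defaults, and a mask of `0` means all defaults) amount to a per-field select. The sketch below mirrors `shmem_ctx_session_config_t` with a local stand-in struct and gives the `total_ops` mask bit an arbitrary value. `session_config_model_t`, `MODEL_SESSION_TOTAL_OPS`, and `resolve_session_config` are hypothetical names: the real `SHMEM_CTX_SESSION_TOTAL_OPS` constant comes from `shmem.h`, and its numeric value is not given in this section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>   /* SIZE_MAX */

/* Local stand-in for shmem_ctx_session_config_t (same single member). */
typedef struct {
    size_t total_ops;
} session_config_model_t;

/* Hypothetical bit for the total_ops field (placeholder value). */
#define MODEL_SESSION_TOTAL_OPS (1L << 0)

/* Resolve the configuration a session would use: a field is taken from
 * `config` only when its bit is set in `config_mask`; otherwise the
 * default (total_ops = SIZE_MAX, meaning "not specified") applies.
 * The NULL check is a defensive choice of this sketch. */
session_config_model_t resolve_session_config(const session_config_model_t *config,
                                              long config_mask) {
    session_config_model_t effective = { .total_ops = SIZE_MAX };
    if ((config_mask & MODEL_SESSION_TOTAL_OPS) && config != NULL) {
        effective.total_ops = config->total_ops;
    }
    return effective;
}
```

Leaving the bit out of the mask is how an application says it has no estimate. The library then sees the `SIZE_MAX` default even if `config->total_ops` holds a value.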

content/shmem_ctx_session_start.tex

Lines changed: 113 additions & 0 deletions
@@ -0,0 +1,113 @@
+\apisummary{
+Start a communication session.
+}
+
+\begin{apidefinition}
+
+\begin{Csynopsis}
+void @\FuncDecl{shmem\_ctx\_session\_start}@(shmem_ctx_t ctx, long options, const shmem_ctx_session_config_t *config, long config_mask);
+\end{Csynopsis}
+
+\begin{apiarguments}
+\apiargument{IN}{ctx}{A context handle specifying the context associated
+with this session.}
+\apiargument{IN}{options}{The set of requested options from
+Table~\ref{session_opts} for this session. Multiple options may be
+requested by combining them with a bitwise OR operation; otherwise,
+\CONST{0} can be given if no options are requested.}
+\apiargument{IN}{config}{
+A pointer to the configuration parameters for the session.}
+\apiargument{IN}{config\_mask}{
+The bitwise mask representing the set of configuration parameters to use
+from \VAR{config}.}
+\end{apiarguments}
+
+\apidescription{
+\FUNC{shmem\_ctx\_session\_start} is a non-collective routine that begins a
+session on communication context \VAR{ctx} with hints requested via
+\VAR{options}.
+Sessions on a communication context must be stopped with a call to
+\FUNC{shmem\_ctx\_session\_stop} on the same context.
+If a session is already started on a given context, another call to
+\FUNC{shmem\_ctx\_session\_start} on that same context combines new options
+via a bitwise OR operation. In such a case, unmasked member values in the
+\VAR{config} argument replace any existing configuration values that are
+already applied to the session.
+
+If \VAR{ctx} compares equal to \LibConstRef{SHMEM\_CTX\_INVALID} then
+\FUNC{shmem\_ctx\_session\_start} performs no action and returns immediately.
+
+No combination of \VAR{options} passed to \FUNC{shmem\_ctx\_session\_start}
+results in undefined behavior, but some combinations may be detrimental to
+performance, for example, selecting an option that is not applicable
+to the session. It is the user's responsibility to determine which
+combination of \VAR{options} benefits the performance of the session.
+
+The \VAR{config} argument specifies session configuration parameters,
+which are described in Section~\ref{subsec:shmem_ctx_session_config_t}.
+
+The \VAR{config\_mask} argument is a bitwise mask representing the set of
+configuration parameters to use from \VAR{config}.
+A \VAR{config\_mask} value of \CONST{0} indicates that the session should
+be started with the default values for all configuration parameters.
+See Section~\ref{subsec:shmem_ctx_session_config_t} for field mask names and
+default configuration parameters.
+}
+
+\apireturnvalues{
+None.
+}
+
+\sessiontablebegin
+
+\sessiontablerow{\LibConstRef{SHMEM\_CTX\_SESSION\_BATCH}}{
+A \textit{batch} is a series of calls to \openshmem routines that occur
+within a session on a communication context (i.e., after a call to
+\FUNC{shmem\_ctx\_session\_start} and before a corresponding call to
+\FUNC{shmem\_ctx\_session\_stop}) and that might tolerate an increase in
+individual call latencies. Designating a batch may provide an opportunity
+to decrease the overall overhead typically involved with the \openshmem
+library implementing the series as individual RMA operations. In other
+words, the performance of \openshmem programs that issue many consecutive
+and small-sized RMA routines might be improved by informing the library
+implementation ahead of time that it is free to delay transferring data
+in order to buffer, combine, and/or coalesce the issued \openshmem
+routines. The specific mechanisms for improving performance using
+batching optimizations depend on the \openshmem library implementation.
+
+The \CONST{SHMEM\_CTX\_SESSION\_BATCH} hint indicates that a communication
+context will be used to issue a batch. An example of a batch is an
+iterative loop of non-blocking RMA and/or AMO routines. A batch may
+include a memory ordering or collective operation, but such routines
+might require completions and/or synchronization that could degrade
+performance.
+
+Because sessions do not affect the completion or ordering semantics of any
+\openshmem routines in the program, routines such as non-blocking RMAs,
+non-blocking AMOs, non-blocking \OPR{put-with-signals}, blocking scalar
+\OPR{puts}, small blocking \OPR{puts}, and blocking non-fetching AMOs are
+viable candidates for batching. Other routines, such as large blocking
+\OPR{puts}, all blocking \OPR{gets}, blocking fetching AMOs, and the
+memory ordering routines, might require the library to enforce
+completions, reducing the potential benefit of batching.
+
+The \VAR{total\_ops} field of \VAR{config} indicates the expected maximum
+number of calls to \openshmem RMA routines within the session.
+See Section~\ref{subsec:shmem_ctx_session_config_t} for details
+about \CTYPE{shmem\_ctx\_session\_config\_t} parameters.
+} \hline
+
+\sessiontableend
+
+\apinotes{
+The \FUNC{shmem\_ctx\_session\_start} routine provides hints for improving
+performance, and \openshmem implementations are not required to apply any
+optimization.
+\FUNC{shmem\_ctx\_session\_start} is non-collective, so there is no implied
+synchronization.
+Blocking puts must be sufficiently small to benefit from batching, and the
+exact threshold for this benefit depends on the \openshmem implementation
+and/or the application.
+}
+
+\end{apidefinition}
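The rules for a repeated `shmem_ctx_session_start` on a context that already has a session can be read as a small state update: options accumulate with a bitwise OR, and only the config fields selected by `config_mask` overwrite the values already in effect. The sketch below is a hypothetical model of that bookkeeping under one reading of the text, not library code. All `model_*` names and both bit values are placeholders, because this section does not define the constants' values. Treating stop as simply ending the session is also an assumption, since `shmem_ctx_session_stop` is not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>   /* SIZE_MAX */

/* Placeholder bits. Options and config-mask bits live in separate
 * arguments, so reusing bit 0 for both is harmless here. */
#define MODEL_SESSION_BATCH      (1L << 0)   /* options bit */
#define MODEL_SESSION_TOTAL_OPS  (1L << 0)   /* config_mask bit */

typedef struct { size_t total_ops; } session_config_model_t;

/* Per-context session state tracked by this hypothetical model. */
typedef struct {
    bool   active;
    long   options;
    size_t total_ops;
} session_state_model_t;

void model_session_start(session_state_model_t *s, long options,
                         const session_config_model_t *config, long config_mask) {
    if (!s->active) {
        /* New session: begin from the defaults. */
        s->active = true;
        s->options = 0;
        s->total_ops = SIZE_MAX;
    }
    s->options |= options;                  /* options combine via OR */
    if (config_mask & MODEL_SESSION_TOTAL_OPS) {
        assert(config != NULL);
        s->total_ops = config->total_ops;   /* selected field replaces old value */
    }
}

void model_session_stop(session_state_model_t *s) {
    s->active = false;
}
```

A second start with a mask of `0` leaves the earlier `total_ops` hint in place in this model. Only a field named in the mask replaces what the session already holds.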
