Commit c34ca97

use cases: add debugging chapter
1 parent 8220880 commit c34ca97

4 files changed

+243
-0
lines changed

App_Use_Cases.tex

Lines changed: 243 additions & 0 deletions
@@ -85,6 +85,249 @@ \subsection{Use Case Details}
8585

8686
Other keys are also helpful to have before a synchronization point; this is not meant to be a comprehensive list.
8787

88+
\section{Debugging}
89+
\label{app:uc-debugging}
90+
91+
This use case distills the debugging-related features and extensions requested in the RFCs. We have identified parts of PR23 (Co-located process launch for debuggers), RFC0010 (MPIR-like query), RFC0002 (event pub/sub), and RFC0022 (Environmental Parameter Directives for Applications and Launchers) under this category.
92+
93+
\subsection{Terminology}
94+
95+
\subsubsection{Tools vs Debuggers}
96+
97+
A \texttt{tool} is a process designed to monitor, record, analyze, or control the execution of another process, typically for the purposes of profiling and debugging. A \texttt{first-party tool} runs within the address space of the application process, while a \texttt{third-party tool} runs within its own process. A \texttt{debugger} is a third-party tool that inspects and controls an application process's execution using system-level debug APIs (e.g., \code{ptrace}).
98+
99+
\subsubsection{Parallel Launching Methods}
100+
A \texttt{starter} program is a program responsible for launching a parallel runtime, such as \ac{MPI}. \ac{PMIx} supports two primary methods for launching parallel applications under tools and debuggers: indirect and direct. In the indirect launching method, the tool is attached to the starter. In the direct launching method, the tool takes the place of the starter.
101+
\ac{PMIx} also supports attaching to already running programs via the \texttt{Process Acquisition} interfaces.
102+
103+
\subsubsection{Process Synchronization}
104+
Process Synchronization is the technique tools use to start the processes of a parallel application such that the tools can still attach to each process early in its lifetime. Said another way, the tool must be able to start the application processes without them ``running away'' from the tool. In the case of \ac{MPI}, this means stopping the application processes before they return from \code{MPI_Init}.
105+
106+
\subsubsection{Process Acquisition}\label{subsubsec:process-acq}
107+
108+
Process Acquisition is the technique tools use to locate all of the processes, local and remote, of a given parallel application. This typically boils down to collecting, for every process in the parallel application, the hostname or IP address of the machine running the process, the executable name, and the process ID.
109+
110+
\subsection{Use Case Details}
111+
\subsubsection{Direct-Launch Debugger Tool}
112+
113+
In the direct-launch case, the tool itself can use the PMIx spawn options to control the application's startup, including directing the RM/application when to block and wait for tool attachment, or stipulating that an interceptor library be preloaded. However, this restricts the user to whatever command-line options the tool vendor provides for operations such as process placement and binding, which places a significant burden on the tool vendor. An example might look like the following: \code{dbgr -n 3 ./myapp}.
114+
115+
Where the environment supports it, debugger daemons can be co-launched in this use case by adding a \code{pmix_app_t} to the \refapi{PMIx_Spawn} command and setting the \refattr{PMIX_DEBUGGER_DAEMONS} attribute to indicate that the resulting processes are debugger daemons.
116+
117+
\begingroup
118+
\begin{figure*}
119+
\begin{center}
120+
\includegraphics[width=\textwidth,height=\textheight,keepaspectratio]{figs/direct-launch}
121+
\end{center}
122+
\caption{Direct Launch}
123+
\label{fig:direct_launch}
124+
\end{figure*}
125+
\endgroup
126+
127+
128+
\littleheader{Related Interfaces}
129+
130+
{\large \refapi{PMIx_tool_init}}
131+
\pasteSignature{PMIx_tool_init}
132+
133+
{\large \refapi{PMIx_Register_event_handler}}
134+
\pasteSignature{PMIx_Register_event_handler}
135+
136+
{\large \refapi{PMIx_Query_info}}
137+
\pasteSignature{PMIx_Query_info}
138+
139+
{\large \refapi{PMIx_Spawn}}
140+
\pasteSignature{PMIx_Spawn}
141+
142+
{\large \refapi{PMIx_Get}}
143+
\pasteSignature{PMIx_Get}
144+
145+
{\large \refapi{PMIx_Notify_event}}
146+
\pasteSignature{PMIx_Notify_event}
147+
148+
\littleheader{Related Attributes}
149+
150+
\pasteAttributeItem{PMIX_QUERY_SPAWN_SUPPORT}
151+
\pasteAttributeItem{PMIX_QUERY_DEBUG_SUPPORT}
152+
\pasteAttributeItem{PMIX_DEBUG_STOP_IN_INIT}
153+
\pasteAttributeItem{PMIX_FWD_STDOUT}
154+
\pasteAttributeItem{PMIX_FWD_STDERR}
155+
\pasteAttributeItem{PMIX_NOTIFY_COMPLETION}
156+
\pasteAttributeItem{PMIX_SETUP_APP_ENVARS}
157+
\pasteAttributeItem{PMIX_DEBUGGER_DAEMONS}
158+
\pasteAttributeItem{PMIX_DEBUG_JOB}
159+
\pasteAttributeItem{PMIX_QUERY_LOCAL_PROC_TABLE}
160+
161+
\littleheader{Related Constants}
162+
163+
\refconst{PMIX_DEBUG_WAITING_FOR_NOTIFY} \\
164+
\refconst{PMIX_DEBUGGER_RELEASE}
165+
166+
\subsubsection{Indirect-Launch Debugger Tool}
167+
168+
A program can also be executed under a tool using an intermediate launcher such as mpiexec. This requires some degree of coordination between the tool and the launcher. Ultimately, it is the launcher that launches the application, and the tool must inform it (and the application) that this is a debug session so that the application knows to ``block'' until the tool attaches to it.
169+
170+
In this operational mode, the user invokes a tool (typically on a non-compute, or ``head'', node) that in turn uses mpiexec to launch the application. A typical command line might look like the following: \code{dbgr -dbgoption mpiexec -n 32 ./myapp}.
171+
172+
\begingroup
173+
\begin{figure*}
174+
\begin{center}
175+
\includegraphics[width=\textwidth,height=\textheight,keepaspectratio]{figs/indirect-launch}
176+
\end{center}
177+
\caption{Indirect Launch}
178+
\label{fig:indirect_launch}
179+
\end{figure*}
180+
\endgroup
181+
182+
183+
\littleheader{Related Interfaces}
184+
185+
{\large \refapi{PMIx_tool_init}}
186+
\pasteSignature{PMIx_tool_init}
187+
188+
{\large \refapi{PMIx_Register_event_handler}}
189+
\pasteSignature{PMIx_Register_event_handler}
190+
191+
{\large \refapi{PMIx_Spawn}}
192+
\pasteSignature{PMIx_Spawn}
193+
194+
{\large \refapi{PMIx_Notify_event}}
195+
\pasteSignature{PMIx_Notify_event}
196+
197+
{\large \refapi{PMIx_tool_attach_to_server}}
198+
\pasteSignature{PMIx_tool_attach_to_server}
199+
200+
{\large \refapi{PMIx_Query_info}}
201+
\pasteSignature{PMIx_Query_info}
202+
203+
{\large \refapi{PMIx_Get}}
204+
\pasteSignature{PMIx_Get}
205+
206+
\littleheader{Related Attributes}
207+
208+
\pasteAttributeItem{PMIX_SPAWN_TOOL}
209+
\pasteAttributeItem{PMIX_FWD_STDOUT}
210+
\pasteAttributeItem{PMIX_FWD_STDERR}
211+
\pasteAttributeItem{PMIX_SETUP_APP_ENVARS}
212+
\pasteAttributeItem{PMIX_DEBUG_STOP_IN_INIT}
213+
\pasteAttributeItem{PMIX_QUERY_PROC_TABLE}
214+
\pasteAttributeItem{PMIX_DEBUGGER_DAEMONS}
215+
\pasteAttributeItem{PMIX_DEBUG_JOB}
218+
\pasteAttributeItem{PMIX_NOTIFY_COMPLETION}
221+
\pasteAttributeItem{PMIX_QUERY_LOCAL_PROC_TABLE}
222+
223+
\littleheader{Related Constants}
224+
225+
\refconst{PMIX_LAUNCHER_READY} \\
226+
\refconst{PMIX_LAUNCH_DIRECTIVE} \\
227+
\refconst{PMIX_LAUNCH_COMPLETE} \\
228+
\refconst{PMIX_DEBUG_WAITING_FOR_NOTIFY} \\
229+
\refconst{PMIX_DEBUGGER_RELEASE}
230+
231+
\subsubsection{Attaching to a Running Job}
232+
233+
PMIx supports attaching to an already running parallel job in two ways. In the first way, the main process of a tool calls \refapi{PMIx_Query_info} with the \refattr{PMIX_QUERY_PROC_TABLE} attribute. This returns an array of structs containing the information required for \hyperref[subsubsec:process-acq]{process acquisition}. This includes remote hostnames, executable names, and process IDs. In the second way, every tool daemon calls \refapi{PMIx_Query_info} with the \refattr{PMIX_QUERY_LOCAL_PROC_TABLE} attribute. This returns a similar array of structs but only for processes on the same node.
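The relationship between the two queries can be illustrated with a minimal, self-contained sketch. It does not call PMIx at all: \code{proc_entry} and \code{local_procs} are hypothetical names introduced here for illustration, standing in for the per-process information (hostname, executable name, process ID) that the text says a proc-table query returns. Under that assumption, the node-local view is simply the job-wide table filtered by hostname.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>   /* pid_t */

/* Hypothetical stand-in for one proc-table entry; NOT a PMIx type.
 * It carries only the fields named in the text above. */
typedef struct {
    unsigned    rank;        /* rank of the process within the job */
    const char *hostname;    /* node on which the process runs     */
    const char *executable;  /* executable name                    */
    pid_t       pid;         /* process ID on that node            */
} proc_entry;

/* Copy the entries of `table` that run on `host` into `out`, writing at
 * most `cap` entries. Returns the total number of matching entries, which
 * may exceed `cap`; pass out == NULL with cap == 0 to count only. */
size_t local_procs(const proc_entry *table, size_t n, const char *host,
                   proc_entry *out, size_t cap)
{
    assert(table != NULL || n == 0);
    assert(host != NULL);

    size_t found = 0;
    for (size_t i = 0; i < n; i++) {
        if (strcmp(table[i].hostname, host) == 0) {
            if (out != NULL && found < cap) {
                out[found] = table[i];
            }
            found++;
        }
    }
    return found;
}
```

A tool daemon on a given node would, in this model, see only the entries whose hostname matches its own, whereas the main tool process sees the whole table and can use it to decide where daemons are needed.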
234+
235+
An example of this use-case may look like the following: \code{mpiexec -n 32 ./myApp \& dbgr attach \$!}.
236+
237+
\begingroup
238+
\begin{figure*}
239+
\begin{center}
240+
\includegraphics[width=\textwidth,height=\textheight,keepaspectratio]{figs/process-acquisition}
241+
\end{center}
242+
\caption{Attaching to a Running Job}
243+
\label{fig:proc_acq}
244+
\end{figure*}
245+
\endgroup
246+
247+
\littleheader{Related Interfaces}

{\large \refapi{PMIx_tool_init}}
248+
\pasteSignature{PMIx_tool_init}
249+
250+
{\large \refapi{PMIx_Register_event_handler}}
251+
\pasteSignature{PMIx_Register_event_handler}
252+
253+
{\large \refapi{PMIx_Query_info}}
254+
\pasteSignature{PMIx_Query_info}
255+
256+
{\large \refapi{PMIx_Spawn}}
257+
\pasteSignature{PMIx_Spawn}
258+
259+
\littleheader{Related Attributes}

\pasteAttributeItem{PMIX_QUERY_PROC_TABLE}
260+
\pasteAttributeItem{PMIX_DEBUGGER_DAEMONS}
261+
\pasteAttributeItem{PMIX_DEBUG_JOB}
262+
\pasteAttributeItem{PMIX_FWD_STDOUT}
263+
\pasteAttributeItem{PMIX_FWD_STDERR}
264+
\pasteAttributeItem{PMIX_NOTIFY_COMPLETION}
265+
\pasteAttributeItem{PMIX_SETUP_APP_ENVARS}
266+
267+
\pasteAttributeItem{PMIX_QUERY_ALL_NAMESPACES}
268+
269+
\subsubsection{Tool Interaction with RM}
270+
271+
Tools can benefit from a mechanism for interacting with a local PMIx server that has opted to accept such connections, along with support for tool connections to system-level PMIx servers and a logging feature. To support tool connections to a specified system-level PMIx server, environments could choose to launch a set of PMIx servers to support a given allocation. These servers will (if so instructed) provide a tool rendezvous point that is tagged with their pid and typically placed in an allocation-specific temporary directory to allow for possible multi-tenancy scenarios. Supporting such operations requires a system-level PMIx connection that is not associated with a specific user or allocation. A new key has been added to direct the PMIx server to expose a rendezvous point specifically for this purpose.
272+
273+
\littleheader{Related Interfaces}

{\large \refapi{PMIx_Query_info_nb}}
274+
\pasteSignature{PMIx_Query_info_nb}
275+
276+
{\large \refapi{PMIx_Register_event_handler}}
277+
\pasteSignature{PMIx_Register_event_handler}
278+
279+
{\large \refapi{PMIx_Deregister_event_handler}}
280+
\pasteSignature{PMIx_Deregister_event_handler}
281+
282+
{\large \refapi{PMIx_Notify_event}}
283+
\pasteSignature{PMIx_Notify_event}
284+
285+
{\large \refapi{PMIx_server_init}}
286+
\pasteSignature{PMIx_server_init}
287+
288+
\littleheader{Job-specific events}
289+
\code{PMIX_EVENT_JOB_LEVEL /* debugger attached, process failure */}
290+
291+
\littleheader{Environment events}
292+
\code{PMIX_EVENT_ENVIRO_LEVEL /*ECC errors, temperature excursions */}
293+
294+
\littleheader{Errors detected by clients/peers}
295+
\code{Network fabric manager detects data corruption}
296+
297+
\subsubsection{Environmental Parameter Directives for Applications and Launchers}
298+
299+
It is sometimes desirable or required that standard environmental variables (e.g., \code{PATH}, \code{LD_LIBRARY_PATH}, \code{LD_PRELOAD}) be modified prior to executing an application binary or a starter such as mpiexec; this is particularly true when tools/debuggers are used to start the application. RFC0022 proposes the definition of a new PMIx structure (\refstruct{pmix_envar_t}) and associated attributes for specifying such operations.
300+
301+
\littleheader{Related Interfaces}
302+
303+
{\large \refapi{PMIx_Spawn}}
304+
\pasteSignature{PMIx_Spawn}
305+
306+
\littleheader{Related Structs}
307+
308+
\refstruct{pmix_envar_t}
309+
310+
\littleheader{Related Attributes}
311+
312+
\pasteAttributeItem{PMIX_SET_ENVAR}
313+
\pasteAttributeItem{PMIX_ADD_ENVAR}
314+
\pasteAttributeItem{PMIX_UNSET_ENVAR}
315+
\pasteAttributeItem{PMIX_PREPEND_ENVAR}
316+
\pasteAttributeItem{PMIX_APPEND_ENVAR}
317+
318+
Resource managers and launchers must scan for relevant directives, modifying environmental parameters as directed. Directives are to be processed in the order in which they were given, starting with job-level directives (applied to each app) followed by app-level directives.
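The ordering rule above can be sketched in plain C using only the POSIX environment calls. This is an illustration, not a PMIx implementation: the \code{envar_op} enum, the \code{envar_directive} struct, and the functions \code{apply_directive} and \code{apply_all} are hypothetical names introduced here, and the semantics assigned to each operation (set overwrites, add does not overwrite, prepend and append join with a separator and create the variable if absent) reflect our reading of the attribute descriptions and should be checked against them.

```c
#define _POSIX_C_SOURCE 200809L  /* setenv, unsetenv */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins; these names are NOT part of the PMIx API. */
typedef enum {
    DIR_SET,      /* models PMIX_SET_ENVAR: overwrite any existing value */
    DIR_ADD,      /* models PMIX_ADD_ENVAR: keep an existing value       */
    DIR_UNSET,    /* models PMIX_UNSET_ENVAR                             */
    DIR_PREPEND,  /* models PMIX_PREPEND_ENVAR                           */
    DIR_APPEND    /* models PMIX_APPEND_ENVAR                            */
} envar_op;

typedef struct {
    envar_op    op;
    const char *envar;      /* variable name                        */
    const char *value;      /* new value or fragment (unused: UNSET) */
    char        separator;  /* used by PREPEND/APPEND, e.g. ':'      */
} envar_directive;

/* Set `name` to "<first><sep><second>". */
static int join_and_set(const char *name, const char *first,
                        const char *second, char sep)
{
    size_t len = strlen(first) + strlen(second) + 2; /* sep + NUL */
    char *buf = malloc(len);
    if (buf == NULL) {
        return -1;
    }
    snprintf(buf, len, "%s%c%s", first, sep, second);
    int rc = setenv(name, buf, 1);
    free(buf);
    return rc;
}

/* Apply one directive to the current process environment. */
int apply_directive(const envar_directive *d)
{
    assert(d != NULL && d->envar != NULL);
    if (d->op != DIR_UNSET && d->value == NULL) {
        return -1;
    }
    const char *cur = getenv(d->envar);
    switch (d->op) {
    case DIR_SET:   return setenv(d->envar, d->value, 1);
    case DIR_ADD:   return setenv(d->envar, d->value, 0);
    case DIR_UNSET: return unsetenv(d->envar);
    case DIR_PREPEND:
        return cur ? join_and_set(d->envar, d->value, cur, d->separator)
                   : setenv(d->envar, d->value, 1);
    case DIR_APPEND:
        return cur ? join_and_set(d->envar, cur, d->value, d->separator)
                   : setenv(d->envar, d->value, 1);
    }
    return -1;
}

/* Job-level directives first, then app-level, each in the order given. */
int apply_all(const envar_directive *job, size_t njob,
              const envar_directive *app, size_t napp)
{
    for (size_t i = 0; i < njob; i++) {
        if (apply_directive(&job[i]) != 0) return -1;
    }
    for (size_t i = 0; i < napp; i++) {
        if (apply_directive(&app[i]) != 0) return -1;
    }
    return 0;
}
```

Because job-level directives run first, an app-level directive always sees their effect: a job-level set of a variable followed by an app-level prepend yields the app's fragment, the separator, and then the job-level value. A real launcher would apply the result to the child's environment before executing it rather than to its own.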
319+
320+
\littleheader{References}
321+
% TODO: convert these to bibtex references
322+
% 1. https://github.com/pmix/RFCs/pull/23
323+
% 2. https://github.com/pmix/RFCs/blob/master/RFC0010.md
324+
% 3. https://github.com/pmix/RFCs/blob/master/RFC0002.md
325+
% 4. https://github.com/pmix/RFCs/blob/master/RFC0022.md
326+
% 5. https://pmix.org/support/how-to/example-indirect-launch-debugger-tool/
327+
% 6. https://pmix.org/support/how-to/example-direct-launch-debugger-tool/
328+
% 7. https://github.com/openpmix/openpmix/blob/6a8cc1ca0523b531b20a9a0f7bf7b27c9b5c6023/examples/debugger.c
329+
330+
88331
\section{Hybrid Programming Models}
89332
\label{app:uc-hybrid-programming-models}
90333

figs/direct-launch.jpg

272 KB

figs/indirect-launch.jpg

355 KB

figs/process-acquisition.jpg

172 KB
