Commit 54b4833

Provide access to resource usage for processes and nodes
Define two attributes by which applications and/or tools can request resource usage of processes and nodes. Define a structure for each case to contain the information, and associated functions for constructing and destructing those structures.

Signed-off-by: Ralph Castain <[email protected]>
1 parent 46c63a5 commit 54b4833

File tree

4 files changed

+693
-14
lines changed

Chap_API_Job_Mgmt.tex

Lines changed: 0 additions & 1 deletion
Original file line number / Diff line number / Diff line change
@@ -605,7 +605,6 @@ \subsection{Job control attributes}
605605
When recursively cleaning subdirectories, do not remove the top-level directory (the one given in the cleanup request).
606606
}
607607

608-
609608
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
610609
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
611610
\section{Process and Job Monitoring}

Chap_API_Query.tex

Lines changed: 31 additions & 9 deletions
Original file line number / Diff line number / Diff line change
@@ -6,18 +6,18 @@ \chapter{Query Operations}
66

77
This chapter presents mechanisms for generalized queries that
88
access information about the host environment and the system in general.
9-
The chapter presents the concept of a query followed by a detailed explanation
9+
The chapter presents the concept of a query followed by a detailed explanation
1010
of the query \acp{API} provided. The chapter compares the use of these \acp{API} with \refapi{PMIx_Get}. The chapter concludes with detailed information about how to use
1111
the query interface to access information about what \ac{PMIx} \acp{API} an implementation supports as well as what attributes each supported \ac{API} supports.
1212

1313
\section{PMIx_Query_info}
1414
As the level of interaction between applications and the host \ac{SMS} grows, so too does the need for the application to query the \ac{SMS} regarding its capabilities and state information. \ac{PMIx} provides a generalized query interface for this purpose, along with a set of standardized attribute keys to support a range of requests. This includes requests to determine the status of scheduling queues and active allocations, the scope of \ac{API} and attribute support offered by the \ac{SMS}, namespaces of active jobs, location and information about a job's processes, and information regarding available resources.
1515

16-
An example use-case for the \refapi{PMIx_Query_info_nb} \ac{API} is to ensure clean job completion. Time-shared systems frequently impose maximum run times when assigning jobs to resource allocations. To shut down gracefully (e.g., to write a checkpoint before termination) it is necessary for an application to periodically query the resource manager for the time remaining in its allocation. This is especially true on systems for which allocation times may be shortened or lengthened from the original time limit. Many resource managers provide \acp{API} to dynamically obtain this information, but each \ac{API} is specific to the resource manager.
17-
\ac{PMIx} supports this use-case by defining an attribute key (\refattr{PMIX_TIME_REMAINING}) that can be used with the \refapi{PMIx_Query_info_nb} interface to obtain the number of seconds remaining in the current job allocation.
16+
An example use-case for the \refapi{PMIx_Query_info_nb} \ac{API} is to ensure clean job completion. Time-shared systems frequently impose maximum run times when assigning jobs to resource allocations. To shut down gracefully (e.g., to write a checkpoint before termination) it is necessary for an application to periodically query the resource manager for the time remaining in its allocation. This is especially true on systems for which allocation times may be shortened or lengthened from the original time limit. Many resource managers provide \acp{API} to dynamically obtain this information, but each \ac{API} is specific to the resource manager.
17+
\ac{PMIx} supports this use-case by defining an attribute key (\refattr{PMIX_TIME_REMAINING}) that can be used with the \refapi{PMIx_Query_info_nb} interface to obtain the number of seconds remaining in the current job allocation.
1818

19-
\ac{PMIx} sometimes provides multiple methods by which an application can obtain information or services. For this example,
20-
note that one could alternatively use the \refapi{PMIx_Register_event_handler} \ac{API} to register for an event indicating incipient job termination, and then use the \refapi{PMIx_Job_control_nb} \ac{API} to request that the host \ac{SMS} generate an event a specified amount of time prior to reaching the maximum run time.
19+
\ac{PMIx} sometimes provides multiple methods by which an application can obtain information or services. For this example,
20+
note that one could alternatively use the \refapi{PMIx_Register_event_handler} \ac{API} to register for an event indicating incipient job termination, and then use the \refapi{PMIx_Job_control_nb} \ac{API} to request that the host \ac{SMS} generate an event a specified amount of time prior to reaching the maximum run time.
2121

2222
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
2323
\subsection{Query Structure}
@@ -105,7 +105,7 @@ \subsection{\code{PMIx_Query_info}}
105105
\pasteAttributeItem{PMIX_HOST_ATTRIBUTES}
106106
\pasteAttributeItem{PMIX_TOOL_ATTRIBUTES}
107107

108-
Note that inclusion of both the \refattr{PMIX_PROCID} directive and either the \refattr{PMIX_NSPACE} or the \refattr{PMIX_RANK} attribute will return a \refconst{PMIX_ERR_BAD_PARAM} result, and that the inclusion of a process identifier must apply to all keys in that \refstruct{pmix_query_t}. Queries for information on multiple specific processes therefore requires submitting multiple \refstruct{pmix_query_t} structures, each referencing one process. Directives which are not applicable to a key are ignored.
108+
Note that inclusion of both the \refattr{PMIX_PROCID} directive and either the \refattr{PMIX_NSPACE} or the \refattr{PMIX_RANK} attribute will return a \refconst{PMIX_ERR_BAD_PARAM} result, and that the inclusion of a process identifier must apply to all keys in that \refstruct{pmix_query_t}. Queries for information on multiple specific processes therefore requires submitting multiple \refstruct{pmix_query_t} structures, each referencing one process. Directives which are not applicable to a key are ignored.
109109

110110
% Use of pmix_server_query_fn is covered in server interfaces chapter
111111
\reqattrend
@@ -134,6 +134,8 @@ \subsection{\code{PMIx_Query_info}}
134134
\pasteAttributeItem{PMIX_QUERY_AUTHORIZATIONS}
135135
\pasteAttributeItem{PMIX_PROC_PID}
136136
\pasteAttributeItem{PMIX_PROC_STATE_STATUS}
137+
\pasteAttributeItem{PMIX_QUERY_PROC_RESOURCE_USAGE}
138+
\pasteAttributeItem{PMIX_QUERY_NODE_RESOURCE_USAGE}
137139

138140
%%%%
139141
\descr
@@ -228,7 +230,7 @@ \subsection{\code{PMIx_Query_info_nb}}
228230
\pasteAttributeItem{PMIX_HOST_ATTRIBUTES}
229231
\pasteAttributeItem{PMIX_TOOL_ATTRIBUTES}
230232

231-
Note that inclusion of both the \refattr{PMIX_PROCID} directive and either the \refattr{PMIX_NSPACE} or the \refattr{PMIX_RANK} attribute will return a \refconst{PMIX_ERR_BAD_PARAM} result, and that the inclusion of a process identifier must apply to all keys in that \refstruct{pmix_query_t}. Queries for information on multiple specific processes therefore requires submitting multiple \refstruct{pmix_query_t} structures, each referencing one process. Directives which are not applicable to a key are ignored.
233+
Note that inclusion of both the \refattr{PMIX_PROCID} directive and either the \refattr{PMIX_NSPACE} or the \refattr{PMIX_RANK} attribute will return a \refconst{PMIX_ERR_BAD_PARAM} result, and that the inclusion of a process identifier must apply to all keys in that \refstruct{pmix_query_t}. Queries for information on multiple specific processes therefore requires submitting multiple \refstruct{pmix_query_t} structures, each referencing one process. Directives which are not applicable to a key are ignored.
232234

233235
% Use of pmix_server_query_fn is covered in server interfaces chapter
234236
\reqattrend
@@ -257,6 +259,8 @@ \subsection{\code{PMIx_Query_info_nb}}
257259
\pasteAttributeItem{PMIX_QUERY_AUTHORIZATIONS}
258260
\pasteAttributeItem{PMIX_PROC_PID}
259261
\pasteAttributeItem{PMIX_PROC_STATE_STATUS}
262+
\pasteAttributeItem{PMIX_QUERY_PROC_RESOURCE_USAGE}
263+
\pasteAttributeItem{PMIX_QUERY_NODE_RESOURCE_USAGE}
260264

261265

262266
%%%%
@@ -267,7 +271,7 @@ \subsection{\code{PMIx_Query_info_nb}}
267271

268272
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
269273
%% NOTE: This is not used anywhere. If this is supposed to be returned by
270-
%% the query API's, it was never indicated. They currently return
274+
%% the query API's, it was never indicated. They currently return
271275
%% PMIX_ERR_PARTIAL_SUCCESS
272276
%%\subsection{Query-specific constants}
273277
%%\label{api:struct:constants:query}
@@ -350,6 +354,24 @@ \subsection{Query keys}
350354
%
351355
\pasteAttributeItem{PMIX_QUERY_PSET_MEMBERSHIP}
352356
%
357+
\declareAttributeProvisional{PMIX_QUERY_PROC_RESOURCE_USAGE}{"pmix.qry.pres"}{pmix_proc_t}{
358+
Return the resource usage statistics for the specified process in a \refstruct{pmix_proc_stats_t} structure.
359+
If a namespace is combined with \refconst{PMIX_RANK_WILDCARD}, then results for all processes in the given
360+
job shall be returned in a \refstruct{pmix_data_array_t} of \refstruct{pmix_proc_stats_t} structures.
361+
OPTIONAL QUALIFIERS: \refattr{PMIX_SESSION_ID} to identify the session of the namespace whose statistics
362+
are being requested; \refattr{PMIX_NODEID} or \refattr{PMIX_HOSTNAME} to restrict a wildcard request to
363+
processes on a given node.
364+
}
365+
%
366+
\declareAttributeProvisional{PMIX_QUERY_NODE_RESOURCE_USAGE}{"pmix.qry.nres"}{char*}{
367+
Return the resource usage statistics for the specified node in a \refstruct{pmix_node_stats_t} structure.
368+
If no node is specified, then results for all nodes hosting processes within the job of the requestor will be
369+
returned in a \refstruct{pmix_data_array_t} of \refstruct{pmix_node_stats_t} structures.
370+
OPTIONAL QUALIFIERS: \refattr{PMIX_SESSION_ID} to identify the session of the job whose node statistics
371+
are being requested; \refattr{PMIX_NSPACE} or \refattr{PMIX_JOBID} to identify the job whose node usage
372+
is being requested (if other than the job of the requestor).
373+
}
374+
%
353375
\declareAttribute{PMIX_QUERY_AVAIL_SERVERS}{"pmix.qry.asrvrs"}{pmix_data_array_t*}{
354376
Return an array of \refstruct{pmix_info_t}, each element itself containing a \refattr{PMIX_SERVER_INFO_ARRAY} entry holding all available data for a server on this node to which the caller might be able to connect.
355377
}
@@ -384,7 +406,7 @@ \subsection{Query keys}
384406
\subsection{Query attributes}
385407
\label{api:struct:attributes:query}
386408

387-
Attributes used to direct behavior of the
409+
Attributes used to direct behavior of the
388410
\refapi{PMIx_Query_info} and \refapi{PMIx_Query_info_nb} \acp{API}:
389411

390412
\declareAttribute{PMIX_QUERY_RESULTS}{"pmix.qry.res"}{pmix_data_array_t}{
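
Illustrative note on the qualifier rule repeated in the \code{PMIx_Query_info} and \code{PMIx_Query_info_nb} hunks above (a process-identifier directive combined with a namespace or rank qualifier yields a bad-parameter result). The following is only a minimal, library-free sketch of that check. Every name in it (\code{demo_check_qualifiers}, the \code{DEMO_*} keys and status codes) is a hypothetical stand-in invented for illustration; it is not part of PMIx and does not show how any implementation actually validates a \code{pmix_query_t}.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins: real code would use the attribute keys and
 * status codes from <pmix.h>; these values are illustrative only. */
#define DEMO_PROCID "procid"
#define DEMO_NSPACE "nspace"
#define DEMO_RANK   "rank"

enum demo_status { DEMO_SUCCESS = 0, DEMO_ERR_BAD_PARAM = -1 };

/* Reject a qualifier list that identifies a process twice: once via the
 * process-identifier directive and again via a namespace or rank key.
 * Keys that are not recognized are ignored. */
static enum demo_status demo_check_qualifiers(const char *const *keys,
                                              size_t nkeys)
{
    int has_procid = 0;
    int has_ns_or_rank = 0;

    assert(keys != NULL || nkeys == 0);
    for (size_t i = 0; i < nkeys; i++) {
        if (strcmp(keys[i], DEMO_PROCID) == 0) {
            has_procid = 1;
        } else if (strcmp(keys[i], DEMO_NSPACE) == 0 ||
                   strcmp(keys[i], DEMO_RANK) == 0) {
            has_ns_or_rank = 1;
        }
    }
    return (has_procid && has_ns_or_rank) ? DEMO_ERR_BAD_PARAM
                                          : DEMO_SUCCESS;
}
```

Under this sketch, a caller wanting data for several specific processes would build one qualifier list per process, mirroring the text's requirement of one \code{pmix_query_t} per process.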
