Commit 81610eb

Provide access to resource usage for processes and nodes
Operating systems typically maintain a running measure of resource utilization by active processes. This includes metrics on CPU utilization, disk accesses, memory size, and network activity. Define a set of attributes by which these metrics can be requested and returned. Attributes are used so that the set can later be extended to include a broader range of metrics. Signed-off-by: Ralph Castain <[email protected]>
1 parent 46c63a5 commit 81610eb

File tree

4 files changed

+363
-15
lines changed


Chap_API_Job_Mgmt.tex

Lines changed: 322 additions & 2 deletions
@@ -605,7 +605,6 @@ \subsection{Job control attributes}
605605
When recursively cleaning subdirectories, do not remove the top-level directory (the one given in the cleanup request).
606606
}
607607

608-
609608
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
610609
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
611610
\section{Process and Job Monitoring}
@@ -668,6 +667,7 @@ \subsection{\code{PMIx_Process_monitor}}
668667
\pasteAttributeItem{PMIX_MONITOR_FILE_CHECK_TIME}
669668
\pasteAttributeItem{PMIX_MONITOR_FILE_DROPS}
670669
\pasteAttributeItem{PMIX_SEND_HEARTBEAT}
670+
\pasteAttributeItem{PMIX_MONITOR_RESOURCE_USAGE}
671671

672672
\optattrend
673673

@@ -746,7 +746,7 @@ \subsection{\code{PMIx_Process_monitor_nb}}
746746
\pasteAttributeItem{PMIX_MONITOR_FILE_CHECK_TIME}
747747
\pasteAttributeItem{PMIX_MONITOR_FILE_DROPS}
748748
\pasteAttributeItem{PMIX_SEND_HEARTBEAT}
749-
749+
\pasteAttributeItem{PMIX_MONITOR_RESOURCE_USAGE}
750750
\optattrend
751751

752752
%%%%
@@ -850,6 +850,326 @@ \subsection{Monitoring attributes}
850850
\declareAttribute{PMIX_MONITOR_FILE_DROPS}{"pmix.monitor.fdrop"}{uint32_t}{
851851
Number of file checks that can be missed before generating the event.
852852
}
853+
%
854+
\declareAttributeProvisional{PMIX_MONITOR_RESOURCE_RATE}{"pmix.monitor.resrate"}{uint64_t}{
855+
Monitor resource usage every N seconds, where N is the value provided by the attribute.
856+
}
857+
%
858+
\declareAttributeProvisional{PMIX_MONITOR_RESOURCE_USAGE}{"pmix.monitor.resuse"}{pmix_data_array_t}{
859+
Monitor the resources specified in the provided \refstruct{pmix_data_array_t}. Resource types may
860+
include any of the following:
861+
862+
\begin{itemize}
863+
\item \refattr{PMIX_MONITOR_RESOURCE_RATE}. If not provided, then the request will be treated as a one-shot
864+
sampling of resource usage.
865+
\item \refattr{PMIX_PROC_RESOURCE_USAGE}. If the \refstruct{pmix_data_array_t} is empty, then
866+
all process resource usage values shall be returned for all processes in the session.
867+
Optionally, the array of \refstruct{pmix_info_t} can specify the processes to be monitored, and/or the particular attributes to be included. Note that the values in the provided structures will be
868+
ignored (i.e., only the attribute keys are relevant) except where noted, and that the
869+
\refattr{PMIX_PROC_SAMPLE_TIME} will always be included in the returned data (there is no
870+
need to include it in the request). Optional attributes include:
871+
\begin{itemize}
872+
\item \refattr{PMIX_PROCID}. Optionally specify the process to be monitored. Can include a
873+
\refconst{PMIX_RANK_WILDCARD} to indicate that all processes
874+
from a given namespace are to be included. If omitted, then
875+
all processes in the session will be monitored. May be included
876+
multiple times to fully specify all processes to be included.
877+
\item \refattr{PMIX_HOSTNAME}. Include the hostname where the process is located.
878+
\item \refattr{PMIX_PROC_PID}
879+
\item \refattr{PMIX_PROC_OS_STATE}
880+
\item \refattr{PMIX_PROC_TIME}
881+
\item \refattr{PMIX_PROC_PERCENT_CPU}
882+
\item \refattr{PMIX_PROC_PRIORITY}
883+
\item \refattr{PMIX_PROC_NUM_THREADS}
884+
\item \refattr{PMIX_PROC_PSS}
885+
\item \refattr{PMIX_PROC_VSIZE}
886+
\item \refattr{PMIX_PROC_RSS}
887+
\item \refattr{PMIX_PROC_PEAK_VSIZE}
888+
\item \refattr{PMIX_PROC_CPU}
889+
\item \refattr{PMIX_PROC_SAMPLE_TIME}
890+
\end{itemize}
891+
\item \refattr{PMIX_NODE_RESOURCE_USAGE}. If the \refstruct{pmix_data_array_t} is empty, then
892+
all node resource usage values shall be returned for all nodes in the session.
893+
Optionally, the array of \refstruct{pmix_info_t} can specify the nodes to be monitored (using the \refattr{PMIX_HOSTNAME} or \refattr{PMIX_NODEID} attributes), and/or the particular attributes to be included. Note that the values in the provided structures will be
894+
ignored (i.e., only the attribute keys are relevant) except where noted, and that the
895+
\refattr{PMIX_NODE_SAMPLE_TIME} will always be included in the returned data (there is no
896+
need to include it in the request). Optional
897+
attributes include:
898+
\begin{itemize}
899+
\item \refattr{PMIX_HOSTNAME}. Optionally specify the node to be monitored. May be included multiple
900+
times to fully specify all nodes to be included. Only
901+
hostname or node ID need be included (not both). If omitted, then all nodes in the session
902+
shall be monitored.
903+
\item \refattr{PMIX_NODEID}. Optionally specify the node to be monitored. May be included multiple
904+
times to fully specify all nodes to be included. Only
905+
hostname or node ID need be included (not both). If omitted, then all nodes in the session
906+
shall be monitored.
907+
\item \refattr{PMIX_NODE_LOAD_AVG}
908+
\item \refattr{PMIX_NODE_LOAD_AVG5}
909+
\item \refattr{PMIX_NODE_LOAD_AVG15}
910+
\item \refattr{PMIX_NODE_MEM_TOTAL}
911+
\item \refattr{PMIX_NODE_MEM_FREE}
912+
\item \refattr{PMIX_NODE_MEM_BUFFERS}
913+
\item \refattr{PMIX_NODE_MEM_CACHED}
914+
\item \refattr{PMIX_NODE_MEM_SWAP_CACHED}
915+
\item \refattr{PMIX_NODE_MEM_SWAP_TOTAL}
916+
\item \refattr{PMIX_NODE_MEM_SWAP_FREE}
917+
\item \refattr{PMIX_NODE_MEM_MAPPED}
918+
\item \refattr{PMIX_DISK_RESOURCE_USAGE}. If the \refstruct{pmix_data_array_t} is empty, then
919+
all disk resource usage values shall be returned for all disks attached to the node.
920+
Optionally, the array of \refstruct{pmix_info_t} can specify the disks to be monitored (using the \refattr{PMIX_DISK_ID} attribute), and/or the particular attributes to be included. Note that the values in the provided structures will be
921+
ignored (i.e., only the attribute keys are relevant) except where noted, and that the
922+
\refattr{PMIX_DISK_SAMPLE_TIME} will always be included in the returned data (there is no
923+
need to include it in the request). Optional
924+
attributes include:
925+
\begin{itemize}
926+
\item \refattr{PMIX_DISK_ID}. Optionally specify the disk to be monitored. If omitted, then all disks
927+
attached to the node will be monitored.
928+
\item \refattr{PMIX_DISK_READ_COMPLETED}
929+
\item \refattr{PMIX_DISK_READ_MERGED}
930+
\item \refattr{PMIX_DISK_READ_SECTORS}
931+
\item \refattr{PMIX_DISK_READ_MILLISEC}
932+
\item \refattr{PMIX_DISK_WRITE_COMPLETED}
933+
\item \refattr{PMIX_DISK_WRITE_MERGED}
934+
\item \refattr{PMIX_DISK_WRITE_SECTORS}
935+
\item \refattr{PMIX_DISK_WRITE_MILLISEC}
936+
\item \refattr{PMIX_DISK_IO_IN_PROGRESS}
937+
\item \refattr{PMIX_DISK_IO_MILLISEC}
938+
\item \refattr{PMIX_DISK_IO_WEIGHTED}
939+
\end{itemize}
940+
\item \refattr{PMIX_NETWORK_RESOURCE_USAGE}. If the \refstruct{pmix_data_array_t} is empty, then
941+
all network resource usage values shall be returned for all interfaces on the node.
942+
Optionally, the array of \refstruct{pmix_info_t} can specify the networks to be monitored (using the \refattr{PMIX_NETWORK_ID} attribute), and/or the particular attributes to be included. Note that the values in the provided structures will be
943+
ignored (i.e., only the attribute keys are relevant) except where noted, and that the
944+
\refattr{PMIX_NET_SAMPLE_TIME} will always be included in the returned data (there is no
945+
need to include it in the request). Optional
946+
attributes include:
947+
\begin{itemize}
948+
\item \refattr{PMIX_NETWORK_ID}. Optionally specify the interface to be monitored. If omitted, then all
949+
interfaces on the node will be monitored.
950+
\item \refattr{PMIX_NET_RECVD_BYTES}
951+
\item \refattr{PMIX_NET_RECVD_PCKTS}
952+
\item \refattr{PMIX_NET_RECVD_ERRS}
953+
\item \refattr{PMIX_NET_SENT_BYTES}
954+
\item \refattr{PMIX_NET_SENT_PCKTS}
955+
\item \refattr{PMIX_NET_SENT_ERRS}
956+
\end{itemize}
957+
\end{itemize}
958+
\end{itemize}
959+
}
960+
961+
%%%%%%%%%%%
962+
\versionMarkerProvisional{6.0}
963+
\subsection{Resource usage attributes}
964+
\label{api:struct:attributes:resusage}
965+
966+
Operating systems typically maintain a running measure of resource utilization by active processes,
967+
attached disks, and local network interfaces.
968+
Though the precise values being tracked can vary by \ac{OS} flavor and local configuration, the following
969+
attributes are defined to provide a means for requesting and returning the available metrics.
970+
971+
\subsubsection{Process resource usage}
972+
973+
\declareAttributeProvisional{PMIX_PROC_RESOURCE_USAGE}{"pmix.proc.res"}{pmix_data_array_t}{
974+
An array of \refstruct{pmix_info_t} describing the resource usage of the specified process, with
975+
the first element containing the ID of the process (marked by the \refattr{PMIX_PROCID} key)
976+
whose usage is reported in the array. The list of included information may vary across
977+
implementations and \acp{OS}, depending upon availability and access restrictions. Except for
978+
the process ID as the first element, ordering of information in the array is arbitrary.
979+
}
980+
981+
Optional information that may be included (see \href{https://www.kernel.org/doc/html/latest/filesystems/proc.html}{PROCSTATS} for a detailed description of the following fields):
982+
\begin{itemize}
983+
\item \refattr{PMIX_HOSTNAME}. Either the hostname or \refattr{PMIX_NODEID} may be provided.
984+
\item \refattr{PMIX_PROC_PID}
985+
\item \refattr{PMIX_CMD_LINE}. Typically limited solely to the argv[0] for the process
986+
\item \declareAttributeProvisional{PMIX_PROC_OS_STATE}{"pmix.proc.osstate"}{char*}{
987+
The state of the process as reported by the \ac{OS} - for Linux, this is expressed as a single character.
988+
}
989+
\item \declareAttributeProvisional{PMIX_PROC_TIME}{"pmix.proc.time"}{struct timeval}{
990+
Cumulative CPU time
991+
}
992+
\item \declareAttributeProvisional{PMIX_PROC_PERCENT_CPU}{"pmix.proc.pcpu"}{float}{
993+
Percent CPU utilization by the process. Often, it is the CPU time used divided by the time the process has
994+
been running (cputime/realtime ratio), expressed as a percentage.
995+
}
996+
\item \declareAttributeProvisional{PMIX_PROC_PRIORITY}{"pmix.proc.pri"}{int32_t}{
997+
Priority of the process. Higher number means higher priority.
998+
}
999+
\item \declareAttributeProvisional{PMIX_PROC_NUM_THREADS}{"pmix.proc.nthr"}{uint16_t}{
1000+
Number of threads operating in the process
1001+
}
1002+
\item \declareAttributeProvisional{PMIX_PROC_PSS}{"pmix.proc.pss"}{float}{
1003+
Proportional share size, the non-swapped physical memory, with shared memory
1004+
proportionally accounted to all tasks mapping it (MBytes)
1005+
}
1006+
\item \declareAttributeProvisional{PMIX_PROC_VSIZE}{"pmix.proc.vsize"}{float}{
1007+
Virtual memory size of the process (MBytes)
1008+
}
1009+
\item \declareAttributeProvisional{PMIX_PROC_RSS}{"pmix.proc.rss"}{float}{
1010+
Resident set size, the non-swapped physical memory that a task has used (MBytes)
1011+
}
1012+
\item \declareAttributeProvisional{PMIX_PROC_PEAK_VSIZE}{"pmix.proc.pkvsize"}{float}{
1013+
Peak virtual memory size of the process (MBytes)
1014+
}
1015+
\item \declareAttributeProvisional{PMIX_PROC_CPU}{"pmix.proc.cpu"}{uint16_t}{
1016+
Processor on which the process last executed
1017+
}
1018+
\item \declareAttributeProvisional{PMIX_PROC_SAMPLE_TIME}{"pmix.proc.samptime"}{struct timeval}{
1019+
Time when sample was taken
1020+
}
1021+
\end{itemize}
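
The cputime/realtime ratio mentioned under \refattr{PMIX_PROC_PERCENT_CPU} can be illustrated with a minimal C sketch. It assumes the cumulative CPU time arrives as a \code{struct timeval} (as \refattr{PMIX_PROC_TIME} does) and that the caller knows when the process started. The start time is a hypothetical input, since no attribute for it is defined here, and the helper names are illustrative only and not part of the PMIx API.

```c
#include <sys/time.h>

/* Convert a struct timeval to seconds as a double. */
static double tv_to_seconds(struct timeval tv)
{
    return (double)tv.tv_sec + (double)tv.tv_usec / 1.0e6;
}

/*
 * Hypothetical helper (not a PMIx function): percent CPU computed as
 * cumulative CPU time divided by wall-clock time since process start.
 *   cpu_time - cumulative CPU time (e.g., the PMIX_PROC_TIME value)
 *   start    - assumed process start time (not provided by any attribute)
 *   sample   - time the sample was taken (e.g., PMIX_PROC_SAMPLE_TIME)
 * Returns 0.0 if no wall-clock time has elapsed.
 */
static double percent_cpu(struct timeval cpu_time,
                          struct timeval start,
                          struct timeval sample)
{
    double elapsed = tv_to_seconds(sample) - tv_to_seconds(start);
    if (elapsed <= 0.0) {
        return 0.0;
    }
    return 100.0 * tv_to_seconds(cpu_time) / elapsed;
}
```

For example, a process that has accumulated 30 seconds of CPU time over 120 seconds of wall-clock time would be reported at 25 percent under this definition. An implementation may compute the value differently, for instance over the most recent sampling interval.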
1022+
1023+
1024+
\subsubsection{Disk resource usage}
1025+
1026+
\declareAttributeProvisional{PMIX_DISK_ID}{"pmix.disk.id"}{char*}{
1027+
String identifier of a disk
1028+
}
1029+
1030+
\declareAttributeProvisional{PMIX_DISK_RESOURCE_USAGE}{"pmix.disk.res"}{pmix_data_array_t}{
1031+
An array of \refstruct{pmix_info_t} describing the resource usage of the specified disk, with
1032+
the first element containing the string name of the disk (marked by the \refattr{PMIX_DISK_ID} key)
1033+
whose usage is reported in the array. The list of included information may vary across
1034+
implementations and \acp{OS}, depending upon availability and access restrictions. Except for
1035+
the disk ID as the first element, ordering of information in the array is arbitrary.
1036+
}
1037+
1038+
Optional information that may be included (see \href{https://www.kernel.org/doc/html/latest/admin-guide/iostats.html}{IOSTATS} for a detailed description of the following fields):
1039+
\begin{itemize}
1040+
\item \declareAttributeProvisional{PMIX_DISK_READ_COMPLETED}{"pmix.disk.rdscomp"}{uint64_t}{
1041+
Number of completed read operations
1042+
}
1043+
\item \declareAttributeProvisional{PMIX_DISK_READ_MERGED}{"pmix.disk.rdsmrgd"}{uint64_t}{
1044+
Number of merged reads
1045+
}
1046+
\item \declareAttributeProvisional{PMIX_DISK_READ_SECTORS}{"pmix.disk.rdsct"}{uint64_t}{
1047+
Number of sectors read
1048+
}
1049+
\item \declareAttributeProvisional{PMIX_DISK_READ_MILLISEC}{"pmix.disk.rdms"}{uint64_t}{
1050+
Number of milliseconds spent reading the disk
1051+
}
1052+
\item \declareAttributeProvisional{PMIX_DISK_WRITE_COMPLETED}{"pmix.disk.wtscomp"}{uint64_t}{
1053+
Number of completed write operations
1054+
}
1055+
\item \declareAttributeProvisional{PMIX_DISK_WRITE_MERGED}{"pmix.disk.wtsmrgd"}{uint64_t}{
1056+
Number of merged write operations
1057+
}
1058+
\item \declareAttributeProvisional{PMIX_DISK_WRITE_SECTORS}{"pmix.disk.wtsct"}{uint64_t}{
1059+
Number of sectors written
1060+
}
1061+
\item \declareAttributeProvisional{PMIX_DISK_WRITE_MILLISEC}{"pmix.disk.wtms"}{uint64_t}{
1062+
Number of milliseconds spent writing to the disk
1063+
}
1064+
\item \declareAttributeProvisional{PMIX_DISK_IO_IN_PROGRESS}{"pmix.disk.ios"}{uint64_t}{
1065+
Number of disk IO operations in progress
1066+
}
1067+
\item \declareAttributeProvisional{PMIX_DISK_IO_MILLISEC}{"pmix.disk.ioms"}{uint64_t}{
1068+
Number of milliseconds spent in IO operations
1069+
}
1070+
\item \declareAttributeProvisional{PMIX_DISK_IO_WEIGHTED}{"pmix.disk.iowght"}{uint64_t}{
1071+
Number of IOs in progress times the number of milliseconds spent doing IO since
1072+
last update of the field - indicator of backlog that may be accumulating
1073+
}
1074+
\item \declareAttributeProvisional{PMIX_DISK_SAMPLE_TIME}{"pmix.disk.samptime"}{struct timeval}{
1075+
Time when sample was taken
1076+
}
1077+
\end{itemize}
1078+
1079+
\subsubsection{Network resource usage}
1080+
1081+
\declareAttributeProvisional{PMIX_NETWORK_ID}{"pmix.net.id"}{char*}{
1082+
String identifier of a network interface
1083+
}
1084+
1085+
\declareAttributeProvisional{PMIX_NETWORK_RESOURCE_USAGE}{"pmix.net.res"}{pmix_data_array_t}{
1086+
An array of \refstruct{pmix_info_t} describing the resource usage of the specified network, with
1087+
the first element containing the string name of the interface (marked by the \refattr{PMIX_NETWORK_ID} key)
1088+
whose usage is reported in the array. The list of included information may vary across
1089+
implementations and \acp{OS}, depending upon availability and access restrictions. Except for
1090+
the network ID as the first element, ordering of information in the array is arbitrary.
1091+
}
1092+
1093+
Optional information that may be included (see \href{https://www.kernel.org/doc/html/latest/networking/statistics.html}{NETSTATS} for a detailed description of the following fields):
1094+
\begin{itemize}
1095+
\item \declareAttributeProvisional{PMIX_NET_RECVD_BYTES}{"pmix.net.rcb"}{uint64_t}{
1096+
Number of bytes received
1097+
}
1098+
\item \declareAttributeProvisional{PMIX_NET_RECVD_PCKTS}{"pmix.net.rcp"}{uint64_t}{
1099+
Number of packets received
1100+
}
1101+
\item \declareAttributeProvisional{PMIX_NET_RECVD_ERRS}{"pmix.net.rcerr"}{uint64_t}{
1102+
Number of receive errors
1103+
}
1104+
\item \declareAttributeProvisional{PMIX_NET_SENT_BYTES}{"pmix.net.sntb"}{uint64_t}{
1105+
Number of bytes sent
1106+
}
1107+
\item \declareAttributeProvisional{PMIX_NET_SENT_PCKTS}{"pmix.net.sntp"}{uint64_t}{
1108+
Number of packets sent
1109+
}
1110+
\item \declareAttributeProvisional{PMIX_NET_SENT_ERRS}{"pmix.net.snterr"}{uint64_t}{
1111+
Number of send errors
1112+
}
1113+
\item \declareAttributeProvisional{PMIX_NET_SAMPLE_TIME}{"pmix.net.samptime"}{struct timeval}{
1114+
Time when sample was taken
1115+
}
1116+
\end{itemize}
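
Because each returned array carries a sample time alongside its counters, a consumer can derive rates from two successive samples. The following C sketch assumes the counters are cumulative, as they are in the Linux statistics referenced above. The \code{net\_sample\_t} type and the function name are hypothetical and exist only for this illustration.

```c
#include <stdint.h>
#include <sys/time.h>

/* Hypothetical container for the two values used below; not a PMIx type. */
typedef struct {
    uint64_t recvd_bytes;        /* value of PMIX_NET_RECVD_BYTES */
    struct timeval sample_time;  /* value of PMIX_NET_SAMPLE_TIME */
} net_sample_t;

/*
 * Average receive rate in bytes per second between two samples.
 * Returns -1.0 if the samples are out of order or the counter went
 * backwards (e.g., it was reset between samples).
 */
static double recv_bytes_per_sec(const net_sample_t *prev,
                                 const net_sample_t *curr)
{
    double dt = (double)(curr->sample_time.tv_sec - prev->sample_time.tv_sec)
              + (double)(curr->sample_time.tv_usec - prev->sample_time.tv_usec) / 1.0e6;
    if (dt <= 0.0 || curr->recvd_bytes < prev->recvd_bytes) {
        return -1.0;
    }
    return (double)(curr->recvd_bytes - prev->recvd_bytes) / dt;
}
```

The same pattern would apply to the disk counters paired with \refattr{PMIX_DISK_SAMPLE_TIME}, under the same assumption that the values are cumulative.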
1117+
1118+
1119+
\subsubsection{Node resource usage}
1120+
1121+
\declareAttributeProvisional{PMIX_NODE_RESOURCE_USAGE}{"pmix.node.res"}{pmix_data_array_t}{
1122+
An array of \refstruct{pmix_info_t} describing the overall resource usage on the specified node,
1123+
with the first element containing
1124+
the ID of the node (marked by the \refattr{PMIX_HOSTNAME} or \refattr{PMIX_NODEID} key) whose usage
1125+
is reported in the array. The list of included information may vary across
1126+
implementations and \acp{OS}, depending upon availability and access restrictions. Except for
1127+
the node ID as the first element, ordering of information in the array is arbitrary.
1128+
}
1129+
1130+
Optional information that may be included (see \href{https://www.kernel.org/doc/html/latest/filesystems/proc.html#kernel-data}{KERNEL} and \href{https://www.kernel.org/doc/html/latest/filesystems/proc.html#meminfo}{MEMINFO} for a detailed description of the following fields):
1131+
\begin{itemize}
1132+
\item \declareAttributeProvisional{PMIX_NODE_LOAD_AVG}{"pmix.node.la"}{float}{
1133+
Load average of last minute
1134+
}
1135+
\item \declareAttributeProvisional{PMIX_NODE_LOAD_AVG5}{"pmix.node.la5"}{float}{
1136+
Load average of last five minutes
1137+
}
1138+
\item \declareAttributeProvisional{PMIX_NODE_LOAD_AVG15}{"pmix.node.la15"}{float}{
1139+
Load average of last fifteen minutes
1140+
}
1141+
\item \declareAttributeProvisional{PMIX_NODE_MEM_TOTAL}{"pmix.node.mtot"}{float}{
1142+
Total usable RAM (i.e., physical RAM minus reserved bits and kernel binary code). In MBytes
1143+
}
1144+
\item \declareAttributeProvisional{PMIX_NODE_MEM_FREE}{"pmix.node.mfree"}{float}{
1145+
Total free RAM. In MBytes
1146+
}
1147+
\item \declareAttributeProvisional{PMIX_NODE_MEM_BUFFERS}{"pmix.node.mbuf"}{float}{
1148+
Temporary storage for raw disk blocks. In MBytes
1149+
}
1150+
\item \declareAttributeProvisional{PMIX_NODE_MEM_CACHED}{"pmix.node.mcache"}{float}{
1151+
In-memory cache for files read from the disk (the pagecache) as well as tmpfs and shmem. In MBytes.
1152+
Doesn't include \refattr{PMIX_NODE_MEM_SWAP_CACHED}.
1153+
}
1154+
\item \declareAttributeProvisional{PMIX_NODE_MEM_SWAP_CACHED}{"pmix.node.mswpc"}{float}{
1155+
Memory that once was swapped out, is swapped back in but still also is in the swapfile. In MBytes
1156+
}
1157+
\item \declareAttributeProvisional{PMIX_NODE_MEM_SWAP_TOTAL}{"pmix.node.mswpt"}{float}{
1158+
Total amount of swap space available. In MBytes
1159+
}
1160+
\item \declareAttributeProvisional{PMIX_NODE_MEM_SWAP_FREE}{"pmix.node.mswpfree"}{float}{
1161+
Amount of swap space that is currently unused. In MBytes
1162+
}
1163+
\item \declareAttributeProvisional{PMIX_NODE_MEM_MAPPED}{"pmix.node.mmap"}{float}{
1164+
Files which have been mmapped, such as libraries. Note that some kernel configurations might consider all pages part of a larger allocation (e.g., THP) as ``mapped'' as soon as a single page is mapped. In MBytes
1165+
}
1166+
\item \refattr{PMIX_DISK_RESOURCE_USAGE}. One for each disk attached to the node.
1167+
\item \refattr{PMIX_NETWORK_RESOURCE_USAGE}. One for each network interface on the node.
1168+
\item \declareAttributeProvisional{PMIX_NODE_SAMPLE_TIME}{"pmix.node.samptime"}{struct timeval}{
1169+
Time when sample was taken
1170+
}
1171+
\end{itemize}
1172+
8531173

8541174
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
8551175
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
