rhc54
Member

@rhc54 rhc54 commented Jun 7, 2025

Operating systems typically maintain a running measure of resource
utilization by active processes. This includes metrics on CPU
utilization, disk accesses, memory size, and network activity.
Define a set of attributes by which these metrics can be
requested and returned.

Attributes are used as a means of providing for later extension
to include a broader range of metrics.

Replaces #335

@rhc54
Member Author

rhc54 commented Jun 7, 2025

Please use emoji reactions ON THIS COMMENT to indicate your position on this proposal.

  • You do not need to vote on every proposal
  • If you have no opinion, don't vote - that is also useful data
  • If you've already commented on this issue, please still vote so
    we know your current thoughts
  • Not all proposals solve exactly the same problem, so we may end
    up accepting proposals that appear to have some overlap

This is not a binding majority-rule vote, but it will be a very
significant input into the corresponding ASC decision.

Here are the meanings for the emojis:

  • Hooray or Rocket: I support this so strongly that I
    want to be an advocate for it
  • Heart: I think this is an ideal solution
  • Thumbs up: I'd be happy with this solution
  • Confused: I'd rather we not do this, but I can tolerate it
  • Thumbs down: I'd be actively unhappy, and may even consider
    other technologies instead

If you want to explain in more detail, feel free to add another
comment, but please also vote on this comment.

@rhc54 rhc54 self-assigned this Jun 7, 2025
@rhc54
Member Author

rhc54 commented Jun 7, 2025

This PR replaces the referenced one, which was woefully stale. The doc has been reorganized and heavily modified since the original change was proposed. This PR has been updated and organized to fit within the current doc, and to address the questions that remained open on the prior PR.

@rhc54
Member Author

rhc54 commented Jun 7, 2025

@HawkmoonEternal Does this look okay to you?

@HawkmoonEternal

This looks great!

I could imagine that extensions to the list of sampled stats (e.g., for power and energy measurements) might be necessary in the future. Adding additional struct members later should be straightforward.

@rhc54
Member Author

rhc54 commented Jun 8, 2025

Hmmm...extending the current structs would actually require renaming them to avoid conflicts with prior implementations. I can see two alternatives:

  • eliminate the struct definitions and replace them with attributes. So we would have a PMIX_PROC_STATS attribute that is associated with a pmix_pointer_array_t of pmix_info_t values containing things like PMIX_HOSTNAME for the node the proc is on, PMIX_PROC_PSS for the pss value, etc. This would maximize flexibility as we could add whatever we want down the road.
  • retain the current struct definitions and extend them with a pmix_info_t array. So the structs would contain the standard Linux OS entries, but would have an array we could use to add anything else.

The second is less cumbersome if all you want is the OS values, but feels somewhat odd as it implies OS values should be treated differently.

The first is aesthetically nicer, but means defining a bunch of attributes (not a big deal, it just looks like a bigger change), and the returned data winds up using more memory due to all those string keys.
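
To make the trade-off concrete, here is a minimal sketch of the two layouts. It deliberately does not use the PMIx headers: `kv_t` is a hypothetical stand-in for `pmix_info_t`, and `attr_stats_t`, `struct_stats_t`, and the helper functions are invented for illustration. The key strings `"pmix.proc.pkvsize"` and `"pmix.proc.cpu"` come from the proposed attributes; `"pmix.hname"` is assumed to be the string behind `PMIX_HOSTNAME`. All values are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for pmix_info_t: a string key plus a typed value. */
typedef enum { VAL_STRING, VAL_FLOAT, VAL_UINT16 } val_type_t;

typedef struct {
    char key[64];   /* illustrative width, not the real PMIx key limit */
    val_type_t type;
    union {
        const char *str;
        float f32;
        uint16_t u16;
    } v;
} kv_t;

/* Option 1: a sample is nothing but an array of key/value entries. */
typedef struct {
    kv_t *items;
    size_t count;
} attr_stats_t;

/* Option 2: fixed OS fields, plus an extension array for anything else. */
typedef struct {
    const char *hostname;
    float peak_vsize_mb;
    uint16_t cpu;
    kv_t *ext;        /* later additions would land here */
    size_t ext_count;
} struct_stats_t;

/* Start an entry with the given key; the caller sets the value. */
static kv_t kv_init(const char *key, val_type_t type)
{
    kv_t kv;
    memset(&kv, 0, sizeof(kv));
    snprintf(kv.key, sizeof(kv.key), "%s", key);
    kv.type = type;
    return kv;
}

/* Fill `buf` (room for at least 3 entries) with an Option 1 sample. */
attr_stats_t build_attr_sample(kv_t *buf)
{
    assert(buf != NULL);
    buf[0] = kv_init("pmix.hname", VAL_STRING);
    buf[0].v.str = "node01";                 /* made-up hostname */
    buf[1] = kv_init("pmix.proc.pkvsize", VAL_FLOAT);
    buf[1].v.f32 = 512.0f;                   /* made-up value, MBytes */
    buf[2] = kv_init("pmix.proc.cpu", VAL_UINT16);
    buf[2].v.u16 = 3;                        /* made-up value */
    attr_stats_t s = { buf, 3 };
    return s;
}

/* The same sample in Option 2 form: direct fields, empty extension array. */
struct_stats_t build_struct_sample(void)
{
    struct_stats_t s = { "node01", 512.0f, 3, NULL, 0 };
    return s;
}

/* Linear search by key; returns NULL when the key is absent. */
const kv_t *attr_find(const attr_stats_t *s, const char *key)
{
    for (size_t i = 0; i < s->count; i++) {
        if (0 == strcmp(s->items[i].key, key)) {
            return &s->items[i];
        }
    }
    return NULL;
}
```

In this sketch, Option 2 reads a value with a plain field access, while Option 1 needs a key lookup and carries one key string per entry, which is where the extra memory goes. In exchange, Option 1 can grow by appending entries without touching any struct definition.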

Anyone have any thoughts?

@rhc54
Member Author

rhc54 commented Jun 16, 2025

@HawkmoonEternal I opted to go with the attribute-based approach to accommodate later extensions without having to rename/deprecate structures and their associated utility functions. Seemed like the more forward-looking approach. Please see what you think.
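
As a rough illustration of why the attribute-based form tolerates later extensions, the sketch below shows a consumer that picks out the keys it knows and skips the rest. It is not PMIx code: `stat_kv_t`, `proc_summary_t`, and `summarize` are hypothetical names, and the value semantics of `"pmix.proc.cpu"` are left opaque.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one returned attribute (not the real pmix_info_t). */
typedef enum { STAT_FLOAT, STAT_UINT16 } stat_type_t;

typedef struct {
    const char *key;
    stat_type_t type;
    union {
        float f32;
        uint16_t u16;
    } v;
} stat_kv_t;

/* The subset of metrics this (older) consumer knows how to interpret. */
typedef struct {
    float peak_vsize_mb;
    uint16_t cpu;
    int have_peak_vsize;
    int have_cpu;
} proc_summary_t;

/* Copy recognized keys into `out`; return how many entries were skipped. */
size_t summarize(const stat_kv_t *items, size_t n, proc_summary_t *out)
{
    size_t skipped = 0;

    assert(out != NULL);
    memset(out, 0, sizeof(*out));
    for (size_t i = 0; i < n; i++) {
        const stat_kv_t *kv = &items[i];
        if (0 == strcmp(kv->key, "pmix.proc.pkvsize") && STAT_FLOAT == kv->type) {
            out->peak_vsize_mb = kv->v.f32;
            out->have_peak_vsize = 1;
        } else if (0 == strcmp(kv->key, "pmix.proc.cpu") && STAT_UINT16 == kv->type) {
            out->cpu = kv->v.u16;
            out->have_cpu = 1;
        } else {
            /* Unknown or newer key: ignore it rather than fail. */
            skipped++;
        }
    }
    return skipped;
}
```

A producer that later starts returning an additional entry (say, an energy reading under some new key) would leave this consumer working unchanged; the new entry is simply counted as skipped.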

@naughtont3
Contributor

Note from the 25Q3 meeting: we will allow more time for reviews/comments and bring this to a vote at the next quarterly meeting (25Q4).

\item \declareAttributeProvisional{PMIX_PROC_PEAK_VSIZE}{"pmix.proc.pkvsize"}{float}{
Peak virtual memory size of the process (MBytes)
}
\item \declareAttributeProvisional{PMIX_PROC_CPU}{"pmix.proc.cpu"}{uint16_t}{
Contributor

Presumably beyond the scope of this PR, but it would be interesting to collect GPU device usage, memory, etc.

@naughtont3 naughtont3 added the "Accepted as Provisional" label (ASC vote passed. Accepted as Provisional!) Oct 16, 2025
@naughtont3
Contributor

2025-Q4 vote passed: 6 Yes, 0 No, 1 Abstain

abouteiller
abouteiller previously approved these changes Oct 17, 2025
Operating systems typically maintain a running measure of resource
utilization by active processes. This includes metrics on CPU
utilization, disk accesses, memory size, and network activity.
Define a set of attributes by which these metrics can be
requested and returned.

Attributes are used as a means of providing for later extension
to include a broader range of metrics.

Signed-off-by: Ralph Castain <[email protected]>
@rhc54
Member Author

rhc54 commented Oct 17, 2025

@abouteiller @naughtont3 Could someone please re-check the reviewed box? This one had some required repairs after the other PRs were merged.
