
Conversation

gspetro-NOAA (Collaborator) commented Oct 8, 2025

Commit Queue Requirements:

  • This PR addresses a relevant WM issue (if not, create an issue).
  • All subcomponent pull requests (if any) have been reviewed by their code managers.
  • Run the full Intel+GNU RT suite (compared to current baselines), preferably on Ursa (Derecho or Hercules are acceptable alternatives). Exceptions: documentation-only PRs, CI-only PRs, etc.
    • Commit log file w/full results from RT suite run (if applicable).
    • Verify that test_changes.list indicates which tests, if any, are changed by this PR. Commit test_changes.list, even if it is empty.
  • Fill out all sections of this template.

Description:

This PR is currently being used to test a GitHub Actions workflow that will hopefully resolve Issue #2527. Once the "Regression Resource Check / write-results (pull_request)" check at the bottom of the PR has passed, the scorecard can be viewed by clicking on it. Then, on the left-hand side of the page that opens, click Summary, scroll down, and click "Runtime Results Summary" and/or "Memory Results Summary." See here, for example.

The scorecard currently:

  • Extracts runtime/memory values from the RT logs at the HEAD of the current PR.
  • Extracts the last 10 commits from the develop branch to calculate the mean and standard deviation of runtime and memory per test.
  • Compares the runtime/memory at the HEAD of the current PR against the values from the last two commits to develop.
    • For a specific test on a given machine (as sketched below):
      • ✅ indicates normal runtime/memory.
      • ⚠️ indicates that the runtime/memory value is greater than two standard deviations above the mean.
      • ❌ indicates that runtime/memory has been greater than two standard deviations above the mean for the past two PRs.
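
For concreteness, the flagging rule amounts to a simple two-sigma check per (test, machine) pair. The sketch below is illustrative only, not the workflow's actual code; `classify` and its arguments are hypothetical names:

```python
import statistics

def classify(history, current, previous):
    """Flag a test's current value against its historical distribution.

    history  -- runtime or memory values from recent develop commits
    current  -- value at the HEAD of this PR
    previous -- value from the prior commit to develop

    Returns "pass", "warn" (current > mean + 2*sigma), or
    "fail" (both current and previous exceed the threshold).
    """
    mean = statistics.mean(history)
    sigma = statistics.stdev(history)
    threshold = mean + 2 * sigma
    if current > threshold and previous > threshold:
        return "fail"   # rendered as ❌
    if current > threshold:
        return "warn"   # rendered as ⚠️
    return "pass"       # rendered as ✅

# Example: a runtime that jumped well past mean + 2*sigma
print(classify([61.0, 59.5, 60.2, 60.8, 59.9], current=75.0, previous=60.1))  # warn
```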

In progress:

  • Caching for historical data --> The get_data task takes a few minutes to run when extracting 30+ commits (rather than the 10 it currently uses), but more commits mean less variance in the mean/std values. The plan is for the workflow to extract the historical data once, cache it, and reference the cache on future runs to avoid repeating that step (see the sketch after this list).
  • Reporting only the tests that have warnings/failures in each row? This is especially important for memory, where most tests seem to be in the normal range most of the time.
  • Testing to ensure that values are as expected.
  • Refactoring --> introduce better logging messages, reduce code duplication, increase clarity, improve documentation, etc.
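
A minimal sketch of the caching idea, assuming the per-test stats are serialized to a JSON file (in GitHub Actions, that file would typically be persisted between runs with the stock actions/cache action); `load_or_extract`, `extract_fn`, and the file name are hypothetical:

```python
import json
from pathlib import Path

CACHE = Path("historical_stats.json")  # hypothetical cache file name

def load_or_extract(extract_fn, commits):
    """Reuse previously extracted per-test stats when the cache is warm.

    extract_fn -- hypothetical callable that parses the RT logs for the
                  given develop commits and returns {test: [values, ...]}
    """
    if CACHE.exists():
        return json.loads(CACHE.read_text())
    data = extract_fn(commits)          # the slow step: parsing 30+ logs
    CACHE.write_text(json.dumps(data))  # persisted between runs by the cache
    return data
```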

Commit Message:

* UFSWM - Create scorecard for runtime/memory metrics by machine

Priority:

  • Critical Bugfix: Reason
  • High: Reason
  • Normal

Git Tracking

UFSWM:

Sub component Pull Requests:

  • None

UFSWM Blocking Dependencies:

  • Blocked by #
  • None

Documentation:

  • Documentation update required.
    • Relevant updates are included with this PR.
    • A WM issue has been opened to track the need for a documentation update; a person responsible for submitting the update has been assigned to the issue (link issue).
  • Documentation update NOT required.
    • Explanation: This is CI/CD testing targeted toward CMs, not users.

Changes

Regression Test Changes (Please commit test_changes.list):

  • PR Adds New Tests/Baselines.
  • PR Updates/Changes Baselines.
  • No Baseline Changes.

Input data Changes:

  • None.
  • New input data.
  • Updated input data.

Library Changes/Upgrades:

  • Required
    • Library names w/versions:
    • Git Stack Issue (JCSDA/spack-stack#)
  • No Updates

Testing Log:

  • RDHPCS
    • Hera
    • Orion
    • Hercules
    • GaeaC6
    • Derecho
    • Ursa
  • WCOSS2
    • Dogwood/Cactus
    • Acorn
  • CI
  • opnReqTest (complete task if unnecessary)

gspetro-NOAA and others added 30 commits September 18, 2025 07:32
…d build env to remove gnu from stack (was ufs-community#2842) (ufs-community#2867)

* UFSWM - update ufs_noaacloud.intel.lua module file
* UFSWM - replace icplocn2atm with use_oceanuv in scripts and tests
  * CMEPS - update CCPP metadata and type defs for use_oceanuv
  * FV3 - 
    * ccpp-physics - replace instances of icplocn2atm with use_oceanuv
    * atmos_cubed_sphere - replace instances of icplocn2atm with use_oceanuv
  * NOAHMP - replace icplocn2atm with use_oceanuv
gspetro-NOAA (Collaborator, Author) commented Oct 13, 2025

@DeniseWorthen I've updated this so that only warning/failing tests are reported. At the bottom, I show the number of tests passing on each platform, but that could easily be inverted to show how many are warning/failing for runtime/memory on each platform. I could also report percentages or decimal values (0 to 1) if preferred. In theory, I could add two rows, one with warnings and one with failures. There are lots of options, so I'd like to hear what you think would be most useful. I can stick to your original idea if that's what you prefer, but I wanted to propose options! Current output here.

I also added a column that shows the number of platforms on which a test is passing. A row of mostly red would also be cause for concern, since it suggests an issue with the specific test rather than with a particular platform.
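
For reference, that per-test pass count is a simple reduction over the results matrix. The sketch below is illustrative only; the function name and the results map are made up:

```python
def platform_pass_counts(results):
    """Count passing platforms per test from a {test: {machine: status}} map.

    A row that fails on most machines points at the test itself rather
    than at any one platform.
    """
    return {
        test: sum(1 for status in by_machine.values() if status == "pass")
        for test, by_machine in results.items()
    }

results = {
    "cpld_control_p8": {"hera": "pass", "ursa": "pass", "derecho": "warn"},
    "regional_atm":    {"hera": "fail", "ursa": "fail", "derecho": "fail"},
}
print(platform_pass_counts(results))  # {'cpld_control_p8': 2, 'regional_atm': 0}
```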

For Jong's plots, I believe they are only for Ursa, and it would be a lot of plots if we did one for each machine. Should we just use Ursa as a reference machine for the plots? Or do you think it would be useful to have plots for every test on every machine?


Labels

No Baseline Change

Projects

Status: Draft

Development

Successfully merging this pull request may close these issues.

track time/memory use statistics reported in RT logs
